-
Embedding-based search in JetBrains IDEs
Authors:
Evgeny Abramov,
Nikolai Palchikov
Abstract:
Most modern Integrated Development Environments (IDEs) and code editors have a feature to search across available functionality and items in an open project. In JetBrains IDEs, this feature is called Search Everywhere: it allows users to search for files, actions, classes, symbols, settings, and anything from VCS history from a single entry point. However, it works with candidates obtained by algorithms that do not account for semantics, e.g., synonyms, complex word permutations, part-of-speech modifications, and typos. In this work, we describe the machine learning approach we implemented to improve the discoverability of search items. We also share the obstacles we encountered during this process and how we overcame them.
Submitted 26 January, 2024;
originally announced January 2024.
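The ranking step behind embedding-based search can be sketched as: encode the query and every candidate item as vectors, then return the candidates with the highest cosine similarity. This is a minimal sketch under stated assumptions, not the model described in the paper. The encoder `trigram_embedding` is a hypothetical stand-in (a bag of character trigrams) that tolerates typos but, unlike a learned embedding model, does not capture synonyms; `search` and the item names are also illustrative.

```python
import math
from collections import Counter


def trigram_embedding(text: str) -> Counter:
    """Hypothetical stand-in for a learned text encoder: a sparse bag of
    character trigrams. It tolerates small typos but knows no synonyms."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, items: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank candidate items by similarity of their embeddings to the query."""
    q = trigram_embedding(query)
    scored = [(item, cosine(q, trigram_embedding(item))) for item in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Illustrative candidates; a misspelled query still finds the intended action.
items = ["Reformat Code", "Optimize Imports", "Show History", "Rename File"]
best_item, best_score = search("reformt code", items, top_k=1)[0]
```

Keeping the encoder behind a single function means a learned sentence-embedding model could replace `trigram_embedding` without changing the ranking code.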
-
Quadric Hypersurface Intersection for Manifold Learning in Feature Space
Authors:
Fedor Pavutnitskiy,
Sergei O. Ivanov,
Evgeny Abramov,
Viacheslav Borovitskiy,
Artem Klochkov,
Viktor Vialov,
Anatolii Zaikovskii,
Aleksandr Petiushko
Abstract:
The knowledge that data lies close to a particular submanifold of the ambient Euclidean space may be useful in a number of ways. For instance, one may want to automatically mark any point far away from the submanifold as an outlier or to use the geometry to come up with a better distance metric. Manifold learning problems are often posed in a very high dimension, e.g. for spaces of images or spaces of words. Today, with deep representation learning on the rise in areas such as computer vision and natural language processing, many problems of this kind may be transformed into problems of moderately high dimension, typically of the order of hundreds. Motivated by this, we propose a manifold learning technique suitable for moderately high dimension and large datasets. The manifold is learned from the training data in the form of an intersection of quadric hypersurfaces -- simple but expressive objects. At test time, this manifold can be used to introduce a computationally efficient outlier score for arbitrary new data points and to improve a given similarity metric by incorporating the learned geometric structure into it.
Submitted 24 February, 2022; v1 submitted 11 February, 2021;
originally announced February 2021.
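One simple way to obtain an intersection of quadric hypersurfaces from data, and an outlier score from it, is an algebraic least-squares fit: lift each point to its quadratic monomials and keep the coefficient directions that the lifted data nearly annihilates. This sketch assumes that approach; it is not necessarily the training procedure used in the paper, and the helper names `quadratic_features`, `fit_quadrics`, and `outlier_score` are hypothetical. The score here is the sum of squared algebraic residuals, which is zero on the fitted manifold and grows away from it.

```python
import itertools

import numpy as np


def quadratic_features(X):
    """Map each row x of X (shape n x d) to [1, x_1..x_d, x_i*x_j for i <= j]."""
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j]
             for i, j in itertools.combinations_with_replacement(range(d), 2)]
    return np.stack(cols, axis=1)


def fit_quadrics(X, k):
    """Return k unit-norm coefficient rows w, each defining a quadric
    w . phi(x) = 0 that the training points satisfy in the least-squares sense."""
    Phi = quadratic_features(X)
    # Singular values come sorted in descending order, so the last right
    # singular vectors span the (approximate) null space of Phi.
    _, _, Vt = np.linalg.svd(Phi, full_matrices=False)
    return Vt[-k:]


def outlier_score(W, X):
    """Sum of squared algebraic residuals over all fitted quadrics."""
    return ((quadratic_features(X) @ W.T) ** 2).sum(axis=1)


# Training data on the unit circle; the circle is a single quadric in 2-D.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
W = fit_quadrics(circle, k=1)
scores = outlier_score(W, np.array([[1.0, 0.0], [0.0, 0.0], [2.0, 0.0]]))
# scores ≈ [0, 1/3, 3]
```

On the unit circle the single fitted quadric is proportional to x^2 + y^2 - 1 (normalized to unit coefficient norm), which gives scores of about 0, 1/3, and 3 for the on-manifold point, the center, and the distant point.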
-
Generalized version of the support vector machine for binary classification problems: supporting hyperplane machine
Authors:
E. G. Abramov,
A. B. Komissarov,
D. A. Kornyakov
Abstract:
This paper proposes a generalized version of the SVM for binary classification problems in which an arbitrary transformation x -> y is applied. The approach is similar to that of the classic SVM method. The problem is explained at length, and various formulations of the primal and dual problems are proposed. For one of the most important cases, the formulae are derived in detail. A simple computational example is demonstrated. The algorithm and its implementation are presented in the Octave language.
Submitted 15 April, 2014; v1 submitted 13 April, 2014;
originally announced April 2014.
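The core idea of training an SVM after an arbitrary transformation x -> y can be sketched by mapping the inputs explicitly and fitting a linear soft-margin classifier in y-space. This is an illustration under assumptions, not the paper's Octave implementation or its primal/dual derivation: `transform` is a hypothetical quadratic map, and the solver minimizes a squared-hinge primal objective by plain gradient descent (a common SVM variant chosen here because it is smooth), rather than the classic hinge-loss formulation.

```python
import numpy as np


def transform(X):
    """Hypothetical example of a map x -> y: append the quadratic terms
    x1^2, x2^2 and x1*x2 to the raw 2-D input."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2], axis=1)


def train_svm(Y, t, lam=0.01, lr=0.05, steps=2000):
    """Soft-margin linear SVM in y-space with a squared hinge loss:
    minimize lam/2 * ||w||^2 + mean(max(0, 1 - t * (Y @ w + b))^2)."""
    n, m = Y.shape
    w, b = np.zeros(m), 0.0
    for _ in range(steps):
        # Slack is zero for points already beyond the margin.
        slack = np.maximum(0.0, 1.0 - t * (Y @ w + b))
        grad_w = lam * w - (2.0 / n) * (Y.T @ (slack * t))
        grad_b = -(2.0 / n) * np.sum(slack * t)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Classify raw inputs by the sign of the decision function in y-space."""
    return np.where(transform(X) @ w + b >= 0, 1, -1)


# An inner ring (label -1) inside an outer ring (label +1): not linearly
# separable in x, but separable in y through the x1^2 + x2^2 terms.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = np.stack([np.cos(angles), np.sin(angles)], axis=1)
X = np.vstack([0.5 * ring, 1.5 * ring])
t = np.array([-1] * 8 + [1] * 8)
w, b = train_svm(transform(X), t)
accuracy = np.mean(predict(w, b, X) == t)
```

Swapping `transform` for a different map changes only the feature space; the training loop itself is the same linear SVM.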