Showing 1–8 of 8 results for author: Avdiukhin, D

Searching in archive cs.
  1. arXiv:2312.00379  [pdf, other]

    cs.LG stat.ML

    Optimal Sample Complexity of Contrastive Learning

    Authors: Noga Alon, Dmitrii Avdiukhin, Dor Elboim, Orr Fischer, Grigory Yaroslavtsev

    Abstract: Contrastive learning is a highly successful technique for learning representations of data from labeled tuples, specifying the distance relations within the tuple. We study the sample complexity of contrastive learning, i.e. the minimum number of labeled tuples sufficient for getting high generalization accuracy. We give tight bounds on the sample complexity in a variety of settings, focusing on a…

    Submitted 1 December, 2023; originally announced December 2023.

  2. arXiv:2302.04492  [pdf, other]

    cs.LG

    Tree Learning: Optimal Algorithms and Sample Complexity

    Authors: Dmitrii Avdiukhin, Grigory Yaroslavtsev, Danny Vainstein, Orr Fischer, Sauman Das, Faraz Mirza

    Abstract: We study the problem of learning a hierarchical tree representation of data from labeled samples, taken from an arbitrary (and possibly adversarial) distribution. Consider a collection of data tuples labeled according to their hierarchical structure. The smallest number of such tuples required in order to be able to accurately label subsequent tuples is of interest for data collection in machine l…

    Submitted 9 February, 2023; originally announced February 2023.

  3. arXiv:2205.13753  [pdf, other]

    cs.LG math.OC stat.ML

    HOUDINI: Escaping from Moderately Constrained Saddles

    Authors: Dmitrii Avdiukhin, Grigory Yaroslavtsev

    Abstract: We give the first polynomial time algorithms for escaping from high-dimensional saddle points under a moderate number of constraints. Given gradient access to a smooth function $f \colon \mathbb R^d \to \mathbb R$ we show that (noisy) gradient descent methods can escape from saddle points under a logarithmic number of inequality constraints. This constitutes the first tangible progress (without re…

    Submitted 20 April, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

  4. arXiv:2105.10090  [pdf, other]

    cs.LG math.OC stat.ML

    Escaping Saddle Points with Compressed SGD

    Authors: Dmitrii Avdiukhin, Grigory Yaroslavtsev

    Abstract: Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck in the distributed setting. Gradient compression methods can be used to alleviate this problem, and a recent line of work shows that SGD augmented with gradient…

    Submitted 20 May, 2021; originally announced May 2021.

  5. arXiv:2012.08466  [pdf, ps, other]

    cs.LG cs.AI cs.DS stat.ML

    Objective-Based Hierarchical Clustering of Deep Embedding Vectors

    Authors: Stanislav Naumov, Grigory Yaroslavtsev, Dmitrii Avdiukhin

    Abstract: We initiate a comprehensive experimental study of objective-based hierarchical clustering methods on massive datasets consisting of deep embedding vectors from computer vision and NLP applications. This includes a large variety of image embedding (ImageNet, ImageNetV2, NaBirds), word embedding (Twitter, Wikipedia), and sentence embedding (SST-2) vectors from several popular recent models (e.g. Res…

    Submitted 8 June, 2022; v1 submitted 15 December, 2020; originally announced December 2020.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence (2021), 35(10), 9055-9063

  6. arXiv:1910.05646  [pdf, other]

    cs.DS cs.LG

    "Bring Your Own Greedy"+Max: Near-Optimal $1/2$-Approximations for Submodular Knapsack

    Authors: Dmitrii Avdiukhin, Grigory Yaroslavtsev, Samson Zhou

    Abstract: The problem of selecting a small-size representative summary of a large dataset is a cornerstone of machine learning, optimization and data science. Motivated by applications to recommendation systems and other scenarios with query-limited access to vast amounts of data, we propose a new rigorous algorithmic framework for a standard formulation of this problem as a submodular maximization subject…

    Submitted 12 October, 2019; originally announced October 2019.

  7. arXiv:1905.02367  [pdf, other]

    cs.DS cs.LG

    Adversarially Robust Submodular Maximization under Knapsack Constraints

    Authors: Dmitrii Avdiukhin, Slobodan Mitrović, Grigory Yaroslavtsev, Samson Zhou

    Abstract: We propose the first adversarially robust algorithm for monotone submodular maximization under single and multiple knapsack constraints with scalable implementations in distributed and streaming settings. For a single knapsack constraint, our algorithm outputs a robust summary of almost optimal (up to polylogarithmic factors) size, from which a constant-factor approximation to the optimal solution…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: To appear in KDD 2019

  8. arXiv:1902.03522  [pdf, other]

    cs.DS cs.DB cs.DC

    Multi-Dimensional Balanced Graph Partitioning via Projected Gradient Descent

    Authors: Dmitrii Avdiukhin, Sergey Pupyrev, Grigory Yaroslavtsev

    Abstract: Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to the previous work, we study the multi-dimensional variant when balance according to multiple weight functions is required. As we demonstrate by experimental evaluation, such multi-dimensional balance is im…

    Submitted 15 February, 2019; v1 submitted 9 February, 2019; originally announced February 2019.