
Showing 1–9 of 9 results for author: Ivkin, N

  1. arXiv:2012.08483  [pdf, other]

    cs.LG

    Amazon SageMaker Autopilot: a white box AutoML solution at scale

    Authors: Piali Das, Valerio Perrone, Nikita Ivkin, Tanya Bansal, Zohar Karnin, Huibin Shen, Iaroslav Shcherbatyi, Yotam Elor, Wilton Wu, Aida Zolic, Thibaut Lienart, Alex Tang, Amr Ahmed, Jean Baptiste Faddoul, Rodolphe Jenatton, Fela Winkelmolen, Philip Gautier, Leo Dirac, Andre Perunicic, Miroslav Miladinovic, Giovanni Zappella, Cédric Archambeau, Matthias Seeger, Bhaskar Dutt, Laurence Rouesnel

    Abstract: AutoML systems provide a black-box solution to machine learning problems by selecting the right way of processing features, choosing an algorithm and tuning the hyperparameters of the entire pipeline. Although these systems perform well on many datasets, there is still a non-negligible number of datasets for which the one-shot solution produced by each particular system would provide sub-par perfo…

    Submitted 16 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

  2. arXiv:2011.06103  [pdf, other]

    cs.DC astro-ph.SR cs.LG

    Sketch and Scale: Geo-distributed tSNE and UMAP

    Authors: Viska Wei, Nikita Ivkin, Vladimir Braverman, Alexander Szalay

    Abstract: Running machine learning analytics over geographically distributed datasets is a rapidly arising problem in the world of data management policies ensuring privacy and data security. Visualizing high dimensional data using tools such as t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP) has become common practice for data scientists. Both tools s…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: IEEE BigData2020 conference

  3. arXiv:2007.13382  [pdf, other]

    stat.ML cs.LG

    Practical and sample efficient zero-shot HPO

    Authors: Fela Winkelmolen, Nikita Ivkin, H. Furkan Bozkurt, Zohar Karnin

    Abstract: Zero-shot hyperparameter optimization (HPO) is a simple yet effective use of transfer learning for constructing a small list of hyperparameter (HP) configurations that complement each other. That is to say, for any given dataset, at least one of them is expected to perform well. Current techniques for obtaining this list are computationally expensive as they rely on running training jobs on a dive…

    Submitted 27 July, 2020; originally announced July 2020.

  4. arXiv:2007.07682  [pdf, other]

    cs.LG stat.ML

    FetchSGD: Communication-Efficient Federated Learning with Sketching

    Authors: Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, Raman Arora

    Abstract: Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A k…

    Submitted 7 October, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

  5. arXiv:1907.00236  [pdf, other]

    cs.DS cs.DB cs.LG

    Streaming Quantiles Algorithms with Small Space and Update Time

    Authors: Nikita Ivkin, Edo Liberty, Kevin Lang, Zohar Karnin, Vladimir Braverman

    Abstract: Approximating quantiles and distributions over streaming data has been studied for roughly two decades now. Recently, Karnin, Lang, and Liberty proposed the first asymptotically optimal algorithm for doing so. This manuscript complements their theoretical result by providing practical variants of their algorithm with improved constants. For a given sketch size, our techniques provably reduce the…

    Submitted 29 June, 2019; originally announced July 2019.

  6. arXiv:1903.04488  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Communication-efficient distributed SGD with Sketching

    Authors: Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, Raman Arora

    Abstract: Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time. Motivated by the success of sketching methods in sub-linear/streaming algorithms, we introduce Sketched SGD, an algorithm for carrying out distributed SGD by communicating sketches instead of full gradients. We show that Sketched SGD has f…

    Submitted 23 January, 2020; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: 19 pages, 6 figures, published at NeurIPS 2019

  7. arXiv:1809.02665  [pdf]

    cs.LG eess.AS stat.ML

    DreamNLP: Novel NLP System for Clinical Report Metadata Extraction using Count Sketch Data Streaming Algorithm: Preliminary Results

    Authors: Sanghyun Choi, Nikita Ivkin, Vladimir Braverman, Michael A. Jacobs

    Abstract: Extracting information from electronic health records (EHR) is a challenging task since it requires prior knowledge of the reports and a natural language processing (NLP) algorithm. With the growing number of EHR implementations, such knowledge is increasingly challenging to obtain in an efficient manner. We address this challenge by proposing a novel methodology to analyze large sets of EHRs u…

    Submitted 25 August, 2018; originally announced September 2018.

    Comments: 13 pages, 3 figures, US patent

    ACM Class: E.1; E.2; F.2.2; I.2.7

  8. arXiv:1603.00759  [pdf, ps, other]

    cs.DS

    BPTree: an $\ell_2$ heavy hitters algorithm using constant memory

    Authors: Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, David P. Woodruff

    Abstract: The task of finding heavy hitters is one of the best-known and most well-studied problems in the area of data streams. One is given a list $i_1,i_2,\ldots,i_m\in[n]$ and the goal is to identify the items among $[n]$ that appear frequently in the list. In sub-polynomial space, the strongest guarantee available is the $\ell_2$ guarantee, which requires finding all items that occur at least $\varepsilon\|f\|_2$ tim…

    Submitted 9 November, 2017; v1 submitted 2 March, 2016; originally announced March 2016.

    Comments: v4: PODS'17 camera-ready version, includes improved space l_2 tracking (by log(1/epsilon) factor); v3: fixed accidental mis-sorting of author last names; v2: added section explaining why pick-and-drop sampling fails for l2 heavy hitters, and fixed minor typos

  9. arXiv:1511.00661  [pdf, ps, other]

    cs.DS

    Beating CountSketch for Heavy Hitters in Insertion Streams

    Authors: Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, David P. Woodruff

    Abstract: Given a stream $p_1, \ldots, p_m$ of items from a universe $\mathcal{U}$, which, without loss of generality, we identify with the set of integers $\{1, 2, \ldots, n\}$, we consider the problem of returning all $\ell_2$-heavy hitters, i.e., those items $j$ for which $f_j \geq \varepsilon\sqrt{F_2}$, where $f_j$ is the number of occurrences of item $j$ in the stream, and $F_2 = \sum_{i \in [n]} f_i^2$. Such a…

    Submitted 2 November, 2015; originally announced November 2015.
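
The $\ell_2$-heavy hitter condition quoted in entry 9 can be made concrete with a small worked example. The sketch below is only an illustration of the definition: it stores every count exactly, so it is not the streaming algorithm of any paper listed here. The function name `l2_heavy_hitters` and the sample stream are hypothetical.

```python
from collections import Counter
from math import sqrt


def l2_heavy_hitters(stream, eps):
    """Return {item: count} for items j with f_j >= eps * sqrt(F_2).

    Uses exact counts (linear memory); illustrates the definition only.
    """
    freq = Counter(stream)                      # f_j for every item j
    f2 = sum(c * c for c in freq.values())      # F_2 = sum of f_i^2
    threshold = eps * sqrt(f2)
    return {item: c for item, c in freq.items() if c >= threshold}


# Hypothetical stream: item 1 appears 8 times, item 2 three times,
# items 3, 4 and 5 once each, so F_2 = 64 + 9 + 3 = 76.
stream = [1] * 8 + [2] * 3 + [3, 4, 5]
print(l2_heavy_hitters(stream, 0.5))  # threshold is 0.5 * sqrt(76), about 4.36
```

With `eps = 0.5` only item 1 clears the threshold; lowering `eps` to 0.3 drops the threshold to about 2.62 and admits item 2 as well.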