Showing 1–15 of 15 results for author: Bailis, P

Searching in archive stat.
  1. arXiv:2107.12525  [pdf, ps, other]

    math.ST cs.DB cs.LG stat.ML

    Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

    Authors: Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, Matei Zaharia

    Abstract: Given a dataset $\mathcal{D}$, we are interested in computing the mean of a subset of $\mathcal{D}$ which matches a predicate. ABae leverages stratified sampling and proxy models to efficiently compute this statistic given a sampling budget $N$. In this document, we theoretically analyze ABae and show that the MSE of the estimate decays at rate $O(N_1^{-1} + N_2^{-1} + N_1^{1/2}N_2^{-3/2})$, where…

    Submitted 28 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  2. arXiv:2102.08622  [pdf, other]

    cs.LG stat.ML

    Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training

    Authors: Kai Sheng Tai, Peter Bailis, Gregory Valiant

    Abstract: Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. Thi…

    Submitted 11 June, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

    Comments: ICML 2021 camera ready version

  3. arXiv:2008.09983  [pdf, other]

    cs.LG cs.DB stat.ML

    Leveraging Organizational Resources to Adapt Models to New Data Modalities

    Authors: Sahaana Suri, Raghuveer Chanda, Neslihan Bulut, Pradyumna Narayana, Yemao Zeng, Peter Bailis, Sugato Basu, Girija Narlikar, Christopher Re, Abishek Sethi

    Abstract: As applications in large organizations evolve, the machine learning (ML) models that power them must adapt the same predictive tasks to newly arising data modalities (e.g., a new video content launch in a social media application requires existing text or image models to extend to video). To solve this problem, organizations typically create ML pipelines from scratch. However, this fails to utiliz…

    Submitted 23 August, 2020; originally announced August 2020.

    Journal ref: PVLDB,13(12): 3396-3410, 2020

  4. arXiv:2007.00077  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Similarity Search for Efficient Active Learning and Search of Rare Concepts

    Authors: Cody Coleman, Edward Chou, Julian Katz-Samuels, Sean Culatana, Peter Bailis, Alexander C. Berg, Robert Nowak, Roshan Sumbaly, Matei Zaharia, I. Zeki Yalniz

    Abstract: Many active learning and search approaches are intractable for large-scale industrial settings with billions of unlabeled examples. Existing approaches search globally for the optimal examples to label, scaling linearly or even quadratically with the unlabeled data. In this paper, we improve the computational efficiency of active learning and search methods by restricting the candidate pool for la…

    Submitted 22 July, 2021; v1 submitted 30 June, 2020; originally announced July 2020.

  5. arXiv:2006.03779  [pdf, other]

    stat.ML cs.LG

    Chromatic Learning for Sparse Datasets

    Authors: Vladimir Feinberg, Peter Bailis

    Abstract: Learning over sparse, high-dimensional data frequently necessitates the use of specialized methods such as the hashing trick. In this work, we design a highly scalable alternative approach that leverages the low degree of feature co-occurrences present in many practical settings. This approach, which we call Chromatic Learning (CL), obtains a low-dimensional dense feature representation by perform…

    Submitted 6 June, 2020; originally announced June 2020.

    Comments: 15 pages, 8 figures, under review

  6. arXiv:1910.01500  [pdf, other]

    cs.LG cs.PF stat.ML

    MLPerf Training Benchmark

    Authors: Peter Mattson, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debojyoti Dutta, Udit Gupta, Kim Hazelwood, Andrew Hock, Xinyuan Huang, Atsushi Ike, Bill Jia, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Guokai Ma, Deepak Narayanan, et al. (12 additional authors not shown)

    Abstract: Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution, training is stochastic and time to solution exhibits h…

    Submitted 2 March, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: MLSys 2020

  7. arXiv:1906.11829  [pdf, other]

    cs.LG stat.ML

    Selection via Proxy: Efficient Data Selection for Deep Learning

    Authors: Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

    Abstract: Data selection methods, such as active learning and core-set selection, are useful tools for machine learning on large datasets. However, they can be prohibitively expensive to apply in deep learning because they depend on feature representations that need to be learned. In this work, we show that we can greatly improve the computational efficiency by using a small proxy model to perform data sele…

    Submitted 26 October, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: ICLR 2020

  8. arXiv:1905.02304  [pdf, other]

    cs.LG cs.DB stat.ML

    CrossTrainer: Practical Domain Adaptation with Loss Reweighting

    Authors: Justin Chen, Edward Gan, Kexin Rong, Sahaana Suri, Peter Bailis

    Abstract: Domain adaptation provides a powerful set of model training techniques given domain-specific training data and supplemental data with unknown relevance. The techniques are useful when users need to develop models with data from varying sources, of varying quality, or from different time ranges. We build CrossTrainer, a system for practical domain adaptation. CrossTrainer utilizes loss reweighting,…

    Submitted 6 May, 2019; originally announced May 2019.

  9. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  10. arXiv:1901.11399  [pdf, other]

    cs.CV cs.LG stat.ML

    Equivariant Transformer Networks

    Authors: Kai Sheng Tai, Peter Bailis, Gregory Valiant

    Abstract: How can prior knowledge on the transformation invariances of a domain be incorporated into the architecture of a neural network? We propose Equivariant Transformers (ETs), a family of differentiable image-to-image mappings that improve the robustness of models towards pre-defined continuous transformation groups. Through the use of specially-derived canonical coordinate systems, ETs incorporate fu…

    Submitted 24 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: ICML 2019

  11. arXiv:1810.01937  [pdf, other]

    cs.LG cs.AI stat.ML

    LIT: Block-wise Intermediate Representation Training for Model Compression

    Authors: Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia

    Abstract: Knowledge distillation (KD) is a popular method for reducing the computational overhead of deep network inference, in which the output of a teacher model is used to train a smaller, faster student model. Hint training (i.e., FitNets) extends KD by regressing a student model's intermediate representation to a teacher model's intermediate representation. In this work, we introduce bLock-wise Interme…

    Submitted 1 October, 2018; originally announced October 2018.

  12. arXiv:1806.01427  [pdf, other]

    cs.LG stat.ML

    Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

    Authors: Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Re, Matei Zaharia

    Abstract: Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision), and can impact the final model's accuracy on unseen data. Due to a lack of…

    Submitted 1 December, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

  13. arXiv:1711.02305  [pdf, other]

    cs.LG cs.DS stat.ML

    Sketching Linear Classifiers over Data Streams

    Authors: Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant

    Abstract: We introduce a new sub-linear space sketch---the Weight-Median Sketch---for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and stream…

    Submitted 6 April, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: Full version of paper appearing at SIGMOD 2018 with more detailed proofs of theoretical results. Code available at https://github.com/stanford-futuredata/wmsketch

  14. arXiv:1706.08146  [pdf, other]

    cs.LG cs.AI stat.ML

    Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data

    Authors: Vatsal Sharan, Kai Sheng Tai, Peter Bailis, Gregory Valiant

    Abstract: What learning algorithms can be run directly on compressively-sensed data? In this work, we consider the question of accurately and efficiently computing low-rank matrix or tensor factorizations given data compressed via random projections. We examine the approach of first performing factorization in the compressed domain, and then reconstructing the original high-dimensional factors from the reco…

    Submitted 27 May, 2019; v1 submitted 25 June, 2017; originally announced June 2017.

    Comments: Updates for ICML'19 camera-ready

  15. arXiv:1705.07538  [pdf, other]

    cs.LG cs.DB stat.ML

    Infrastructure for Usable Machine Learning: The Stanford DAWN Project

    Authors: Peter Bailis, Kunle Olukotun, Christopher Re, Matei Zaharia

    Abstract: Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations. This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application developmen…

    Submitted 8 June, 2017; v1 submitted 21 May, 2017; originally announced May 2017.