Showing 1–6 of 6 results for author: Ginart, A

  1. arXiv:2201.10774  [pdf, other]

    cs.LG

    Competition over data: how does data purchase affect users?

    Authors: Yongchan Kwon, Antonio Ginart, James Zou

    Abstract: As machine learning (ML) is deployed by many competing service providers, the underlying ML predictors also compete against each other, and it is increasingly important to understand the impacts and biases from such competition. In this paper, we study what happens when the competing predictors can acquire additional labeled data to improve their prediction quality. We introduce a new environment…

    Submitted 13 July, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

  2. arXiv:2201.00971  [pdf, other]

    cs.LG cs.AI cs.CL

    Submix: Practical Private Prediction for Large-Scale Language Models

    Authors: Antonio Ginart, Laurens van der Maaten, James Zou, Chuan Guo

    Abstract: Recent data-extraction attacks have exposed that language models can memorize some training samples verbatim. This is a vulnerability that can compromise the privacy of the model's training data. In this work, we introduce SubMix: a practical protocol for private next-token prediction designed to prevent privacy violations by language models that were fine-tuned on a private corpus after pre-train…

    Submitted 3 January, 2022; originally announced January 2022.

  3. arXiv:2104.13621  [pdf, other]

    cs.LG stat.ML

    MLDemon: Deployment Monitoring for Machine Learning Systems

    Authors: Antonio Ginart, Martin Zhang, James Zou

    Abstract: Post-deployment monitoring of ML systems is critical for ensuring reliability, especially as new user inputs can differ from the training distribution. Here we propose a novel approach, MLDemon, for ML DEployment MONitoring. MLDemon integrates both unlabeled data and a small amount of on-demand labels to produce a real-time estimate of the ML model's current performance on a given data stream. Sub…

    Submitted 24 February, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: Accepted to AISTATS 2022. Significant changes to algorithm, theory, and experiments since previous versions

  4. arXiv:2009.06797  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Competing AI: How does competition feedback affect machine learning?

    Authors: Antonio Ginart, Eva Zhang, Yongchan Kwon, James Zou

    Abstract: This paper studies how competition affects machine learning (ML) predictors. As ML becomes more ubiquitous, it is often deployed by companies to compete over customers. For example, digital platforms like Yelp use ML to predict user preference and make recommendations. A service that is more often queried by users, perhaps because it more accurately anticipates user preferences, is also more like…

    Submitted 25 March, 2021; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: Accepted to AISTATS 2021

  5. arXiv:1909.11810  [pdf, other]

    cs.LG stat.ML

    Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

    Authors: Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou

    Abstract: Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive -- potentially occupying hundreds of gigabytes in large-scale settings. To help manage this outsized memory consumption, we explore mixed dimension embeddings, an embedding layer architecture in which a particular embedding vector's dimension scales with its que…

    Submitted 8 February, 2021; v1 submitted 25 September, 2019; originally announced September 2019.

  6. arXiv:1907.05012  [pdf, other]

    cs.LG stat.ML

    Making AI Forget You: Data Deletion in Machine Learning

    Authors: Antonio Ginart, Melody Y. Guan, Gregory Valiant, James Zou

    Abstract: Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used --- the EU's Right To Be Forgotten regulation is an example of this effort. In this paper we initiate a framework studying what to do when it is no longer permissible to deploy models derived from specific user data. In particular, we formulate the problem of efficientl…

    Submitted 4 November, 2019; v1 submitted 11 July, 2019; originally announced July 2019.

    Comments: To appear in NeurIPS 2019