Showing 1–11 of 11 results for author: Haochen, J Z

  1. arXiv:2306.16361  [pdf, ps, other]

    cs.LG

    Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time

    Authors: Arvind Mahankali, Jeff Z. Haochen, Kefan Dong, Margalit Glasgow, Tengyu Ma

    Abstract: Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it is still an open question whether gradient descent on neural networks without unnatural modifications can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Different from pri…

    Submitted 7 October, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: Added result on projected gradient descent with inverse-polynomial learning rate

  2. arXiv:2305.17311  [pdf, other]

    cs.CL cs.AI cs.LG

    Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models

    Authors: Yuhui Zhang, Michihiro Yasunaga, Zhengping Zhou, Jeff Z. HaoChen, James Zou, Percy Liang, Serena Yeung

    Abstract: Language models have been shown to exhibit positive scaling, where performance improves as models are scaled up in terms of size, compute, or data. In this work, we introduce NeQA, a dataset consisting of questions with negation in which language models do not exhibit straightforward positive scaling. We show that this task can exhibit inverse scaling, U-shaped scaling, or positive scaling, and th…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Published at ACL 2023 Findings

  3. arXiv:2302.04269  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Diagnosing and Rectifying Vision Models using Language

    Authors: Yuhui Zhang, Jeff Z. HaoChen, Shih-Cheng Huang, Kuan-Chieh Wang, James Zou, Serena Yeung

    Abstract: Recent multi-modal contrastive learning models have demonstrated the ability to learn an embedding space suitable for building strong vision classifiers, by leveraging the rich information in large-scale image-caption datasets. Our work highlights a distinct advantage of this multi-modal embedding space: the ability to diagnose vision classifiers through natural language. The traditional process o…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: Published at ICLR 2023

  4. arXiv:2211.14699  [pdf, other]

    cs.LG stat.ML

    A Theoretical Study of Inductive Biases in Contrastive Learning

    Authors: Jeff Z. HaoChen, Tengyu Ma

    Abstract: Understanding self-supervised learning is important but challenging. Previous theoretical works study the role of pretraining losses, and view neural networks as general black boxes. However, the recent work of Saunshi et al. argues that the model architecture -- a component largely ignored by previous works -- also has significant influences on the downstream performance of self-supervised learni…

    Submitted 8 April, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: ICLR 2023

  5. arXiv:2204.02683  [pdf, other]

    cs.LG

    Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations

    Authors: Jeff Z. HaoChen, Colin Wei, Ananya Kumar, Tengyu Ma

    Abstract: Contrastive learning is a highly effective method for learning representations from unlabeled data. Recent works show that contrastive representations can transfer across domains, leading to simple state-of-the-art algorithms for unsupervised domain adaptation. In particular, a linear classifier trained to separate the representations on the source domain can also predict classes on the target dom…

    Submitted 23 May, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

  6. arXiv:2204.00570  [pdf, other]

    cs.LG cs.CV

    Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation

    Authors: Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Jeff Z. HaoChen, Tengyu Ma, Percy Liang

    Abstract: We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e.g., photographs) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain. Conventional UDA methods (e.g., domain adversarial training) learn domain-invariant features to improve generalization to the target domain. In this paper, we show that contrastiv…

    Submitted 1 December, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: ICML 2022 (Long Talk)

  7. arXiv:2203.00089  [pdf, other]

    cs.LG math.OC stat.ML

    Amortized Proximal Optimization

    Authors: Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse

    Abstract: We propose a framework for online meta-optimization of parameters that govern optimization, called Amortized Proximal Optimization (APO). We first interpret various existing neural network optimizers as approximate stochastic proximal point methods which trade off the current-batch loss with proximity terms in both function space and weight space. The idea behind APO is to amortize the minimizatio…

    Submitted 28 February, 2022; originally announced March 2022.

    Comments: 37 pages, 30 figures

  8. arXiv:2110.05025  [pdf, other]

    cs.LG cs.CV stat.ML

    Self-supervised Learning is More Robust to Dataset Imbalance

    Authors: Hong Liu, Jeff Z. HaoChen, Adrien Gaidon, Tengyu Ma

    Abstract: Self-supervised learning (SSL) is a scalable way to learn general visual representations since it learns without labels. However, large-scale unlabeled datasets in the wild often have long-tailed label distributions, where we know little about the behavior of SSL. In this work, we systematically investigate self-supervised learning under dataset imbalance. First, we find out via extensive experime…

    Submitted 22 May, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

  9. arXiv:2106.04156  [pdf, other]

    cs.LG stat.ML

    Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

    Authors: Jeff Z. HaoChen, Colin Wei, Adrien Gaidon, Tengyu Ma

    Abstract: Recent works in self-supervised learning have advanced the state-of-the-art by relying on the contrastive learning paradigm, which learns representations by pushing positive pairs, or similar examples from the same class, closer together while keeping negative pairs far apart. Despite the empirical successes, theoretical foundations are limited -- prior analyses assume conditional independence of…

    Submitted 23 June, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted as an oral to NeurIPS 2021

  10. arXiv:2011.01418  [pdf, other]

    cs.LG

    Meta-learning Transferable Representations with a Single Target Domain

    Authors: Hong Liu, Jeff Z. HaoChen, Colin Wei, Tengyu Ma

    Abstract: Recent works found that fine-tuning and joint training---two popular approaches for transfer learning---do not always improve accuracy on downstream tasks. First, we aim to understand more about when and why fine-tuning and joint training can be suboptimal or even harmful for transfer learning. We design semi-synthetic datasets where the source task can be solved by either source-specific features…

    Submitted 2 November, 2020; originally announced November 2020.

  11. arXiv:2006.08680  [pdf, other]

    cs.LG stat.ML

    Shape Matters: Understanding the Implicit Bias of the Noise Covariance

    Authors: Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma

    Abstract: The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization effect for training overparameterized models. Prior theoretical work largely focuses on spherical Gaussian noise, whereas empirical studies demonstrate the phenomenon that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise. This paper theore…

    Submitted 17 June, 2020; v1 submitted 15 June, 2020; originally announced June 2020.