Showing 1–22 of 22 results for author: Hashimoto, T B

Searching in archive cs.
  1. arXiv:2404.04475  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

    Authors: Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori B. Hashimoto

    Abstract: LLM-based auto-annotators have become a key component of the LLM development process due to their cost-effectiveness and scalability compared to human-based evaluation. However, these auto-annotators can introduce complex biases that are hard to remove. Even simple, known confounders such as preference for longer outputs remain in existing automated evaluation metrics. We propose a simple regressi…

    Submitted 5 April, 2024; originally announced April 2024.

  2. arXiv:2312.00364  [pdf, other]

    cs.LG cs.CV

    Benchmarking Multi-Domain Active Learning on Image Classification

    Authors: Jiayi Li, Rohan Taori, Tatsunori B. Hashimoto

    Abstract: Active learning aims to enhance model performance by strategically labeling informative data points. While extensively studied, its effectiveness on large-scale, real-world datasets remains underexplored. Existing research primarily focuses on single-source data, ignoring the multi-domain nature of real-world data. We introduce a multi-domain active learning benchmark to bridge this gap. Our bench…

    Submitted 1 December, 2023; originally announced December 2023.

  3. arXiv:2310.17623  [pdf, other]

    cs.CL cs.LG

    Proving Test Set Contamination in Black Box Language Models

    Authors: Yonatan Oren, Nicole Meister, Niladri Chatterji, Faisal Ladhak, Tatsunori B. Hashimoto

    Abstract: Large language models are trained on vast amounts of internet data, prompting concerns and speculation that they have memorized public benchmarks. Going from speculation to proof of contamination is challenging, as the pretraining data used by proprietary models are often not publicly accessible. We show that it is possible to provide provable guarantees of test set contamination in language model…

    Submitted 23 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

  4. arXiv:2307.03576  [pdf, ps, other]

    cs.LG

    One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention

    Authors: Arvind Mahankali, Tatsunori B. Hashimoto, Tengyu Ma

    Abstract: Recent works have empirically analyzed in-context learning and shown that transformers trained on synthetic linear regression tasks can learn to implement ridge regression, which is the Bayes-optimal predictor, given sufficient capacity [Akyürek et al., 2023], while one-layer transformers with linear self-attention and no MLP layer will learn to implement one step of gradient descent (GD) on a lea…

    Submitted 7 July, 2023; originally announced July 2023.

  5. arXiv:2305.18619  [pdf, other]

    cs.CL cs.LG

    Likelihood-Based Diffusion Language Models

    Authors: Ishaan Gulrajani, Tatsunori B. Hashimoto

    Abstract: Despite a growing interest in diffusion-based language models, existing work has not shown that these models can attain nontrivial likelihoods on standard language modeling benchmarks. In this work, we take the first steps towards closing the likelihood gap between autoregressive and diffusion-based language models, with the goal of building and releasing a diffusion model which outperforms a smal…

    Submitted 30 May, 2023; originally announced May 2023.

  6. arXiv:2305.14387  [pdf, other]

    cs.LG cs.AI cs.CL

    AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

    Authors: Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto

    Abstract: Large language models (LLMs) such as ChatGPT have seen widespread adoption due to their strong instruction-following abilities. Developing these LLMs involves a complex yet poorly understood workflow requiring training with human feedback. Replicating and understanding this instruction-following requires tackling three major challenges: the high cost of data collection, the lack of trustworthy eva…

    Submitted 7 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Spotlight at NeurIPS 2023

  7. arXiv:2301.13848  [pdf, other]

    cs.CL cs.AI cs.LG

    Benchmarking Large Language Models for News Summarization

    Authors: Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto

    Abstract: Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood. By conducting a human evaluation on ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find instruction tuning, and not model size, is the key to the LLM's zero-shot summarization capability. S…

    Submitted 31 January, 2023; originally announced January 2023.

  8. arXiv:2211.16490  [pdf, other]

    cs.LG cs.CL cs.PL cs.SE

    Coder Reviewer Reranking for Code Generation

    Authors: Tianyi Zhang, Tao Yu, Tatsunori B. Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I. Wang

    Abstract: Sampling diverse programs from a code language model and reranking with model likelihood is a popular method for code generation but it is prone to preferring degenerate solutions. Inspired by collaborative programming, we propose Coder-Reviewer reranking. We augment Coder language models from past work, which generate programs given language instructions, with Reviewer models, which evaluate the…

    Submitted 29 November, 2022; originally announced November 2022.

  9. arXiv:2209.03942  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Data Feedback Loops: Model-driven Amplification of Dataset Biases

    Authors: Rohan Taori, Tatsunori B. Hashimoto

    Abstract: Datasets scraped from the internet have been critical to the successes of large-scale machine learning. Yet, this very success puts the utility of future internet-derived datasets at potential risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we first formalize a system where interactions with one model are recorded as history and scraped as trai…

    Submitted 8 September, 2022; originally announced September 2022.

  10. arXiv:2205.14217  [pdf, other]

    cs.CL cs.AI cs.LG

    Diffusion-LM Improves Controllable Text Generation

    Authors: Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto

    Abstract: Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there has been little progress on complex, fine-grained controls (e.g., syntactic structure). To address this challenge, we develop a new non-autoregressive language…

    Submitted 27 May, 2022; originally announced May 2022.

  11. arXiv:2205.11055  [pdf, other]

    cs.CL cs.LG

    TempLM: Distilling Language Models into Template-Based Generators

    Authors: Tianyi Zhang, Mina Lee, Lisa Li, Ende Shen, Tatsunori B. Hashimoto

    Abstract: While pretrained language models (PLMs) have greatly improved text generation, they have also been known to produce unfaithful or inappropriate content. In contrast, classic template-based systems provide strong guarantees of faithfulness at the cost of fluency. We propose TempLM, which achieves the best of both worlds by distilling a PLM into a template-based generator. On the E2E and SynthBio da…

    Submitted 23 May, 2022; originally announced May 2022.

  12. arXiv:1911.08731  [pdf, other]

    cs.LG stat.ML

    Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

    Authors: Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang

    Abstract: Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find…

    Submitted 2 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

  13. arXiv:1911.06964  [pdf, other]

    cs.CL cs.LG

    Learning Autocomplete Systems as a Communication Game

    Authors: Mina Lee, Tatsunori B. Hashimoto, Percy Liang

    Abstract: We study textual autocomplete---the task of predicting a full sentence from a partial sentence---as a human-machine communication game. Specifically, we consider three competing goals for effective communication: use as few tokens as possible (efficiency), transmit sentences faithfully (accuracy), and be learnable to humans (interpretability). We propose an unsupervised approach which tackles all…

    Submitted 16 November, 2019; originally announced November 2019.

  14. arXiv:1909.02060  [pdf, other]

    cs.CL cs.LG stat.ML

    Distributionally Robust Language Modeling

    Authors: Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang

    Abstract: Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without the k…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: Camera ready version for EMNLP

  15. arXiv:1904.02792  [pdf, other]

    cs.CL cs.AI stat.ML

    Unifying Human and Statistical Evaluation for Natural Language Generation

    Authors: Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang

    Abstract: How can we measure whether a natural language generation system produces both high quality and diverse outputs? Human evaluation captures quality but not diversity, as it does not catch models that simply plagiarize from the training set. On the other hand, statistical evaluation (i.e., perplexity) captures diversity but not quality, as models that occasionally emit low quality samples would be in…

    Submitted 4 April, 2019; originally announced April 2019.

    Comments: NAACL Camera Ready Submission

  16. arXiv:1812.01194  [pdf, other]

    stat.ML cs.LG

    A Retrieve-and-Edit Framework for Predicting Structured Outputs

    Authors: Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang

    Abstract: For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch. With this motivation, we propose an approach that first retrieves a training example based on the input (e.g., natural language description) and then edits it to the desired output (e.g., code). Our contribution is a computationally efficient method f…

    Submitted 3 December, 2018; originally announced December 2018.

    Comments: To appear, NeurIPS 2018

  17. arXiv:1806.08010  [pdf, other]

    stat.ML cs.LG

    Fairness Without Demographics in Repeated Loss Minimization

    Authors: Tatsunori B. Hashimoto, Megha Srivastava, Hongseok Namkoong, Percy Liang

    Abstract: Machine learning models (e.g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity---minority groups (e.g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss. Worse, as model accuracy affects user retention, a minority group can shrink over time. In this paper, we first show that the status quo…

    Submitted 30 July, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: Final version for ICML2018, corrects typos

  18. arXiv:1804.03761  [pdf, other]

    stat.ML cs.LG

    Derivative free optimization via repeated classification

    Authors: Tatsunori B. Hashimoto, Steve Yadlowsky, John C. Duchi

    Abstract: We develop an algorithm for minimizing a function using $n$ batched function value measurements at each of $T$ rounds by using classifiers to identify a function's sublevel set. We show that sufficiently accurate classifiers can achieve linear convergence rates, and show that the convergence rate is tied to the difficulty of active learning sublevel sets. Further, we show that the bootstrap is a c…

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: At AISTATS2018

  19. arXiv:1709.08878  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE stat.ML

    Generating Sentences by Editing Prototypes

    Authors: Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang

    Abstract: We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to hu…

    Submitted 7 September, 2018; v1 submitted 26 September, 2017; originally announced September 2017.

    Comments: 14 pages, Transactions of the Association for Computational Linguistics (TACL), 2018

  20. arXiv:1511.00573  [pdf, other]

    stat.ML cs.AI cs.SI

    From random walks to distances on unweighted graphs

    Authors: Tatsunori B. Hashimoto, Yi Sun, Tommi S. Jaakkola

    Abstract: Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzing r…

    Submitted 2 November, 2015; originally announced November 2015.

    Comments: To appear in NIPS 2015

  21. arXiv:1509.05808  [pdf, other]

    cs.CL cs.LG stat.ML

    Word, graph and manifold embedding from Markov processes

    Authors: Tatsunori B. Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola

    Abstract: Continuous vector representations of words and objects appear to carry surprisingly rich semantic content. In this paper, we advance both the conceptual and theoretical understanding of word embeddings in three ways. First, we ground embeddings in semantic spaces studied in cognitive-psychometric literature and introduce new evaluation tasks. Second, in contrast to prior work, we take metric recov…

    Submitted 18 September, 2015; originally announced September 2015.

  22. arXiv:1411.5720  [pdf, other]

    stat.ML cs.SI math.ST stat.ME

    Metric recovery from directed unweighted graphs

    Authors: Tatsunori B. Hashimoto, Yi Sun, Tommi S. Jaakkola

    Abstract: We analyze directed, unweighted graphs obtained from $x_i\in \mathbb{R}^d$ by connecting vertex $i$ to $j$ iff $|x_i - x_j| < ε(x_i)$. Examples of such graphs include $k$-nearest neighbor graphs, where $ε(x_i)$ varies from point to point, and, arguably, many real world graphs such as co-purchasing graphs. We ask whether we can recover the underlying Euclidean metric $ε(x_i)$ and the associated den…

    Submitted 20 November, 2014; originally announced November 2014.

    Comments: Poster at NIPS workshop on networks. Submitted to AISTATS 2015