Showing 1–16 of 16 results for author: Iter, D

  1. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  2. arXiv:2310.13127  [pdf, other]

    cs.CL

    Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

    Authors: Zhihan Zhang, Shuohang Wang, Wenhao Yu, Yichong Xu, Dan Iter, Qingkai Zeng, Yang Liu, Chenguang Zhu, Meng Jiang

    Abstract: Large language models (LLMs) can perform a wide range of tasks by following natural language instructions, without the necessity of task-specific fine-tuning. Unfortunately, the performance of LLMs is greatly influenced by the quality of these instructions, and manually writing effective instructions for each task is a laborious and subjective process. In this paper, we introduce Auto-Instruct, a…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Findings. Work was done before July 2023

  3. arXiv:2310.12418  [pdf, other]

    cs.CL

    The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions

    Authors: Siru Ouyang, Shuohang Wang, Yang Liu, Ming Zhong, Yizhu Jiao, Dan Iter, Reid Pryzant, Chenguang Zhu, Heng Ji, Jiawei Han

    Abstract: Recent progress in Large Language Models (LLMs) has produced models that exhibit remarkable performance across a variety of NLP tasks. However, it remains unclear whether the existing focus of NLP research accurately captures the genuine requirements of human users. This paper provides a comprehensive analysis of the divergence between current NLP research and the needs of real-world NLP applicati…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  4. arXiv:2305.14726  [pdf, other]

    cs.CL cs.AI

    In-Context Demonstration Selection with Cross Entropy Difference

    Authors: Dan Iter, Reid Pryzant, Ruochen Xu, Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu

    Abstract: Large language models (LLMs) can use in-context demonstrations to improve performance on zero-shot tasks. However, selecting the best in-context examples is challenging because model performance can vary widely depending on the selected examples. We present a cross-entropy difference (CED) method for selecting in-context demonstrations. Our method is based on the observation that the effectiveness…

    Submitted 27 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  5. arXiv:2305.13086  [pdf, other]

    cs.CL

    LMGQS: A Large-scale Dataset for Query-focused Summarization

    Authors: Ruochen Xu, Song Wang, Yang Liu, Shuohang Wang, Yichong Xu, Dan Iter, Chenguang Zhu, Michael Zeng

    Abstract: Query-focused summarization (QFS) aims to extract or generate a summary of an input document that directly answers or is relevant to a given query. The lack of large-scale datasets in the form of documents, queries, and summaries has hindered model development in this area. In contrast, multiple large-scale high-quality datasets for generic summarization exist. We hypothesize that there is a hidde…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: work in progress

  6. arXiv:2305.13083  [pdf, other]

    cs.CL

    InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT

    Authors: Yichong Xu, Ruochen Xu, Dan Iter, Yang Liu, Shuohang Wang, Chenguang Zhu, Michael Zeng

    Abstract: While large models such as GPT-3 demonstrate exceptional performance in zeroshot and fewshot summarization tasks, their extensive serving and fine-tuning costs hinder their utilization in various applications. Conversely, previous studies have found that although automatic metrics tend to favor smaller fine-tuned models, the quality of the summaries they generate is inferior to that of larger mode…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: work in progress

  7. arXiv:2305.03495  [pdf, other]

    cs.CL cs.AI cs.LG

    Automatic Prompt Optimization with "Gradient Descent" and Beam Search

    Authors: Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, Michael Zeng

    Abstract: Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-error effort. We propose a simple and nonparametric solution to this problem, Automatic Prompt Optimization (APO), which is inspired by numerical gradient descent to automatically improve prompts, assuming acc…

    Submitted 19 October, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  8. arXiv:2303.16634  [pdf, other]

    cs.CL cs.AI

    G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

    Authors: Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu

    Abstract: The quality of texts generated by natural language generation (NLG) systems is hard to measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE, have been shown to have relatively low correlation with human judgments, especially for tasks that require creativity and diversity. Recent studies suggest using large language models (LLMs) as reference-free metrics for NLG eva…

    Submitted 23 May, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

  9. arXiv:2302.11521  [pdf, other]

    cs.CL

    How Does In-Context Learning Help Prompt Tuning?

    Authors: Simeng Sun, Yang Liu, Dan Iter, Chenguang Zhu, Mohit Iyyer

    Abstract: Fine-tuning large language models is becoming ever more impractical due to their rapidly-growing scale. This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an otherwise frozen model, and in-context learning (ICL), in which demonstrations of the task are provided to the model in natural language without any…

    Submitted 22 February, 2023; originally announced February 2023.

  10. arXiv:2209.10063  [pdf, other]

    cs.CL cs.AI

    Generate rather than Retrieve: Large Language Models are Strong Context Generators

    Authors: Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng Jiang

    Abstract: Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world or domain knowledge. A common approach for knowledge-intensive tasks is to employ a retrieve-then-read pipeline that first retrieves a handful of relevant contextual documents from an external corpus such as Wikipedia and then predicts an answer conditioned on the retrieved documents.…

    Submitted 25 January, 2023; v1 submitted 20 September, 2022; originally announced September 2022.

    Comments: Accepted at ICLR 2023 (v3, add code and implementation details)

  11. Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference

    Authors: William Held, Dan Iter, Dan Jurafsky

    Abstract: Performing event and entity coreference resolution across documents vastly increases the number of candidate mentions, making it intractable to do the full $n^2$ pairwise comparisons. Existing approaches simplify by considering coreference only within document clusters, but this fails to handle inter-cluster coreference, common in many applications. As a result cross-document coreference algorithm…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: 9 pages, 8 figures, To be published in the 2021 Main Conference on Empirical Methods in Natural Language Processing

  12. arXiv:2109.10274  [pdf, other]

    cs.CL

    The Trade-offs of Domain Adaptation for Neural Language Models

    Authors: David Grangier, Dan Iter

    Abstract: This work connects language model adaptation with concepts of machine learning theory. We consider a training setup with a large out-of-domain set and a small in-domain set. We derive how the benefit of training a model on either set depends on the size of the sets and the distance between their underlying distributions. We analyze how out-of-domain pre-training before in-domain fine-tuning achiev…

    Submitted 21 March, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

    Comments: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2022

  13. arXiv:2109.07591  [pdf, other]

    cs.CL cs.LG

    On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation

    Authors: Dan Iter, David Grangier

    Abstract: Domain adaptation of neural networks commonly relies on three training phases: pretraining, selected data training and then fine tuning. Data selection improves target domain generalization by training further on pretraining data identified by relying on a small sample of target domain data. This work examines the benefit of data selection for language modeling and machine translation. Our experim…

    Submitted 15 September, 2021; originally announced September 2021.

  14. arXiv:2005.10389  [pdf, other]

    cs.CL

    Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models

    Authors: Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky

    Abstract: Recent models for unsupervised representation learning of text have employed a number of techniques to improve contextual word representations but have put little focus on discourse-level representations. We propose CONPONO, an inter-sentence objective for pretraining language models that models discourse coherence and the distance between sentences. Given an anchor sentence, our model is trained…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  15. arXiv:1610.08123  [pdf, other]

    cs.LG stat.ML

    Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

    Authors: Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré

    Abstract: A challenge in training discriminative models like neural networks is obtaining enough labeled training data. Recent approaches use generative models to combine weak supervision sources, like user-defined heuristics or knowledge bases, to label training data. Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy param…

    Submitted 28 September, 2017; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: 4 figures; 18 pages

  16. arXiv:1606.04487  [pdf, other]

    cs.DC cs.LG

    Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

    Authors: Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré

    Abstract: We study the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs. We first focus on the single-node setting and show that by using standard batching and data-parallel techniques, throughput can be improved by at least 5.5x over sta…

    Submitted 19 October, 2016; v1 submitted 14 June, 2016; originally announced June 2016.

    ACM Class: I.2.6