
Showing 1–18 of 18 results for author: Kuncoro, A

Searching in archive cs.
  1. arXiv:2403.10616

    cs.LG cs.CL

    DiPaCo: Distributed Path Composition

    Authors: Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Adhiguna Kuncoro, Yani Donchev, Rachita Chhaparia, Ionel Gog, Marc'Aurelio Ranzato, Jiajun Shen, Arthur Szlam

    Abstract: Progress in machine learning (ML) has been fueled by scaling neural network models. This scaling has been enabled by ever more heroic feats of engineering, necessary for accommodating ML approaches that require high bandwidth communication between devices working in parallel. In this work, we propose a co-designed modular architecture and training approach for ML models, dubbed DIstributed PAth CO…

    Submitted 15 March, 2024; originally announced March 2024.

  2. arXiv:2311.08105

    cs.LG cs.CL

    DiLoCo: Distributed Low-Communication Training of Language Models

    Authors: Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen

    Abstract: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected accelerators, with devices exchanging gradients and other intermediate states at each optimization step. While it is difficult to build and maintain a single computing cluster hosting many accelerators,…

    Submitted 2 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

  3. arXiv:2306.02870

    cs.CL

    On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research

    Authors: Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Alham Fikri Aji, Genta Indra Winata, Radityo Eko Prasojo, Phil Blunsom, Adhiguna Kuncoro

    Abstract: This evidence-based position paper critiques current research practices within the language model pre-training literature. Despite rapid recent progress afforded by increasingly better pre-trained language models (PLMs), current PLM research practices often conflate different possible sources of model improvement, without conducting proper ablation studies and principled comparisons between differ…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted at ACL 2023

  4. arXiv:2212.09686

    cs.CL

    A Natural Bias for Language Generation Models

    Authors: Clara Meister, Wojciech Stokowiec, Tiago Pimentel, Lei Yu, Laura Rimell, Adhiguna Kuncoro

    Abstract: After just a few hundred training updates, a standard probabilistic model for language generation has likely not yet learnt many semantic or syntactic rules of natural language, making it difficult to estimate the probability distribution over next tokens. Yet around this point, these models have identified a simple, loss-minimising behaviour: to output the unigram distribution of the target train…

    Submitted 23 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Main conference paper at ACL 2023
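
The abstract's central observation can be checked on a toy corpus: a predictor that ignores context does best, in cross-entropy, by outputting the corpus unigram distribution, and a softmax whose bias is set to log unigram probabilities (with zero contribution from the rest of the network) outputs exactly that distribution. The sketch below assumes a hypothetical whitespace-tokenized corpus; the helper names are illustrative, and the bias identity is shown as a property of the softmax rather than as the paper's reported procedure.

```python
import math
from collections import Counter

def unigram_distribution(tokens):
    """Relative frequency of each token type in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def cross_entropy(tokens, dist):
    """Average negative log-likelihood (nats) of tokens under a context-free distribution."""
    return -sum(math.log(dist[t]) for t in tokens) / len(tokens)

def softmax(logits):
    """Numerically stable softmax over a dict of logits."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

# Hypothetical toy "training corpus", for illustration only.
corpus = "the cat sat on the mat and the dog sat too".split()
vocab = sorted(set(corpus))

unigram = unigram_distribution(corpus)
uniform = {tok: 1 / len(vocab) for tok in vocab}

ce_unigram = cross_entropy(corpus, unigram)
ce_uniform = cross_entropy(corpus, uniform)

# Output layer with zero context logits and bias = log p_unigram:
# softmax(log p) = p, so it predicts the unigram distribution exactly.
bias = {tok: math.log(p) for tok, p in unigram.items()}
from_bias = softmax(bias)

print(f"unigram CE = {ce_unigram:.3f} nats, uniform CE = {ce_uniform:.3f} nats")
```

By Gibbs' inequality the unigram cross-entropy can never exceed that of any other context-free distribution, which is why it is the "loss-minimising" behaviour available before the model learns to use context.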

  5. Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

    Authors: Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer

    Abstract: We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentenc…

    Submitted 6 December, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: 17 pages, 5 figures, 2 tables and 1 algorithm. To appear in TACL, to be presented at EMNLP 2022

  6. arXiv:2112.11446

    cs.CL cs.AI

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, et al. (55 additional authors not shown)

    Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop…

    Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 120 pages

  7. arXiv:2111.00607

    cs.CL

    A Systematic Investigation of Commonsense Knowledge in Large Language Models

    Authors: Xiang Lorraine Li, Adhiguna Kuncoro, Jordan Hoffmann, Cyprien de Masson d'Autume, Phil Blunsom, Aida Nematzadeh

    Abstract: Language models (LMs) trained on large amounts of data have shown impressive performance on many NLP tasks under the zero-shot and few-shot setup. Here we aim to better understand the extent to which such models learn commonsense knowledge -- a critical component of many NLP applications. We conduct a systematic and rigorous zero-shot and few-shot commonsense evaluation of large pre-trained LMs, w…

    Submitted 31 October, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

    Comments: Accepted to EMNLP 2022

  8. arXiv:2104.08200

    cs.CL

    IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

    Authors: Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie, Karissa Vincentio, Xiaohong Li, Adhiguna Kuncoro, Sebastian Ruder, Zhi Yuan Lim, Syafri Bahar, Masayu Leylia Khodra, Ayu Purwarianti, Pascale Fung

    Abstract: Natural language generation (NLG) benchmarks provide an important avenue to measure progress and develop better NLG systems. Unfortunately, the lack of publicly available NLG benchmarks for low-resource languages poses a challenging barrier for building NLG systems that work well for languages with limited amounts of data. Here we introduce IndoNLG, the first benchmark to measure natural language…

    Submitted 9 October, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: Accepted in EMNLP 2021, 10 pages

  9. arXiv:2102.01951

    cs.CL cs.AI

    Mind the Gap: Assessing Temporal Generalization in Neural Language Models

    Authors: Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom

    Abstract: Our world is open-ended, non-stationary, and constantly evolving; thus what we talk about and how we talk about it change over time. This inherent dynamic nature of language contrasts with the current static language modelling paradigm, which trains and evaluates models on utterances from overlapping time periods. Despite impressive recent progress, we demonstrate that Transformer-XL language mode…

    Submitted 26 October, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: To appear as a Spotlight at NeurIPS 2021

  10. arXiv:2005.13482

    cs.CL

    Syntactic Structure Distillation Pretraining For Bidirectional Encoders

    Authors: Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

    Abstract: Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence. Given this success, it remains an open question whether scalable learners like BERT can become fully proficient in the syntax of natural language by virtue of data scale alone, or whether they s…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 17 pages, 6 tables, 2 figures. AK and LK contributed equally

  11. arXiv:1906.06438

    cs.CL cs.LG

    Scalable Syntax-Aware Language Models Using Knowledge Distillation

    Authors: Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, Phil Blunsom

    Abstract: Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of tra…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: ACL 2019
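
The title names knowledge distillation as the route to scalable syntax-aware models. A common word-level form of distillation trains the student's next-word distribution against a mix of the gold word and a teacher's next-word distribution. The sketch below assumes that interpolated objective with a hypothetical mixing weight `alpha`; it illustrates the general technique, and the paper's exact loss, teacher, and weighting are not given in the truncated abstract.

```python
import math

def distillation_loss(student_probs, gold_index, teacher_probs, alpha):
    """Word-level distillation loss for a single next-word prediction.

    Mixes cross-entropy against the gold word with cross-entropy against the
    teacher's next-word distribution:
        (1 - alpha) * CE(gold) + alpha * CE(teacher)
    which equals cross-entropy against the mixed target distribution.
    `alpha` is a hypothetical mixing weight; all student_probs must be > 0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    gold_term = -math.log(student_probs[gold_index])
    teacher_term = -sum(
        q * math.log(p) for q, p in zip(teacher_probs, student_probs) if q > 0
    )
    return (1.0 - alpha) * gold_term + alpha * teacher_term

# Hypothetical 3-word vocabulary: student distribution, gold index 0, teacher distribution.
loss = distillation_loss([0.7, 0.2, 0.1], 0, [0.5, 0.3, 0.2], alpha=0.5)
print(f"distillation loss = {loss:.4f} nats")
```

With `alpha = 0` this reduces to ordinary language-model training; larger values push the student towards whatever structure-sensitive preferences the teacher encodes in its full distribution.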

  12. arXiv:1904.03746

    cs.CL stat.ML

    Unsupervised Recurrent Neural Network Grammars

    Authors: Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis

    Abstract: Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNG…

    Submitted 4 August, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

    Comments: NAACL 2019
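
The "top-down, left-to-right" generation order described above can be made concrete as an action sequence. The sketch assumes the generative RNNG action set of Dyer et al. (2016): `NT(X)` opens a nonterminal, `GEN(w)` generates a word, and `REDUCE` closes the most recent open constituent. Only the deterministic bookkeeping is shown; in the actual model each action is scored by neural networks over the stack, which is omitted here, and the tuple tree encoding is a hypothetical choice.

```python
def tree_to_actions(tree):
    """Generative RNNG action sequence for a tree.

    A tree is either a word (str) or a tuple (label, child1, child2, ...).
    Traversal is top-down and left-to-right.
    """
    if isinstance(tree, str):
        return [f"GEN({tree})"]
    label, *children = tree
    actions = [f"NT({label})"]
    for child in children:
        actions.extend(tree_to_actions(child))
    actions.append("REDUCE")
    return actions

def actions_to_tree(actions):
    """Rebuild the tree by replaying the actions on a stack."""
    stack = []     # completed subtrees and generated words, in order
    open_nts = []  # (stack position where the constituent starts, label)
    for a in actions:
        if a.startswith("NT("):
            open_nts.append((len(stack), a[3:-1]))
        elif a.startswith("GEN("):
            stack.append(a[4:-1])
        elif a == "REDUCE":
            # Pop everything since the most recent open nonterminal into one constituent.
            start, label = open_nts.pop()
            children = stack[start:]
            del stack[start:]
            stack.append((label, *children))
        else:
            raise ValueError(f"unknown action: {a}")
    if open_nts or len(stack) != 1:
        raise ValueError("actions do not describe a single complete tree")
    return stack[0]

tree = ("S", ("NP", "the", "hungry", "cat"), ("VP", "meows"), ".")
actions = tree_to_actions(tree)
print(" ".join(actions))
```

Because the mapping is invertible, a model over action sequences is also a joint model over (tree, sentence) pairs, which is what lets RNNGs serve both as parsers and as language models.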

  13. arXiv:1806.04127

    cs.CL

    Finding Syntax in Human Encephalography with Beam Search

    Authors: John Hale, Chris Dyer, Adhiguna Kuncoro, Jonathan R. Brennan

    Abstract: Recurrent neural network grammars (RNNGs) are generative models of (tree,string) pairs that rely on neural networks to evaluate derivational choices. Parsing with them using beam search yields a variety of incremental complexity metrics such as word surprisal and parser action count. When used as regressors against human electrophysiological responses to naturalistic text, they derive two amplitud…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: ACL 2018
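
Word surprisal, one of the beam-search complexity metrics mentioned above, is defined as -log2 P(w_t | w_1..w_{t-1}). A minimal sketch, assuming the prefix probability P(w_1..w_t) is approximated by summing the joint probabilities of the partial analyses kept on a word-synchronous beam; the beam contents below are hypothetical numbers, not model output.

```python
import math

def prefix_probability(beam):
    """Approximate P(w_1..w_t) by summing P(partial tree, w_1..w_t) over the beam."""
    return sum(beam)

def surprisals(beams):
    """Word surprisal in bits, one value per word.

    beams[t] holds the joint probabilities of the analyses on the beam
    after word t+1 has been generated. The empty prefix has probability 1.
    """
    result = []
    prev = 1.0
    for beam in beams:
        cur = prefix_probability(beam)
        # -log2 P(w_t | prefix) = log2 P(prefix) - log2 P(prefix + w_t)
        result.append(math.log2(prev) - math.log2(cur))
        prev = cur
    return result

# Hypothetical beams after words 1 and 2.
beams = [[0.25, 0.25], [0.125, 0.0625, 0.0625]]
print(surprisals(beams))
```

Because the beam discards low-probability analyses, the summed probability is a lower bound on the true prefix probability, so wider beams give better surprisal estimates at higher cost.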

  14. arXiv:1701.03980

    stat.ML cs.CL cs.MS

    DyNet: The Dynamic Neural Network Toolkit

    Authors: Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

    Abstract: We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its deriva…

    Submitted 14 January, 2017; originally announced January 2017.

    Comments: 33 pages
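
The abstract contrasts static declaration (define a symbolic graph once, then feed examples) with dynamic declaration, where a fresh graph is built for each example as the code runs. The toy reverse-mode autodiff below, written from scratch in plain Python, is a hypothetical illustration of that idea and is not DyNet's API: `run_rnn` builds a graph whose depth follows the length of each input, so ordinary control flow decides the structure.

```python
import math

class Node:
    """A scalar in a computation graph that is recorded as operations execute."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value, ((self, other.value), (other, self.value)))

    def tanh(self):
        t = math.tanh(self.value)
        return Node(t, ((self, 1.0 - t * t),))

    def backward(self):
        """Reverse-mode accumulation over the graph reachable from this node."""
        order, seen = [], set()

        def visit(node):
            # Post-order DFS: every node is listed after all of its inputs.
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

def run_rnn(weight, inputs):
    """A tiny recurrent net; the graph is declared anew for each input sequence."""
    h = Node(0.0)
    for x in inputs:
        h = (weight * h + Node(x)).tanh()
    return h

w = Node(0.5)
out = run_rnn(w, [1.0, -2.0, 0.5])
out.backward()
print(out.value, w.grad)
```

A sequence of length 3 here yields a graph with three recurrent steps; a sequence of length 10 would yield ten, with no padding or symbolic loop construct required.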

  15. arXiv:1611.05774

    cs.CL

    What Do Recurrent Neural Network Grammars Learn About Syntax?

    Authors: Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, Noah A. Smith

    Abstract: Recurrent neural network grammars (RNNG) are a recently proposed probabilistic generative modeling family for natural language. They show state-of-the-art language modeling and parsing performance. We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enabl…

    Submitted 10 January, 2017; v1 submitted 17 November, 2016; originally announced November 2016.

    Comments: 10 pages. To appear in EACL 2017, Valencia, Spain

  16. arXiv:1609.07561

    cs.CL

    Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser

    Authors: Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Noah A. Smith

    Abstract: We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal o…

    Submitted 23 September, 2016; originally announced September 2016.

    Comments: 10 pages. To appear at EMNLP 2016
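
Under the Hamming cost (number of tokens assigned the wrong head), the expected loss of a tree decomposes over arcs, so minimum Bayes risk decoding amounts to maximising the sum of arc posteriors. The sketch below assumes those posteriors are estimated as the fraction of ensemble members that predict each arc; the head sequences are hypothetical, and the final step of searching all trees for the best-scoring one (a maximum spanning tree algorithm such as Chu-Liu/Edmonds) is not implemented here.

```python
from collections import Counter

def arc_posteriors(ensemble_heads):
    """Estimate P(head of token i = h) as the ensemble vote fraction.

    ensemble_heads: one head sequence per parser; heads[i] is the head
    index of token i+1, with 0 denoting the root.
    """
    n_parsers = len(ensemble_heads)
    n_tokens = len(ensemble_heads[0])
    posteriors = [Counter() for _ in range(n_tokens)]
    for heads in ensemble_heads:
        for i, h in enumerate(heads):
            posteriors[i][h] += 1.0 / n_parsers
    return posteriors

def expected_hamming_loss(heads, posteriors):
    """Expected number of tokens with a wrong head under the ensemble posterior."""
    # A missing key in a Counter counts as 0, i.e. no parser voted for that arc.
    return sum(1.0 - posteriors[i][h] for i, h in enumerate(heads))

# Hypothetical outputs of three parsers for a 3-token sentence.
p1 = [2, 0, 2]
p2 = [2, 0, 2]
p3 = [3, 0, 2]
post = arc_posteriors([p1, p2, p3])
print(expected_hamming_loss(p1, post), expected_hamming_loss(p3, post))
```

Tokens whose highest posterior is well below 1 are exactly where the ensemble disagrees, which is the kind of weaker-consensus signal the abstract refers to.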

  17. arXiv:1604.06529

    cs.CL cs.LG cs.NE

    Dependency Parsing with LSTMs: An Empirical Evaluation

    Authors: Adhiguna Kuncoro, Yuichiro Sawai, Kevin Duh, Yuji Matsumoto

    Abstract: We propose a transition-based dependency parser using Recurrent Neural Networks with Long Short-Term Memory (LSTM) units. This extends the feedforward neural network parser of Chen and Manning (2014) and enables modelling of entire sequences of shift/reduce transition decisions. On the Google Web Treebank, our LSTM parser is competitive with the best feedforward parser on overall accuracy and nota…

    Submitted 30 June, 2016; v1 submitted 21 April, 2016; originally announced April 2016.

    Comments: 7 pages, 4 figures
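
The "sequences of shift/reduce transition decisions" that the LSTM models can be illustrated with an unlabeled arc-standard system, assumed here because it is the system used by Chen and Manning (2014); the truncated abstract does not state the exact transition inventory. Only the deterministic state updates are shown; the LSTM that scores each transition is omitted, and the example sentence is hypothetical.

```python
def arc_standard(n_words, transitions):
    """Apply unlabeled arc-standard transitions; return the head of each word 1..n (0 = root)."""
    stack = [0]                              # starts with the artificial root
    buffer = list(range(1, n_words + 1))     # word indices still to be read
    heads = [None] * (n_words + 1)
    for t in transitions:
        if t == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT with an empty buffer")
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":
            # Second-from-top becomes a dependent of the top; root cannot be a dependent.
            if len(stack) < 2 or stack[-2] == 0:
                raise ValueError("invalid LEFT-ARC")
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif t == "RIGHT-ARC":
            # Top becomes a dependent of second-from-top.
            if len(stack) < 2:
                raise ValueError("invalid RIGHT-ARC")
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown transition: {t}")
    if buffer or stack != [0]:
        raise ValueError("transitions do not form a complete parse")
    return heads[1:]

# "She ate fish": She <- ate, ate <- ROOT, fish <- ate
seq = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(arc_standard(3, seq))
```

A feedforward parser scores each transition from a fixed window of stack and buffer features, while an LSTM can condition on the entire history of transitions taken so far.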

  18. arXiv:1602.07776

    cs.CL cs.NE

    Recurrent Neural Network Grammars

    Authors: Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, Noah A. Smith

    Abstract: We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure. We explain efficient inference procedures that allow application to both parsing and language modeling. Experiments show that they provide better parsing in English than any single previously published supervised generative model and better language modeling than state-of-the-art seque…

    Submitted 12 October, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

    Comments: Proceedings of NAACL 2016 (contains corrigendum)