
LaMDA: Language Models for Dialog Applications

Authors: Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, et al. (35 additional authors not shown)

Abstract: We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvements on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.

Submitted 10 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2001.09977

Towards a Human-like Open-Domain Chatbot

Authors: Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

Abstract: We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.

Submitted 27 February, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

Comments: 38 pages, 12 figures

arXiv:1903.08132

doi 10.1145/3299869.3314048

ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)

Authors: Vimalkumar Jeyakumar, Omid Madani, Ali Parandeh, Ashutosh Kulshreshtha, Weifei Zeng, Navindra Yadav

Abstract: We present ExplainIt!, a declarative, unsupervised root-cause analysis engine that uses time series monitoring data from large complex systems such as data centres. ExplainIt! empowers operators to succinctly specify a large number of causal hypotheses to search for causes of interesting events. ExplainIt! then ranks these hypotheses, reducing the number of causal dependencies from hundreds of thousands to a handful for human understanding. We show how a declarative language, such as SQL, can be effective in declaratively enumerating hypotheses that probe the structure of an unknown probabilistic graphical causal model of the underlying system. Our thesis is that databases are in a unique position to enable users to rapidly explore the possible causal mechanisms in data collected from diverse sources. We empirically demonstrate how ExplainIt! had helped us resolve over 30 performance issues in a commercial product since late 2014, of which we discuss a few cases in detail.

Submitted 22 March, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

Comments: SIGMOD Industry Track 2019

arXiv:1811.00442

doi 10.1103/PhysRevB.99.104201

Approximating observables on eigenstates of large many-body localized systems

Authors: Abishek K. Kulshreshtha, Arijeet Pal, Thorsten B. Wahl, Steven H. Simon

Abstract: Eigenstates of fully many-body localized (FMBL) systems can be organized into spin algebras based on quasilocal operators called l-bits. These spin algebras define quasilocal l-bit measurement ($\tau^z_i$) and l-bit flip ($\tau^x_i$) operators. For a disordered Heisenberg spin chain in the MBL regime we approximate l-bit flip operators by finding them exactly on small windows of systems and extending them onto the whole system by exploiting their quasilocal nature. We subsequently use these operators to represent approximate eigenstates. We then describe a method to calculate products of local observables on these eigenstates for systems of size $L$ in $O(L^2)$ time. This algorithm is used to compute the error of the approximate eigenstates.

Submitted 10 February, 2020; v1 submitted 1 November, 2018; originally announced November 2018.

Comments: 10 pages, 7 figures, added references

Journal ref: Phys. Rev. B 99, 104201 (2019)

arXiv:1707.05362

doi 10.1103/PhysRevB.98.184201

Behavior of l-bits near the many-body localization transition

Authors: Abishek K. Kulshreshtha, Arijeet Pal, Thorsten B. Wahl, Steven H. Simon

Abstract: Eigenstates of fully many-body localized (FMBL) systems are described by quasilocal operators $\tau_i^z$ (l-bits), which are conserved exactly under Hamiltonian time evolution. The algebra of the operators $\tau_i^z$ and $\tau_i^x$ associated with l-bits ($\boldsymbol{\tau}_i$) completely defines the eigenstates and the matrix elements of local operators between eigenstates at all energies. We develop a non-perturbative construction of the full set of l-bit algebras in the many-body localized phase for the canonical model of MBL. Our algorithm to construct the Pauli-algebra of l-bits combines exact diagonalization and a tensor network algorithm developed for efficient diagonalization of large FMBL Hamiltonians. The distribution of localization lengths of the l-bits is evaluated in the MBL phase and used to characterize the MBL-to-thermal transition.

Submitted 10 February, 2020; v1 submitted 17 July, 2017; originally announced July 2017.

Comments: 5+3 pages, 6 Figures, added results on finite size scaling and thermal-quantum critical crossover, additional references

Journal ref: Phys. Rev. B 98, 184201 (2018)

Showing 1–5 of 5 results for author: Kulshreshtha, A