
Banishing LLM Hallucinations Requires Rethinking Generalization

Authors: Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, Yuxin Ye, Nithyashree Manohar, Zhuxiaona Wei, Tian Wu, Ben Echols, Sharon Zhou, Gregory Diamos

Abstract: Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold, as it usually is in practice when training on internet-scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first-generation model for removing hallucinations, Lamini-1, that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.
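The abstract ties hallucination to a training loss that stays above a threshold. The arithmetic below is a back-of-envelope illustration of why nonzero loss hurts exact recall; it is not the paper's theoretical construction. It assumes the answer is sampled from the model's own distribution, so the probability of reproducing an n-token fact is the product of the per-token probabilities, which equals exp(-n * L) when the average cross-entropy on those tokens is L nats. The function name `prob_exact_recall` is hypothetical.

```python
import math

def prob_exact_recall(avg_loss_nats: float, n_tokens: int) -> float:
    """Probability of sampling all n tokens of a fact exactly.

    Assumes tokens are sampled from the model's distribution and that the
    average cross-entropy on the true tokens is avg_loss_nats.
    """
    # Per token, L = -ln p(correct). Summed over n tokens, the joint
    # log-probability is -n * L, so the joint probability is exp(-n * L).
    return math.exp(-avg_loss_nats * n_tokens)

# A model that has not memorized a uniformly random decimal digit can do
# no better than chance, i.e. a loss of ln(10) nats per digit.
chance_loss = math.log(10)
print(prob_exact_recall(chance_loss, 1))  # about 0.1
print(prob_exact_recall(0.01, 8))         # about 0.923
```

Under these assumptions, a loss stuck near ln 10 per random digit means a single digit is recalled only about 10% of the time, while pushing the loss toward zero, as memorization does, pushes exact recall toward 1.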

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2005.01858 [pdf]

doi 10.1021/acs.macromol.0c01057

Effect of Confinement on Solvent-driven Infiltration of Polymer (SIP) into Nanoparticle Packings

Authors: Neha Manohar, Kathleen J. Stebe, Daeyeon Lee

Abstract: Nanocomposite films containing a high volume fraction (> 50 vol%) of nanoparticles (NPs) in a polymer matrix are promising for their functionality and use as structural coatings, and also provide a unique platform to understand polymer behavior under strong confinement. Previously, we developed a novel technique to fabricate such nanocomposites at room temperature using solvent-driven infiltration of polymer (SIP) into NP packings. In the SIP process, a bilayer made of an underlying polymer film and a dense packing of NPs is exposed to solvent vapor, which induces condensation of the solvent into the voids of the packing. The condensed solvent plasticizes the underlying polymer film, inducing polymer infiltration into the solvent-filled voids in the NP packing. In this work, we study the effect of confinement on the kinetics of SIP and the final partitioning of polymer into the interstices of the NP packing. We find that, while the dynamics of infiltration during SIP are strongly dependent on confinement, the final extent of infiltration is independent of confinement. The time for infiltration obeys a power law with confinement, as defined by the ratio of the chain size and the pore size. Qualitatively, the observed time scale is attributed to changes in concentration regimes as infiltration proceeds, which lead to shifting characteristic length scales in the system over time. When the concentration in the pore exceeds the critical overlap concentration, the characteristic length scale of the polymer is no longer that of the entire chain, but rather the correlation length, which is smaller than the pore size. Therefore, at long times, the extent of infiltration is independent of the confinement ratio. Furthermore, favorable surface interactions between the polymer and the nanoparticles enhance partitioning into the NP packing.
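The abstract's argument rests on a switch of length scale once the pore concentration passes the critical overlap concentration c*. The sketch below uses standard polymer-physics scaling estimates rather than the authors' own analysis: c* ≈ 3M / (4π N_A Rg^3), and above c* the correlation length ξ ≈ Rg (c/c*)^(-ν/(3ν-1)), which for a Flory exponent ν = 3/5 (a good-solvent assumption made here) gives ξ ∝ c^(-3/4). The molar mass, Rg, concentrations, and pore size in the usage lines are hypothetical numbers chosen for illustration, not values from the paper.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol

def overlap_concentration(molar_mass_g_per_mol: float, rg_nm: float) -> float:
    """Scaling estimate c* ~ 3M / (4 pi N_A Rg^3), returned in g/cm^3."""
    rg_cm = rg_nm * 1e-7  # 1 nm = 1e-7 cm
    return 3.0 * molar_mass_g_per_mol / (4.0 * math.pi * AVOGADRO * rg_cm**3)

def characteristic_length_nm(c_g_per_cm3: float, molar_mass_g_per_mol: float,
                             rg_nm: float, nu: float = 0.6) -> float:
    """Chain size (Rg) below c*; correlation length xi above c*.

    Above c*, xi ~ Rg * (c / c*) ** (-nu / (3 nu - 1)); nu = 0.6 assumes a
    good solvent and gives the classic exponent -3/4.
    """
    c_star = overlap_concentration(molar_mass_g_per_mol, rg_nm)
    if c_g_per_cm3 <= c_star:
        return rg_nm
    return rg_nm * (c_g_per_cm3 / c_star) ** (-nu / (3.0 * nu - 1.0))

# Hypothetical inputs: a 100 kg/mol chain with Rg = 10 nm and a 5 nm pore.
pore_nm = 5.0
for c in (0.01, 0.3):
    xi = characteristic_length_nm(c, 1e5, 10.0)
    print(f"c = {c} g/cm^3: length {xi:.1f} nm, fits pore: {xi < pore_nm}")
```

With these made-up inputs c* comes out near 0.04 g/cm^3: below it the relevant length is the 10 nm chain, larger than the pore, while at 0.3 g/cm^3 the estimated ξ drops to roughly 2 nm, below the pore size, which mirrors the qualitative explanation in the abstract.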

Submitted 4 May, 2020; originally announced May 2020.

Comments: 8 pages, 5 figures. Supporting Information: 4 pages, 3 figures

Journal ref: Macromolecules 53(15) 2020 6740-6746

arXiv:1708.08424 [pdf, other]

T/Key: Second-Factor Authentication From Secure Hash Chains

Authors: Dmitry Kogan, Nathan Manohar, Dan Boneh

Abstract: Time-based one-time password (TOTP) systems in use today require storing secrets on both the client and the server. As a result, an attack on the server can expose all second factors for all users in the system. We present T/Key, a time-based one-time password system that requires no secrets on the server. Our work modernizes the classic S/Key system and addresses the challenges in making such a system secure and practical. At the heart of our construction is a new lower bound analyzing the hardness of inverting hash chains composed of independent random functions, which formalizes the security of this widely used primitive. Additionally, we develop a near-optimal algorithm for quickly generating the required elements in a hash chain with little memory on the client. We report on our implementation of T/Key as an Android application. T/Key can be used as a replacement for current TOTP systems, and it remains secure in the event of a server-side compromise. The cost, as with S/Key, is that one-time passwords are longer than the standard six characters used in TOTP.
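The S/Key-style idea that T/Key builds on can be illustrated with a short hash-chain sketch: the server stores only the public end of the chain, so a server breach yields nothing that lets an attacker compute the next password. Everything below is a hypothetical simplification, not T/Key's actual parameters: SHA-256 truncated to 16 bytes, independent per-step functions approximated by prefixing the step index, and one chain element per time slot. The sketch keeps the whole chain in memory instead of using the paper's low-memory generation algorithm.

```python
import hashlib

OUTPUT_BYTES = 16  # hypothetical truncation length

def h(i: int, x: bytes) -> bytes:
    # Prefix the step index so each link uses a different function,
    # approximating a chain of independent hash functions.
    return hashlib.sha256(i.to_bytes(4, "big") + x).digest()[:OUTPUT_BYTES]

def chain(seed: bytes, n: int) -> list[bytes]:
    """xs[0] = seed, xs[i] = h(i, xs[i-1]); xs[n] is the public tip."""
    xs = [seed]
    for i in range(1, n + 1):
        xs.append(h(i, xs[-1]))
    return xs

def password_for_slot(xs: list[bytes], n: int, t: int) -> bytes:
    # Slot t (1..n) reveals xs[n - t]: later slots go deeper into the chain.
    return xs[n - t]

def verify(stored: bytes, stored_slot: int, candidate: bytes, t: int, n: int) -> bool:
    """Server holds stored = xs[n - stored_slot] and no secrets.

    Accept candidate for slot t only if t is newer and hashing forward
    through indices n-t+1 .. n-stored_slot reproduces the stored value.
    """
    if t <= stored_slot:
        return False  # reject replays of old or current slots
    x = candidate
    for i in range(n - t + 1, n - stored_slot + 1):
        x = h(i, x)
    return x == stored

n = 8
xs = chain(b"hypothetical-client-seed", n)
tip = xs[n]  # all the server stores at setup
print(verify(tip, 0, password_for_slot(xs, n, 3), 3, n))  # True
```

After a successful login the server would replace its stored pair with the accepted (candidate, t), so the replay check in `verify` rejects any earlier password.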

Submitted 28 August, 2017; originally announced August 2017.

Comments: Accepted to ACM CCS 2017

arXiv:1609.01829 [pdf]

doi 10.1016/j.procs.2015.03.156

Animal Classification System: A Block Based Approach

Authors: Y H Sharath Kumar, Manohar N, H K Chethan

Abstract: In this work, we propose a method for classifying animals in images. First, a graph-cut-based method is used to segment the image and eliminate the background. The segmented animal images are partitioned into a number of blocks, and color texture moments are extracted from the different blocks. Probabilistic neural network and K-nearest neighbors classifiers are considered for classification. To corroborate the efficacy of the proposed method, an experiment was conducted on our own dataset of 25 animal classes consisting of 4000 sample images. Images were picked randomly from the database to study classification accuracy, and the results show that the K-nearest neighbors classifier achieves good performance.
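The block-based pipeline described above (partition the segmented image into blocks, extract per-block color moments, classify with K-nearest neighbors) can be sketched in plain Python. This is an illustrative stand-in, not the authors' implementation: the abstract does not give the exact color texture moments, block grid, or distance metric, so the sketch assumes the common first three color moments (mean, standard deviation, cube root of the third central moment) per RGB channel, a hypothetical 2 x 2 grid, and Euclidean distance. Segmentation is assumed to have been done already.

```python
import math
from collections import Counter

def color_moments(pixels):
    """Mean, std, and signed cube root of the third central moment per channel."""
    feats = []
    n = len(pixels)
    for ch in range(3):
        vals = [p[ch] for p in pixels]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        third = sum((v - mean) ** 3 for v in vals) / n
        feats += [mean, math.sqrt(var), math.copysign(abs(third) ** (1 / 3), third)]
    return feats

def block_features(image, grid=2):
    """Split an H x W image (rows of (r, g, b) tuples) into grid x grid blocks
    and concatenate each block's color moments. Assumes H, W >= grid."""
    h, w = len(image), len(image[0])
    feats = []
    for bi in range(grid):
        for bj in range(grid):
            rows = range(bi * h // grid, (bi + 1) * h // grid)
            cols = range(bj * w // grid, (bj + 1) * w // grid)
            feats += color_moments([image[r][c] for r in rows for c in cols])
    return feats

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); majority vote of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Each block contributes 9 numbers (3 moments x 3 channels), so the 2 x 2 grid yields a 36-dimensional feature vector per image; a finer grid captures more spatial layout at the cost of longer vectors.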

Submitted 6 September, 2016; originally announced September 2016.

Comments: 8 pages, 2 figures, 3 tables

ACM Class: I.4.6; I.4.8

Journal ref: Procedia Computer Science, Volume 45, 2015, Pages 336-343

Showing 1–4 of 4 results for author: Manohar, N