
Showing 1–10 of 10 results for author: Rajendran, G

  1. arXiv:2406.18400  [pdf, other]

    cs.CL cs.LG stat.ML

    Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

    Authors: Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam

    Abstract: Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can be easily manipulated by changing contexts, even without altering their factual meanings. These findings highlight that LLMs might behave like an associative memory model where certain tokens in the contexts serve as clues to…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2405.15084  [pdf, other]

    cs.DS cs.LG stat.ML

    Efficient Certificates of Anti-Concentration Beyond Gaussians

    Authors: Ainesh Bakshi, Pravesh Kothari, Goutham Rajendran, Madhur Tulsiani, Aravindan Vijayaraghavan

    Abstract: A set of high dimensional points $X=\{x_1, x_2,\ldots, x_n\} \subset R^d$ in isotropic position is said to be $\delta$-anti concentrated if for every direction $v$, the fraction of points in $X$ satisfying $|\langle x_i,v \rangle |\leq \delta$ is at most $O(\delta)$. Motivated by applications to list-decodable learning and clustering, recent works have considered the problem of constructing efficient certificate…

    Submitted 23 May, 2024; originally announced May 2024.

  3. arXiv:2403.03867  [pdf, other]

    cs.CL cs.LG stat.ML

    On the Origins of Linear Representations in Large Language Models

    Authors: Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam, Victor Veitch

    Abstract: Recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a simple latent variable model to abstract and formalize the concept dynamics of the next token prediction. We use this formalism to show that the next token prediction ob…

    Submitted 6 March, 2024; originally announced March 2024.

  4. arXiv:2402.09236  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

    Authors: Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar

    Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 36 pages

  5. arXiv:2311.18048  [pdf, other]

    cs.LG cs.CE eess.SY stat.ME

    An Interventional Perspective on Identifiability in Gaussian LTI Systems with Independent Component Analysis

    Authors: Goutham Rajendran, Patrik Reizinger, Wieland Brendel, Pradeep Ravikumar

    Abstract: We investigate the relationship between system identification and intervention design in dynamical systems. While previous research demonstrated how identifiable representation learning methods, such as Independent Component Analysis (ICA), can reveal cause-effect relationships, it relied on a passive perspective without considering how to collect data. Our work shows that in Gaussian Linear Time-…

    Submitted 16 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CLeaR2024 camera ready. Code available at https://github.com/rpatrik96/lti-ica

  6. arXiv:2306.02235  [pdf, other]

    cs.LG cs.AI math.ST stat.ME stat.ML

    Learning Linear Causal Representations from Interventions under General Nonlinear Mixing

    Authors: Simon Buchholz, Goutham Rajendran, Elan Rosenfeld, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar

    Abstract: We study the problem of learning causal representations from unknown, latent interventions in a general setting, where the latent distribution is Gaussian but the mixing function is completely general. We prove strong identifiability results given unknown single-node interventions, i.e., without having access to the intervention targets. This generalizes prior works which have focused on weaker cl…

    Submitted 18 December, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted as Oral paper at NeurIPS 2023

  7. arXiv:2302.04462  [pdf, other]

    cs.CC cs.DS cs.LG stat.ML

    Nonlinear Random Matrices and Applications to the Sum of Squares Hierarchy

    Authors: Goutham Rajendran

    Abstract: We develop new tools in the theory of nonlinear random matrices and apply them to study the performance of the Sum of Squares (SoS) hierarchy on average-case problems. The SoS hierarchy is a powerful optimization technique that has achieved tremendous success for various problems in combinatorial optimization, robust statistics and machine learning. It's a family of convex relaxations that lets…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Dissertation, University of Chicago

  8. arXiv:2206.10044  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Identifiability of deep generative models without auxiliary information

    Authors: Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam

    Abstract: We prove identifiability of a broad class of deep latent variable models that (a) have universal approximation capabilities and (b) are the decoders of variational autoencoders that are commonly used in practice. Unlike existing work, our analysis does not require weak supervision, auxiliary information, or conditioning in the latent space. Specifically, we show that for a broad class of generativ…

    Submitted 18 October, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: 34 pages, 9 figures, to appear in NeurIPS 2022

  9. arXiv:2110.04719  [pdf, other]

    cs.LG cs.AI stat.ML

    Structure learning in polynomial time: Greedy algorithms, Bregman information, and exponential families

    Authors: Goutham Rajendran, Bohdan Kivva, Ming Gao, Bryon Aragam

    Abstract: Greedy algorithms have long been a workhorse for learning graphical models, and more broadly for learning statistical models with sparse structure. In the context of learning directed acyclic graphs, greedy algorithms are popular despite their worst-case exponential runtime. In practice, however, they are very efficient. We provide new insight into this phenomenon by studying a general greedy scor…

    Submitted 28 October, 2021; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021; 27 pages, 9 figures

  10. arXiv:2106.15563  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning latent causal graphs via mixture oracles

    Authors: Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam

    Abstract: We study the problem of reconstructing a causal graphical model from data in the presence of latent variables. The main problem of interest is recovering the causal structure over the latent variables while allowing for general, potentially nonlinear dependence between the variables. In many practical problems, the dependence between raw observations (e.g. pixels in an image) is much less relevant…

    Submitted 21 November, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: To appear at NeurIPS 2021. 41 pages