
Showing 1–50 of 65 results for author: Gu, A

  1. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires…

    Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  2. arXiv:2406.07887  [pdf, other]

    cs.LG cs.CL

    An Empirical Study of Mamba-based Language Models

    Authors: Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory requirements from the key-value cache. Moreover, recent studies have shown that SSMs can match or exceed the language modeling capabilities of Transformers, making them an attractive alternative. In a contr…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2406.07475  [pdf, other]

    cs.LG stat.ML

    Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior

    Authors: Anming Gu, Edward Chien, Kristjan Greenewald

    Abstract: Trajectory inference seeks to recover the temporal dynamics of a population from snapshots of its (uncoupled) temporal marginals, i.e. where observed particles are not tracked over time. Lavenant et al. arXiv:2102.09204 addressed this challenging problem under a stochastic differential equation (SDE) model with a gradient-driven drift in the observed space, introducing a minimum entropy estimator…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 32 pages, 9 figures

  4. arXiv:2405.21060  [pdf, other]

    cs.LG

    Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

    Authors: Tri Dao, Albert Gu

    Abstract: While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention,…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  5. arXiv:2405.18733  [pdf, other]

    cs.AI

    Efficient Learning in Chinese Checkers: Comparing Parameter Sharing in Multi-Agent Reinforcement Learning

    Authors: Noah Adhikari, Allen Gu

    Abstract: We show that multi-agent reinforcement learning (MARL) with full parameter sharing outperforms independent and partially shared architectures in the competitive perfect-information homogeneous game of Chinese Checkers. To run our experiments, we develop a new MARL environment: variable-size, six-player Chinese Checkers. This custom environment was developed in PettingZoo and supports all traditiona…

    Submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2403.07974  [pdf, other]

    cs.SE cs.CL cs.LG

    LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

    Authors: Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica

    Abstract: Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry. However, as new and improved LLMs are developed, existing evaluation benchmarks (e.g., HumanEval, MBPP) are no longer sufficient for assessing their capabilities. In this work, we propose LiveCodeBench, a comprehensive and contaminati…

    Submitted 6 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Website - https://livecodebench.github.io/

  7. arXiv:2403.03234  [pdf, other]

    q-bio.GN cs.LG

    Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

    Authors: Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov

    Abstract: Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and the reverse complementarity (RC) of DNA. Here, we propose an architecture motivated by these challenges that builds off…

    Submitted 5 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: ICML 2024; Code to reproduce our experiments is available at https://github.com/kuleshov-group/caduceus

  8. arXiv:2402.19475  [pdf, other]

    cs.SE cs.AI cs.LG

    The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?

    Authors: Alex Gu, Wen-Ding Li, Naman Jain, Theo X. Olausson, Celine Lee, Koushik Sen, Armando Solar-Lezama

    Abstract: While language models are increasingly more proficient at code generation, they still frequently generate incorrect programs. Many of these programs are obviously wrong, but others are more subtle and pass weaker correctness checks such as being able to compile. In this work, we focus on these counterfeit samples: programs sampled from a language model that 1) have a high enough log-probability to…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 54 pages, 25 figures

  9. arXiv:2402.19427  [pdf, other]

    cs.LG cs.CL

    Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

    Authors: Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando De Freitas, Caglar Gulcehre

    Abstract: Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 25 pages, 11 figures

  10. arXiv:2402.19173  [pdf, other]

    cs.SE cs.AI

    StarCoder 2 and The Stack v2: The Next Generation

    Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

    Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data…

    Submitted 29 February, 2024; originally announced February 2024.

  11. arXiv:2402.17911  [pdf, other]

    quant-ph cond-mat.stat-mech cs.IT cs.LG

    Demonstration of Robust and Efficient Quantum Property Learning with Shallow Shadows

    Authors: Hong-Ye Hu, Andi Gu, Swarnadeep Majumder, Hang Ren, Yipei Zhang, Derek S. Wang, Yi-Zhuang You, Zlatko Minev, Susanne F. Yelin, Alireza Seif

    Abstract: Extracting information efficiently from quantum systems is a major component of quantum information processing tasks. Randomized measurements, or classical shadows, enable predicting many properties of arbitrary quantum states using few measurements. While random single qubit measurements are experimentally friendly and suitable for learning low-weight Pauli observables, they perform poorly for no…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 5 figures

  12. arXiv:2401.03065  [pdf, other]

    cs.SE cs.AI cs.LG

    CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

    Authors: Alex Gu, Baptiste Rozière, Hugh Leather, Armando Solar-Lezama, Gabriel Synnaeve, Sida I. Wang

    Abstract: We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an input-output pair, leading to two natural tasks: input prediction and output prediction. First, we propose a generic recipe for generating our execution benchmark which can be used to create future variations of the benchmark. Second…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 71 pages, 29 figures

  13. arXiv:2312.00752  [pdf, other]

    cs.LG cs.AI

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    Authors: Albert Gu, Tri Dao

    Abstract: Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long…

    Submitted 31 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  14. arXiv:2311.04986  [pdf, other]

    cs.CV

    Exploiting Inductive Biases in Video Modeling through Neural CDEs

    Authors: Johnathan Chiu, Samuel Duffield, Max Hunter-Gordon, Kaelan Donatella, Max Aifer, Andi Gu

    Abstract: We introduce a novel approach to video modeling that leverages controlled differential equations (CDEs) to address key challenges in video tasks, notably video interpolation and mask propagation. We apply CDEs at varying resolutions leading to a continuous-time U-Net architecture. Unlike traditional methods, our approach does not require explicit optical flow learning, and instead makes use of the…

    Submitted 8 November, 2023; originally announced November 2023.

  15. arXiv:2310.16803  [pdf, other]

    cs.CL cs.LG

    Language Agnostic Code Embeddings

    Authors: Saiteja Utpala, Alex Gu, Pin Yu Chen

    Abstract: Recently, code language models have achieved notable advancements in addressing a diverse array of essential code comprehension and generation tasks. Yet, the field lacks a comprehensive deep dive and understanding of the code embeddings of multilingual code models. In this paper, we present a comprehensive study on multilingual code embeddings, focusing on the cross-lingual capabilities of these…

    Submitted 25 October, 2023; originally announced October 2023.

  16. LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers

    Authors: Theo X. Olausson, Alex Gu, Benjamin Lipkin, Cedegao E. Zhang, Armando Solar-Lezama, Joshua B. Tenenbaum, Roger Levy

    Abstract: Logical reasoning, i.e., deductively inferring the truth value of a conclusion from a set of premises, is an important task for artificial intelligence with wide potential impacts on science, mathematics, and society. While many prompting-based strategies have been proposed to enable Large Language Models (LLMs) to do such reasoning more effectively, they still appear unsatisfactory, often failing…

    Submitted 14 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP Main 2023 (Outstanding Paper Award)

    Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5153-5176, Singapore. Association for Computational Linguistics

  17. arXiv:2309.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Augmenting conformers with structured state-space sequence models for online speech recognition

    Authors: Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath

    Abstract: Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems. In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4), a family of models that provide a parameter-efficient way of accessing arbitrarily long left context. We performed systematic ablat…

    Submitted 27 December, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: ICASSP 2024

  18. arXiv:2309.01941  [pdf, other]

    q-bio.NC cs.AI cs.LG

    Dynamic Brain Transformer with Multi-level Attention for Functional Brain Network Analysis

    Authors: Xuan Kan, Antonio Aodong Chen Gu, Hejie Cui, Ying Guo, Carl Yang

    Abstract: Recent neuroimaging studies have highlighted the importance of network-centric brain analysis, particularly with functional magnetic resonance imaging. The emergence of Deep Neural Networks has fostered a substantial interest in predicting clinical outcomes and categorizing individuals based on brain networks. However, the conventional approach involving static brain network analysis offers limite…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE BHI 2023

    MSC Class: 68T07; 68T05 ACM Class: I.2.6; J.3

  19. arXiv:2307.14433  [pdf, other]

    cs.CV

    ProtoASNet: Dynamic Prototypes for Inherently Interpretable and Uncertainty-Aware Aortic Stenosis Classification in Echocardiography

    Authors: Hooman Vaseli, Ang Nan Gu, S. Neda Ahmadi Amiri, Michael Y. Tsang, Andrea Fung, Nima Kondori, Armin Saadat, Purang Abolmaesumi, Teresa S. M. Tsang

    Abstract: Aortic stenosis (AS) is a common heart valve disease that requires accurate and timely diagnosis for appropriate treatment. Most current automatic AS severity detection methods rely on black-box models with a low level of trustworthiness, which hinders clinical adoption. To address this issue, we propose ProtoASNet, a prototypical network that directly detects AS from B-mode echocardiography video…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: To be published in MICCAI 2023

  20. arXiv:2306.15626  [pdf, other]

    cs.LG cs.AI cs.LO stat.ML

    LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

    Authors: Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar

    Abstract: Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an op…

    Submitted 27 October, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023 (Datasets and Benchmarks Track) as an oral presentation. Data, code, and models available at https://leandojo.org/

  21. arXiv:2306.15228  [pdf, other]

    cs.RO cs.AI

    IIFL: Implicit Interactive Fleet Learning from Heterogeneous Human Supervisors

    Authors: Gaurav Datta, Ryan Hoque, Anrui Gu, Eugen Solowjow, Ken Goldberg

    Abstract: Imitation learning has been applied to a range of robotic tasks, but can struggle when robots encounter edge cases that are not represented in the training data (i.e., distribution shift). Interactive fleet learning (IFL) mitigates distribution shift by allowing robots to access remote human supervisors during task execution and learn from them over time, but different supervisors may demonstrate…

    Submitted 20 October, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: CoRL 2023

  22. arXiv:2305.09967  [pdf, other]

    cs.CV cs.LG

    Variable Length Embeddings

    Authors: Johnathan Chiu, Andi Gu, Matt Zhou

    Abstract: In this work, we introduce a novel deep learning architecture, Variable Length Embeddings (VLEs), an autoregressive model that can produce a latent representation composed of an arbitrary number of tokens. As a proof of concept, we demonstrate the capabilities of VLEs on tasks that involve reconstruction and image decomposition. We evaluate our experiments on a mix of the iNaturalist and ImageNet…

    Submitted 17 May, 2023; originally announced May 2023.

  23. arXiv:2305.06161  [pdf, other]

    cs.CL cs.AI cs.PL cs.SE

    StarCoder: may the source be with you!

    Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

    Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large colle…

    Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  24. arXiv:2304.11277  [pdf, other]

    cs.DC cs.AI cs.LG cs.PF

    PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

    Authors: Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li

    Abstract: It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the development and exploration of large models, such abilities remain confined to a small group of advanced users and industry leaders, resulting in an implicit tech…

    Submitted 12 September, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

  25. arXiv:2303.06349  [pdf, other]

    cs.LG

    Resurrecting Recurrent Neural Networks for Long Sequences

    Authors: Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

    Abstract: Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. However, while SSMs are superficially similar to RNNs, there are important diff…

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: 30 pages, 9 figures

  26. arXiv:2303.03982  [pdf, other]

    cs.LG

    Structured State Space Models for In-Context Reinforcement Learning

    Authors: Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani

    Abstract: Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allo…

    Submitted 23 November, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

  27. arXiv:2301.11324  [pdf, other]

    cs.LG

    Certified Interpretability Robustness for Class Activation Mapping

    Authors: Alex Gu, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel

    Abstract: Interpreting machine learning models is challenging but crucial for ensuring the safety of deep networks in autonomous driving systems. Due to the prevalence of deep learning based perception models in autonomous vehicles, accurately interpreting their predictions is crucial. While a variety of such methods have been proposed, most are shown to lack robustness. Yet, little has been done to provide…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: 13 pages, 5 figures. Accepted to Machine Learning for Autonomous Driving Workshop at NeurIPS 2020

  28. arXiv:2301.10540  [pdf, other]

    cs.CV

    Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNN

    Authors: David M. Knigge, David W. Romero, Albert Gu, Efstratios Gavves, Erik J. Bekkers, Jakub M. Tomczak, Mark Hoogendoorn, Jan-Jakob Sonke

    Abstract: Performant Convolutional Neural Network (CNN) architectures must be tailored to specific tasks in order to consider the length, resolution, and dimensionality of the input data. In this work, we tackle the need for problem-specific CNN architectures. We present the Continuous Convolutional Neural Network (CCNN): a single CNN able to process data of arbitrary resolution, dimensionality and length w…

    Submitted 16 April, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

  29. arXiv:2301.03988  [pdf, other]

    cs.SE cs.AI cs.LG

    SantaCoder: don't reach for the stars!

    Authors: Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, et al. (16 additional authors not shown)

    Abstract: The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigat…

    Submitted 24 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

  30. arXiv:2212.10544  [pdf, other]

    cs.CL cs.LG

    Pretraining Without Attention

    Authors: Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush

    Abstract: Transformers have been essential to pretraining success in NLP. While other architectures have been used, downstream accuracy is either significantly worse, or requires attention layers to match standard benchmarks such as GLUE. This work explores pretraining without attention by using recent advances in sequence routing based on state-space models (SSMs). Our proposed model, Bidirectional Gated S…

    Submitted 8 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  31. arXiv:2210.11468  [pdf, other]

    cs.SE cs.HC cs.LG

    ObSynth: An Interactive Synthesis System for Generating Object Models from Natural Language Specifications

    Authors: Alex Gu, Tamara Mitrovska, Daniela Velez, Jacob Andreas, Armando Solar-Lezama

    Abstract: We introduce ObSynth, an interactive system leveraging the domain knowledge embedded in large language models (LLMs) to help users design object models from high level natural language prompts. This is an example of specification reification, the process of taking a high-level, potentially vague specification and reifying it into a more concrete form. We evaluate ObSynth via a user study, leading…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: 25 pages, 15 figures

  32. arXiv:2210.06583  [pdf, other]

    cs.CV cs.LG eess.IV

    S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces

    Authors: Eric Nguyen, Karan Goel, Albert Gu, Gordon W. Downs, Preey Shah, Tri Dao, Stephen A. Baccus, Christopher Ré

    Abstract: Visual data such as images and videos are typically modeled as discretizations of inherently continuous, multidimensional signals. Existing continuous-signal models attempt to exploit this fact by modeling the underlying signals of visual (e.g., image) data directly. However, these models have not yet been able to achieve competitive performance on practical vision tasks such as large-scale image…

    Submitted 13 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  33. arXiv:2209.13706  [pdf, other]

    cs.RO cs.AI cs.LG

    SGTM 2.0: Autonomously Untangling Long Cables using Interactive Perception

    Authors: Kaushik Shivakumar, Vainavi Viswanath, Anrui Gu, Yahav Avigal, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, Ken Goldberg

    Abstract: Cables are commonplace in homes, hospitals, and industrial warehouses and are prone to tangling. This paper extends prior work on autonomously untangling long cables by introducing novel uncertainty quantification metrics and actions that interact with the cable to reduce perception uncertainty. We present Sliding and Grasping for Tangle Manipulation 2.0 (SGTM 2.0), a system that autonomously unta…

    Submitted 27 September, 2022; originally announced September 2022.

  34. Practical Black Box Hamiltonian Learning

    Authors: Andi Gu, Lukasz Cincio, Patrick J. Coles

    Abstract: We study the problem of learning the parameters for the Hamiltonian of a quantum many-body system, given limited access to the system. In this work, we build upon recent approaches to Hamiltonian learning via derivative estimation. We propose a protocol that improves the scaling dependence of prior works, particularly with respect to parameters relating to the structure of the Hamiltonian (e.g., i…

    Submitted 30 June, 2022; originally announced June 2022.

    Comments: 32 pages, 10 figures

    Report number: LA-UR-22-26209

    Journal ref: Nat Commun 15, 312 (2024)

  35. arXiv:2206.12037  [pdf, other]

    cs.LG

    How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

    Authors: Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré

    Abstract: Linear time-invariant state space models (SSM) are a classical model from engineering and statistics, that have recently been shown to be very promising in machine learning through the Structured State Space sequence model (S4). A core component of S4 involves initializing the SSM state matrix to a particular matrix called a HiPPO matrix, which was empirically important for S4's ability to handle…

    Submitted 5 August, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

  36. arXiv:2206.11893  [pdf, other]

    cs.LG

    On the Parameterization and Initialization of Diagonal State Space Models

    Authors: Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré

    Abstract: State space models (SSM) have recently been shown to be very effective as a deep learning layer as a promising alternative to sequence models such as RNNs, CNNs, or Transformers. The first version to show this potential was the S4 model, which is particularly effective on tasks involving long-range dependencies by using a prescribed state matrix called the HiPPO matrix. While this has an interpret…

    Submitted 5 August, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

  37. arXiv:2206.03398  [pdf, other]

    cs.LG cs.CV

    Towards a General Purpose CNN for Long Range Dependencies in $N$D

    Authors: David W. Romero, David M. Knigge, Albert Gu, Erik J. Bekkers, Efstratios Gavves, Jakub M. Tomczak, Mark Hoogendoorn

    Abstract: The use of Convolutional Neural Networks (CNNs) is widespread in Deep Learning due to a range of desirable model properties which result in an efficient and effective machine learning framework. However, performant CNN architectures must be tailored to specific tasks in order to incorporate considerations such as the input length, resolution, and dimensionality. In this work, we overcome the need…

    Submitted 5 July, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: First two authors contributed equally to this work

  38. arXiv:2204.07054  [pdf, other]

    q-bio.NC cs.LG cs.NE

    BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks

    Authors: Hejie Cui, Wei Dai, Yanqiao Zhu, Xuan Kan, Antonio Aodong Chen Gu, Joshua Lukemire, Liang Zhan, Lifang He, Ying Guo, Carl Yang

    Abstract: Mapping the connectome of the human brain using structural or functional connectivity has become one of the most pervasive paradigms for neuroimaging analysis. Recently, Graph Neural Networks (GNNs) motivated from geometric deep learning have attracted broad interest due to their established power for modeling complex networked data. Despite their superior performance in many fields, there has not…

    Submitted 28 November, 2022; v1 submitted 17 March, 2022; originally announced April 2022.

    Comments: IEEE Transactions on Medical Imaging

  39. arXiv:2203.14343  [pdf, other]

    cs.LG cs.CL

    Diagonal State Spaces are as Effective as Structured State Spaces

    Authors: Ankit Gupta, Albert Gu, Jonathan Berant

    Abstract: Modeling long range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video. While attention-based models are a popular and effective choice in modeling short-range interactions, their performance on tasks requiring long range reasoning has been largely inadequate. In an exciting result, Gu et al. (ICL…

    Submitted 18 May, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

    Comments: updated version with simpler DSS variants, RNN view for autoregressive decoding, ablation analysis, analysis of trained model parameters and kernels

  40. arXiv:2203.01924  [pdf, other]

    cs.LG math.OC

    Min-Max Bilevel Multi-objective Optimization with Applications in Machine Learning

    Authors: Alex Gu, Songtao Lu, Parikshit Ram, Lily Weng

    Abstract: We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization. We design MORBiT, a novel single-loop gradient descent-ascent bilevel optimization algorithm, to solve the generic problem and present a novel analysis showing that MORBiT converges to the first-order stationary poi…

    Submitted 7 March, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 43 pages, 3 figures, ICLR 2023 version

  41. arXiv:2202.09729  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    It's Raw! Audio Generation with State-Space Models

    Authors: Karan Goel, Albert Gu, Chris Donahue, Christopher Ré

    Abstract: Developing architectures suitable for modeling raw audio is a challenging problem due to the high sampling rates of audio waveforms. Standard sequence modeling approaches like RNNs and CNNs have previously been tailored to fit the demands of audio, but the resultant architectures make undesirable computational tradeoffs and struggle to model waveforms effectively. We propose SaShiMi, a new multi-s…

    Submitted 19 February, 2022; originally announced February 2022.

    Comments: 23 pages, 7 figures, 7 tables

  42. arXiv:2202.01602  [pdf, other]

    cs.LG cs.AI

    The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

    Authors: Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju

    Abstract: As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questio…

    Submitted 8 February, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

  43. arXiv:2111.00396  [pdf, other]

    cs.LG

    Efficiently Modeling Long Sequences with Structured State Spaces

    Authors: Albert Gu, Karan Goel, Christopher Ré

    Abstract: A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of $10000$ or more steps. A promis…

    Submitted 5 August, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

    Comments: ICLR 2022 (Outstanding Paper HM)

  44. arXiv:2110.13985  [pdf, other]

    cs.LG cs.AI

    Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

    Authors: Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré

    Abstract: Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  45. Adaptive shot allocation for fast convergence in variational quantum algorithms

    Authors: Andi Gu, Angus Lowe, Pavel A. Dub, Patrick J. Coles, Andrew Arrasmith

    Abstract: Variational Quantum Algorithms (VQAs) are a promising approach for practical applications like chemistry and materials science on near-term quantum computers as they typically reduce quantum resource requirements. However, in order to implement VQAs, an efficient classical optimization strategy is required. Here we present a new stochastic gradient descent method using an adaptive number of shots…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: 13 pages, 6 figures, 1 table

    Report number: LA-UR-21-28401

  46. arXiv:2106.03306  [pdf, other]

    cs.LG

    HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections

    Authors: Ines Chami, Albert Gu, Dat Nguyen, Christopher Ré

    Abstract: This paper studies Principal Component Analysis (PCA) for data lying in hyperbolic spaces. Given directions, PCA relies on: (1) a parameterization of subspaces spanned by these directions, (2) a method of projection onto subspaces that preserves information in these directions, and (3) an objective to optimize, namely the variance explained by projections. We generalize each of these concepts to t…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: ICML 2021

    Journal ref: PMLR 139:1419-1429, 2021

  47. arXiv:2106.02933  [pdf, other]

    cs.LG

    k-Mixup Regularization for Deep Learning via Optimal Transport

    Authors: Kristjan Greenewald, Anming Gu, Mikhail Yurochkin, Justin Solomon, Edward Chien

    Abstract: Mixup is a popular regularization technique for training deep neural networks that improves generalization and increases robustness to certain distribution shifts. It perturbs input training data in the direction of other randomly-chosen instances in the training set. To better leverage the structure of the data, we extend mixup in a simple, broadly applicable way to \emph{$k$-mixup}, which pertur…

    Submitted 7 October, 2023; v1 submitted 5 June, 2021; originally announced June 2021.

  48. arXiv:2103.06710  [pdf, other]

    cs.LG

    Deep Transfer Learning for Infectious Disease Case Detection Using Electronic Medical Records

    Authors: Ye Ye, Andrew Gu

    Abstract: During an infectious disease pandemic, it is critical to share electronic medical records or models (learned from these records) across regions. Applying one region's data/model to another region often raises distribution shift issues that violate the assumptions of traditional machine learning techniques. Transfer learning can be a solution. To explore the potential of deep transfer learning algori…

    Submitted 7 March, 2021; originally announced March 2021.

  49. arXiv:2102.05824  [pdf, other]

    cs.LG cs.AI

    Reproducibility Report: La-MAML: Look-ahead Meta Learning for Continual Learning

    Authors: Joel Joseph, Alex Gu

    Abstract: The Continual Learning (CL) problem involves performing well on a sequence of tasks under limited compute. Current algorithms in the domain are either slow, offline or sensitive to hyper-parameters. La-MAML, an optimization-based meta-learning algorithm claims to be better than other replay-based, prior-based and meta-learning based approaches. According to the MER paper [1], metrics to measure pe…

    Submitted 20 May, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

  50. arXiv:2102.01586  [pdf, other]

    cs.CV cs.LG

    U-LanD: Uncertainty-Driven Video Landmark Detection

    Authors: Mohammad H. Jafari, Christina Luong, Michael Tsang, Ang Nan Gu, Nathan Van Woudenberg, Robert Rohling, Teresa Tsang, Purang Abolmaesumi

    Abstract: This paper presents U-LanD, a framework for joint detection of key frames and landmarks in videos. We tackle a specifically challenging problem, where training labels are noisy and highly sparse. U-LanD builds upon a pivotal observation: a deep Bayesian landmark detector solely trained on key video frames, has significantly lower predictive uncertainty on those frames vs. other frames in videos. W…

    Submitted 2 February, 2021; originally announced February 2021.