Showing 1–31 of 31 results for author: Arpit, D

  1. arXiv:2401.10495  [pdf, ps, other]

    cs.LG cs.AI stat.ME

    Causal Layering via Conditional Entropy

    Authors: Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

    Abstract: Causal discovery aims to recover information about an unobserved causal graph from the observable data it generates. Layerings are orderings of the variables which place causes before effects. In this paper, we provide ways to recover layerings of a graph by accessing the data via a conditional entropy oracle, when distributions are discrete. Our algorithms work by repeatedly removing sources or s…

    Submitted 19 January, 2024; originally announced January 2024.

  2. arXiv:2401.07526  [pdf, other]

    cs.CL cs.AI cs.LG

    Editing Arbitrary Propositions in LLMs without Subject Labels

    Authors: Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

    Abstract: Large Language Model (LLM) editing modifies factual information in LLMs. Locate-and-Edit (L&E) methods accomplish this by finding where relevant information is stored within the neural network, and editing the weights at that location. The goal of editing is to modify the response of an LLM to a proposition independently of its phrasing, while not modifying its response to other related propositi…

    Submitted 15 January, 2024; originally announced January 2024.

  3. arXiv:2308.05960  [pdf, other]

    cs.AI

    BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

    Authors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA is able to generate actions with its core LLM and interact with environments, which facilitates the ability to resolve complex tasks by conditioning on past interactions such as observations and actions. Since the investigation of LAA is still very recent, limi…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: Preprint

  4. arXiv:2308.02151  [pdf, other]

    cs.CL cs.AI

    Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

    Authors: Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective oriented multi-step tasks on their own, rather than merely responding to queries from human users. Most existing language agents, however, are not optimized using environment-specific rewards. Although some agents ena…

    Submitted 5 May, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

  5. arXiv:2307.08962  [pdf, other]

    cs.AI cs.LG

    REX: Rapid Exploration and eXploitation for AI Agents

    Authors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

    Abstract: In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer…

    Submitted 26 January, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

  6. arXiv:2303.05628  [pdf, other]

    cs.LG cs.AI stat.ME

    On the Unlikelihood of D-Separation

    Authors: Itai Feigenbaum, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Devansh Arpit

    Abstract: Causal discovery aims to recover a causal graph from data generated by it; constraint based methods do so by searching for a d-separating conditioning set of nodes in the graph via an oracle. In this paper, we provide analytic evidence that on large graphs, d-separation is a rare phenomenon, even when guaranteed to exist, unless the graph is extremely sparse. We then provide an analytic average ca…

    Submitted 3 October, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  7. arXiv:2301.10859  [pdf, other]

    cs.LG cs.AI

    Salesforce CausalAI Library: A Fast and Scalable Framework for Causal Analysis of Time Series and Tabular Data

    Authors: Devansh Arpit, Matthew Fernandez, Itai Feigenbaum, Weiran Yao, Chenghao Liu, Wenzhuo Yang, Paul Josel, Shelby Heinecke, Eric Hu, Huan Wang, Stephen Hoi, Caiming Xiong, Kun Zhang, Juan Carlos Niebles

    Abstract: We introduce the Salesforce CausalAI Library, an open-source library for causal analysis using observational data. It supports causal discovery and causal inference for tabular and time series data, of discrete, continuous and heterogeneous types. This library includes algorithms that handle linear and non-linear causal relationships between variables, and uses multi-processing for speed-up. We al…

    Submitted 22 September, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

  8. arXiv:2110.10832  [pdf, other]

    cs.LG cs.CV

    Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

    Authors: Devansh Arpit, Huan Wang, Yingbo Zhou, Caiming Xiong

    Abstract: In Domain Generalization (DG) settings, models trained independently on a given set of training domains have notoriously chaotic performance on distribution shifted test domains, and stochasticity in optimization (e.g. seed) plays a big role. This makes deep learning models unreliable in real world settings. We first show that this chaotic behavior exists even along the training optimization traje…

    Submitted 10 October, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2022

  9. arXiv:2110.10303  [pdf, other]

    cs.CV cs.LG

    Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE

    Authors: Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong

    Abstract: Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution. This latent space distribution matching is a core component of WAE, and a challenging task. In this paper, we propose to use the contrastive learning framework that has been s…

    Submitted 15 February, 2023; v1 submitted 19 October, 2021; originally announced October 2021.

  10. arXiv:2110.10293  [pdf, other]

    cs.LG cs.CV

    Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles

    Authors: Bram Wallace, Devansh Arpit, Huan Wang, Caiming Xiong

    Abstract: Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains. Meanwhile, model ensembling is one of the most universally applicable techniques in supervised learning literature and practice, offering a simple solution to reliably…

    Submitted 19 October, 2021; originally announced October 2021.

  11. arXiv:2109.09265  [pdf, other]

    cs.LG cs.MS stat.ML

    Merlion: A Machine Learning Library for Time Series

    Authors: Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, Huan Wang

    Abstract: We introduce Merlion, an open-source machine learning library for time series. It features a unified interface for many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series, along with standard pre/post-processing layers. It has several modules to improve ease-of-use, including visualization, anomaly score calibration to improve in…

    Submitted 19 September, 2021; originally announced September 2021.

    Comments: 22 pages, 1 figure, 14 tables

  12. arXiv:2012.14193  [pdf, other]

    cs.LG stat.ML

    Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization

    Authors: Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras

    Abstract: The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a tendency to steer towards regions of the loss surface with increasing local curvature. We ask whether this tendency is connected to the widely observed phenomen…

    Submitted 11 June, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: The last two authors contributed equally. Accepted to the International Conference on Machine Learning 2021

  13. arXiv:2002.09572  [pdf, other]

    cs.LG stat.ML

    The Break-Even Point on Optimization Trajectories of Deep Neural Networks

    Authors: Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras

    Abstract: The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the optimization trajectory. We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gr…

    Submitted 21 February, 2020; originally announced February 2020.

    Comments: Accepted as a spotlight at ICLR 2020. The last two authors contributed equally

  14. arXiv:2002.09046  [pdf, other]

    stat.ML cs.LG

    Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning

    Authors: Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

    Abstract: We introduce a parameterization method called Neural Bayes which allows computing statistical quantities that are in general difficult to compute and opens avenues for formulating new objectives for unsupervised representation learning. Specifically, given an observed random variable $\mathbf{x}$ and a latent discrete variable $z$, we can express $p(\mathbf{x}|z)$, $p(z|\mathbf{x})$ and $p(z)$ in…

    Submitted 20 February, 2020; originally announced February 2020.

  15. arXiv:1910.00164  [pdf, other]

    stat.ML cs.LG

    Predicting with High Correlation Features

    Authors: Devansh Arpit, Caiming Xiong, Richard Socher

    Abstract: It has been shown that instead of learning actual object features, deep networks tend to exploit non-robust (spurious) discriminative features that are shared between training and test sets. Therefore, while they achieve state of the art performance on such test sets, they achieve poor generalization on out of distribution (OOD) samples where the IID (independent, identical distribution) assumptio…

    Submitted 16 November, 2019; v1 submitted 30 September, 2019; originally announced October 2019.

  16. arXiv:1906.02341  [pdf, other]

    stat.ML cs.LG

    How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

    Authors: Devansh Arpit, Victor Campos, Yoshua Bengio

    Abstract: Residual networks (ResNet) and weight normalization play an important role in various deep learning applications. However, parameter initialization strategies have not been studied previously for weight normalized networks and, in practice, initialization methods designed for un-normalized networks are used as a proxy. Similarly, initialization for ResNets has also been studied for un-normalized…

    Submitted 30 October, 2019; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: First two authors have equal contribution

  17. arXiv:1901.03611  [pdf, other]

    stat.ML cs.LG

    The Benefits of Over-parameterization at Initialization in Deep ReLU Networks

    Authors: Devansh Arpit, Yoshua Bengio

    Abstract: It has been noted in existing literature that over-parameterization in ReLU networks generally improves performance. While there could be several factors involved behind this, we prove some desirable theoretical properties at initialization which may be enjoyed by ReLU networks. Specifically, it is known that He initialization in deep ReLU networks asymptotically preserves variance of activations…

    Submitted 2 October, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

  18. arXiv:1810.03023  [pdf, other]

    stat.ML cs.LG

    h-detach: Modifying the LSTM Gradient Towards Better Optimization

    Authors: Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio

    Abstract: Recurrent neural networks are known for their notorious exploding and vanishing gradient problem (EVGP). This problem becomes more evident in tasks where the information needed to correctly solve them exists over long time scales, because EVGP prevents important gradient components from being back-propagated adequately over a large number of steps. We introduce a simple stochastic algorithm (\texti…

    Submitted 9 January, 2019; v1 submitted 6 October, 2018; originally announced October 2018.

    Comments: First two authors contributed equally. Published in ICLR 2019

  19. arXiv:1806.08734  [pdf, other]

    stat.ML cs.LG

    On the Spectral Bias of Neural Networks

    Authors: Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

    Abstract: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuatio…

    Submitted 31 May, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

    Comments: 23 pages

    Journal ref: ICML 2019

  20. arXiv:1802.08770  [pdf, other]

    stat.ML cs.LG

    A Walk with SGD

    Authors: Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio

    Abstract: We present novel empirical observations regarding how stochastic gradient descent (SGD) navigates the loss landscape of over-parametrized deep neural networks (DNNs). These observations expose the qualitatively different roles of learning rate and batch-size in DNN optimization and generalization. Specifically we study the DNN loss surface along the trajectory of SGD by interpolating the loss surf…

    Submitted 29 May, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: First two authors contributed equally

  21. arXiv:1711.05717  [pdf, other]

    stat.ML cs.LG

    Variational Bi-LSTMs

    Authors: Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio

    Abstract: Recurrent neural networks like long short-term memory (LSTM) are important architectures for sequential prediction tasks. LSTMs (and RNNs in general) model sequences along the forward time direction. Bidirectional LSTMs (Bi-LSTMs) on the other hand model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer repres…

    Submitted 15 November, 2017; originally announced November 2017.

  22. arXiv:1711.04623  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Three Factors Influencing Minima in SGD

    Authors: Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

    Abstract: We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such as width or generalization, remains an open question. In order to tackle this problem we investigate the previously proposed approximation of SGD by a stochastic…

    Submitted 13 September, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

    Comments: First two authors contributed equally. Short version accepted into ICLR workshop. Accepted to Artificial Neural Networks and Machine Learning, ICANN 2018

  23. arXiv:1711.00066  [pdf, other]

    stat.ML cs.AI cs.LG

    Fraternal Dropout

    Authors: Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio

    Abstract: Recurrent neural networks (RNNs) are an important class of architectures among neural networks useful for language modeling and sequential prediction. However, optimizing RNNs is known to be harder compared to feed-forward neural networks. A number of techniques have been proposed in literature to address this problem. In this paper we propose a simple technique called fraternal dropout that takes ad…

    Submitted 28 March, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: Accepted to ICLR 2018. Extended appendix. Added official GitHub code for replication: https://github.com/kondiz/fraternal-dropout . Added references. Corrected typos

  24. arXiv:1710.04773  [pdf, other]

    cs.CV

    Residual Connections Encourage Iterative Inference

    Authors: Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

    Abstract: Residual networks (Resnets) have become a prominent architecture in deep learning. However, a comprehensive understanding of Resnets is still a topic of ongoing research. A recent view argues that Resnets perform iterative refinement of features. We attempt to further expose properties of this aspect. To this end, we study Resnets both analytically and empirically. We formalize the notion of ite…

    Submitted 8 March, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

    Comments: First two authors contributed equally. Published in ICLR 2018

  25. arXiv:1706.05394  [pdf, other]

    stat.ML cs.LG

    A Closer Look at Memorization in Deep Networks

    Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. r…

    Submitted 1 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work

  26. arXiv:1605.07145  [pdf, other]

    stat.ML cs.LG cs.NE

    On Optimality Conditions for Auto-Encoder Signal Recovery

    Authors: Devansh Arpit, Yingbo Zhou, Hung Q. Ngo, Nils Napp, Venu Govindaraju

    Abstract: Auto-Encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost. The useful representations learned are often found to be sparse and distributed. On the other hand, compressed sensing and sparse coding assume a data generating process, where the observed data is generated from some true latent signal source, and try to recover the correspondi…

    Submitted 13 July, 2017; v1 submitted 23 May, 2016; originally announced May 2016.

  27. arXiv:1603.01431  [pdf, other]

    stat.ML cs.LG

    Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks

    Authors: Devansh Arpit, Yingbo Zhou, Bhargava U. Kota, Venu Govindaraju

    Abstract: While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks-- Internal Covariate Shift-- the current solution has certain drawbacks. Specifically, BN depends on batch statistics for layerwise input normalization during training which makes the estimates of mean and standard deviation of input (distribution) to hidden layers inaccurate…

    Submitted 12 July, 2016; v1 submitted 4 March, 2016; originally announced March 2016.

    Comments: 11 pages, ICML 2016, appendix added to the last version

  28. arXiv:1505.05561  [pdf, other]

    stat.ML cs.CV cs.LG

    Why Regularized Auto-Encoders learn Sparse Representation?

    Authors: Devansh Arpit, Yingbo Zhou, Hung Ngo, Venu Govindaraju

    Abstract: While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks-- \textit{Internal Covariate Shift}-- the current solution has certain drawbacks. For instance, BN depends on batch statistics for layerwise input normalization during training which makes the estimates of mean and standard deviation of input (distribution) to hidden layers in…

    Submitted 17 June, 2016; v1 submitted 20 May, 2015; originally announced May 2015.

    Comments: 8 pages of content, 1 page of reference, 4 pages of supplementary. ICML 2016; bug fix in lemma 1

  29. arXiv:1412.2404  [pdf, other]

    cs.LG stat.ML

    Dimensionality Reduction with Subspace Structure Preservation

    Authors: Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju

    Abstract: Modeling data as being sampled from a union of independent subspaces has been widely applied to a number of real world applications. However, dimensionality reduction approaches that theoretically preserve this independence assumption have not been well studied. Our key contribution is to show that $2K$ projection vectors are sufficient for the independence preservation of any $K$ class data sampl…

    Submitted 6 April, 2016; v1 submitted 7 December, 2014; originally announced December 2014.

    Comments: Published in NIPS 2014; v2: minor updates to the algorithm and added a few lines addressing application to large-scale/high-dimensional data

  30. arXiv:1405.1380  [pdf, other]

    stat.ML cs.LG cs.NE

    Is Joint Training Better for Deep Auto-Encoders?

    Authors: Yingbo Zhou, Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju

    Abstract: Traditionally, when generative models of data are developed via deep architectures, greedy layer-wise pre-training is employed. In a well-trained model, the lower layer of the architecture models the data distribution conditional upon the hidden variables, while the higher layers model the hidden distribution prior. But due to the greedy scheme of the layerwise training technique, the parameters o…

    Submitted 15 June, 2015; v1 submitted 6 May, 2014; originally announced May 2014.

    Comments: 11 pages, 4 figures

  31. arXiv:1401.4489  [pdf, other]

    cs.CV cs.LG stat.ML

    An Analysis of Random Projections in Cancelable Biometrics

    Authors: Devansh Arpit, Ifeoma Nwogu, Gaurav Srivastava, Venu Govindaraju

    Abstract: With increasing concerns about security, the need for highly secure physical biometrics-based authentication systems utilizing \emph{cancelable biometric} technologies is on the rise. Because the problem of cancelable template generation deals with the trade-off between template security and matching performance, many state-of-the-art algorithms successful in generating high quality cancelable bio…

    Submitted 13 November, 2014; v1 submitted 17 January, 2014; originally announced January 2014.