Search | arXiv e-print repository

SlideSLAM: Sparse, Lightweight, Decentralized Metric-Semantic SLAM for Multi-Robot Navigation

Authors: Xu Liu, Jiuzhou Lei, Ankit Prabhu, Yuezhan Tao, Igor Spasojevic, Pratik Chaudhari, Nikolay Atanasov, Vijay Kumar

Abstract: This paper develops a real-time decentralized metric-semantic Simultaneous Localization and Map** (SLAM) approach that leverages a sparse and lightweight object-based representation to enable a heterogeneous robot team to autonomously explore 3D environments featuring indoor, urban, and forested areas without relying on GPS. We use a hierarchical metric-semantic representation of the environment… ▽ More This paper develops a real-time decentralized metric-semantic Simultaneous Localization and Map** (SLAM) approach that leverages a sparse and lightweight object-based representation to enable a heterogeneous robot team to autonomously explore 3D environments featuring indoor, urban, and forested areas without relying on GPS. We use a hierarchical metric-semantic representation of the environment, including high-level sparse semantic maps of object models and low-level voxel maps. We leverage the informativeness and viewpoint invariance of the high-level semantic map to obtain an effective semantics-driven place-recognition algorithm for inter-robot loop closure detection across aerial and ground robots with different sensing modalities. A communication module is designed to track each robot's own observations and those of other robots whenever communication links are available. Such observations are then used to construct a merged map. Our framework enables real-time decentralized operations onboard robots, allowing them to opportunistically leverage communication. We integrate and deploy our proposed framework on three types of aerial and ground robots. Extensive experimental results show an average inter-robot localization error of approximately 20 cm in position and 0.2 degrees in orientation, an object map** F1 score consistently over 0.9, and a communication packet size of merely 2-3 megabytes per kilometer trajectory with as many as 1,000 landmarks. The project website can be found at https://xurobotics.github.io/slideslam/. △ Less

Submitted 2 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: Preliminary release

arXiv:2406.09631 [pdf, other]

Optimal Convex Cover as Collision-free Space Approximation for Trajectory Generation

Authors: Yuwei Wu, Igor Spasojevic, Pratik Chaudhari, Vijay Kumar

Abstract: We propose an online iterative algorithm to find a suitable convex cover to under-approximate the free space for autonomous navigation to delineate Safe Flight Corridors (SFC). The convex cover consists of a set of polytopes such that the union of the polytopes represents obstacle-free space, allowing us to find trajectories for robots that lie within the convex cover. In order to find the SFC tha… ▽ More We propose an online iterative algorithm to find a suitable convex cover to under-approximate the free space for autonomous navigation to delineate Safe Flight Corridors (SFC). The convex cover consists of a set of polytopes such that the union of the polytopes represents obstacle-free space, allowing us to find trajectories for robots that lie within the convex cover. In order to find the SFC that facilitates optimal trajectory generation, we iteratively find overlap** polytopes of maximum volumes that include specified waypoints initialized by a geometric or kinematic planner. Constraints at waypoints appear in two alternating stages of a joint optimization problem, which is solved by a method inspired by the Alternating Direction Method of Multipliers (ADMM) with partially distributed variables. We validate the effectiveness of our proposed algorithm using a range of parameterized environments and show its applications for two-stage motion planning. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07431 [pdf, other]

Active Scout: Multi-Target Tracking Using Neural Radiance Fields in Dense Urban Environments

Authors: Christopher D. Hsu, Pratik Chaudhari

Abstract: We study pursuit-evasion games in highly occluded urban environments, e.g. tall buildings in a city, where a scout (quadrotor) tracks multiple dynamic targets on the ground. We show that we can build a neural radiance field (NeRF) representation of the city -- online -- using RGB and depth images from different vantage points. This representation is used to calculate the information gain to both e… ▽ More We study pursuit-evasion games in highly occluded urban environments, e.g. tall buildings in a city, where a scout (quadrotor) tracks multiple dynamic targets on the ground. We show that we can build a neural radiance field (NeRF) representation of the city -- online -- using RGB and depth images from different vantage points. This representation is used to calculate the information gain to both explore unknown parts of the city and track the targets -- thereby giving a completely first-principles approach to actively tracking dynamic targets. We demonstrate, using a custom-built simulator using Open Street Maps data of Philadelphia and New York City, that we can explore and locate 20 stationary targets within 300 steps. This is slower than a greedy baseline which which does not use active perception. But for dynamic targets that actively hide behind occlusions, we show that our approach maintains, at worst, a tracking error of 200m; the greedy baseline can have a tracking error as large as 600m. We observe a number of interesting properties in the scout's policies, e.g., it switches its attention to track a different target periodically, as the quality of the NeRF representation improves over time, the scout also becomes better in terms of target tracking. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 8 pages, 8 figures, 1 table

arXiv:2406.07361 [pdf, other]

Deep Implicit Optimization for Robust and Flexible Image Registration

Authors: Rohit Jena, Pratik Chaudhari, James C. Gee

Abstract: Deep Learning in Image Registration (DLIR) methods have been tremendously successful in image registration due to their speed and ability to incorporate weak label supervision at training time. However, DLIR methods forego many of the benefits of classical optimization-based methods. The functional nature of deep networks do not guarantee that the predicted transformation is a local minima of the… ▽ More Deep Learning in Image Registration (DLIR) methods have been tremendously successful in image registration due to their speed and ability to incorporate weak label supervision at training time. However, DLIR methods forego many of the benefits of classical optimization-based methods. The functional nature of deep networks do not guarantee that the predicted transformation is a local minima of the registration objective, the representation of the transformation (displacement/velocity field/affine) is fixed, and the networks are not robust to domain shift. Our method aims to bridge this gap between classical and learning methods by incorporating optimization as a layer in a deep network. A deep network is trained to predict multi-scale dense feature images that are registered using a black box iterative optimization solver. This optimal warp is then used to minimize image and label alignment errors. By implicitly differentiating end-to-end through an iterative optimization solver, our learned features are registration and label-aware, and the warp functions are guaranteed to be local minima of the registration objective in the feature space. Our framework shows excellent performance on in-domain datasets, and is agnostic to domain shift such as anisotropy and varying intensity profiles. For the first time, our method allows switching between arbitrary transformation representations (free-form to diffeomorphic) at test time with zero retraining. End-to-end feature learning also facilitates interpretability of features, and out-of-the-box promptability using additional label-fidelity terms at inference. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.14061 [pdf, other]

Meanings and Feelings of Large Language Models: Observability of Latent States in Generative AI

Authors: Tian Yu Liu, Stefano Soatto, Matteo Marchi, Pratik Chaudhari, Paulo Tabuada

Abstract: We tackle the question of whether Large Language Models (LLMs), viewed as dynamical systems with state evolving in the embedding space of symbolic tokens, are observable. That is, whether there exist multiple 'mental' state trajectories that yield the same sequence of generated tokens, or sequences that belong to the same Nerode equivalence class ('meaning'). If not observable, mental state trajec… ▽ More We tackle the question of whether Large Language Models (LLMs), viewed as dynamical systems with state evolving in the embedding space of symbolic tokens, are observable. That is, whether there exist multiple 'mental' state trajectories that yield the same sequence of generated tokens, or sequences that belong to the same Nerode equivalence class ('meaning'). If not observable, mental state trajectories ('experiences') evoked by an input ('perception') or by feedback from the model's own state ('thoughts') could remain self-contained and evolve unbeknown to the user while being potentially accessible to the model provider. Such "self-contained experiences evoked by perception or thought" are akin to what the American Psychological Association (APA) defines as 'feelings'. Beyond the lexical curiosity, we show that current LLMs implemented by autoregressive Transformers cannot have 'feelings' according to this definition: The set of state trajectories indistinguishable from the tokenized output is a singleton. But if there are 'system prompts' not visible to the user, then the set of indistinguishable trajectories becomes non-trivial, and there can be multiple state trajectories that yield the same verbalized output. We prove these claims analytically, and show examples of modifications to standard LLMs that engender such 'feelings.' Our analysis sheds light on possible designs that would enable a model to perform non-trivial computation that is not visible to the user, as well as on controls that the provider of services using the model could take to prevent unintended behavior. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.09717 [pdf, other]

From NeRFs to Gaussian Splats, and Back

Authors: Siming He, Zach Osman, Pratik Chaudhari

Abstract: For robotics applications where there is a limited number of (typically ego-centric) views, parametric representations such as neural radiance fields (NeRFs) generalize better than non-parametric ones such as Gaussian splatting (GS) to views that are very different from those in the training data; GS however can render much faster than NeRFs. We develop a procedure to convert back and forth betwee… ▽ More For robotics applications where there is a limited number of (typically ego-centric) views, parametric representations such as neural radiance fields (NeRFs) generalize better than non-parametric ones such as Gaussian splatting (GS) to views that are very different from those in the training data; GS however can render much faster than NeRFs. We develop a procedure to convert back and forth between the two. Our approach achieves the best of both NeRFs (superior PSNR, SSIM, and LPIPS on dissimilar views, and a compact representation) and GS (real-time rendering and ability for easily modifying the representation); the computational cost of these conversions is minor compared to training the two from scratch. △ Less

Submitted 10 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.02325 [pdf, ps, other]

Heat Death of Generative Models in Closed-Loop Learning

Authors: Matteo Marchi, Stefano Soatto, Pratik Chaudhari, Paulo Tabuada

Abstract: Improvement and adoption of generative machine learning models is rapidly accelerating, as exemplified by the popularity of LLMs (Large Language Models) for text, and diffusion models for image generation.As generative models become widespread, data they generate is incorporated into shared content through the public web. This opens the question of what happens when data generated by a model is fe… ▽ More Improvement and adoption of generative machine learning models is rapidly accelerating, as exemplified by the popularity of LLMs (Large Language Models) for text, and diffusion models for image generation.As generative models become widespread, data they generate is incorporated into shared content through the public web. This opens the question of what happens when data generated by a model is fed back to the model in subsequent training campaigns. This is a question about the stability of the training process, whether the distribution of publicly accessible content, which we refer to as "knowledge", remains stable or collapses. Small scale empirical experiments reported in the literature show that this closed-loop training process is prone to degenerating. Models may start producing gibberish data, or sample from only a small subset of the desired data distribution (a phenomenon referred to as mode collapse). So far there has been only limited theoretical understanding of this process, in part due to the complexity of the deep networks underlying these generative models. The aim of this paper is to provide insights into this process (that we refer to as "generative closed-loop learning") by studying the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset. The sampling of many of these models can be controlled via a "temperature" parameter. Using dynamical systems tools, we show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to asymptotically degenerate. In fact, either the generative distribution collapses to a small set of outputs, or becomes uniform over a large set of outputs. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01249 [pdf, other]

FireANTs: Adaptive Riemannian Optimization for Multi-Scale Diffeomorphic Registration

Authors: Rohit Jena, Pratik Chaudhari, James C. Gee

Abstract: Diffeomorphic Image Registration is a critical part of the analysis in various imaging modalities and downstream tasks like image translation, segmentation, and atlas building. Registration algorithms based on optimization have stood the test of time in terms of accuracy, reliability, and robustness across a wide spectrum of modalities and acquisition settings. However, these algorithms converge s… ▽ More Diffeomorphic Image Registration is a critical part of the analysis in various imaging modalities and downstream tasks like image translation, segmentation, and atlas building. Registration algorithms based on optimization have stood the test of time in terms of accuracy, reliability, and robustness across a wide spectrum of modalities and acquisition settings. However, these algorithms converge slowly, are prohibitively expensive to run, and their usage requires a steep learning curve, limiting their scalability to larger clinical and scientific studies. In this paper, we develop multi-scale Adaptive Riemannian Optimization algorithms for diffeomorphic image registration. We demonstrate compelling improvements on image registration across a spectrum of modalities and anatomies by measuring structural and landmark overlap of the registered image volumes. Our proposed framework leads to a consistent improvement in performance, and from 300x up to 2000x speedup over existing algorithms. Our modular library design makes it easy to use and allows customization via user-defined cost functions. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00769 [pdf, other]

An Active Perception Game for Robust Autonomous Exploration

Authors: Siming He, Yuezhan Tao, Igor Spasojevic, Vijay Kumar, Pratik Chaudhari

Abstract: We formulate active perception for an autonomous agent that explores an unknown environment as a two-player zero-sum game: the agent aims to maximize information gained from the environment while the environment aims to minimize the information gained by the agent. In each episode, the environment reveals a set of actions with their potentially erroneous information gain. In order to select the be… ▽ More We formulate active perception for an autonomous agent that explores an unknown environment as a two-player zero-sum game: the agent aims to maximize information gained from the environment while the environment aims to minimize the information gained by the agent. In each episode, the environment reveals a set of actions with their potentially erroneous information gain. In order to select the best action, the robot needs to recover the true information gain from the erroneous one. The robot does so by minimizing the discrepancy between its estimate of information gain and the true information gain it observes after taking the action. We propose an online convex optimization algorithm that achieves sub-linear expected regret $O(T^{3/4})$ for estimating the information gain. We also provide a bound on the regret of active perception performed by any (near-)optimal prediction and trajectory selection algorithms. We evaluate this approach using semantic neural radiance fields (NeRFs) in simulated realistic 3D environments to show that the robot can discover up to 12% more objects using the improved estimate of the information gain. On the M3ED dataset, the proposed algorithm reduced the error of information gain prediction in occupancy map by over 67%. In real-world experiments using occupancy maps on a Jackal ground robot, we show that this approach can calculate complicated trajectories that efficiently explore all occluded regions. △ Less

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2312.02521 [pdf, other]

Retrieving Conditions from Reference Images for Diffusion Models

Authors: Haoran Tang, Xin Zhou, Jieren Deng, Zhihong Pan, Hao Tian, Pratik Chaudhari

Abstract: Newly developed diffusion-based techniques have showcased phenomenal abilities in producing a wide range of high-quality images, sparking considerable interest in various applications. A prevalent scenario is to generate new images based on a subject from reference images. This subject could be face identity for styled avatars, body and clothing for virtual try-on and so on. Satisfying this requir… ▽ More Newly developed diffusion-based techniques have showcased phenomenal abilities in producing a wide range of high-quality images, sparking considerable interest in various applications. A prevalent scenario is to generate new images based on a subject from reference images. This subject could be face identity for styled avatars, body and clothing for virtual try-on and so on. Satisfying this requirement is evolving into a field called Subject-Driven Generation. In this paper, we consider Subject-Driven Generation as a unified retrieval problem with diffusion models. We introduce a novel diffusion model architecture, named RetriNet, designed to address and solve these problems by retrieving subject attributes from reference images precisely, and filter out irrelevant information. RetriNet demonstrates impressive performance when compared to existing state-of-the-art approaches in face generation. We further propose a research and iteration friendly dataset, RetriBooru, to study a more difficult problem, concept composition. Finally, to better evaluate alignment between similarity and diversity or measure diversity that have been previously unaccounted for, we introduce a novel class of metrics named Similarity Weighted Diversity (SWD). △ Less

Submitted 15 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.10812 [pdf, other]

SplatArmor: Articulated Gaussian splatting for animatable humans from monocular RGB videos

Authors: Rohit Jena, Ganesh Subramanian Iyer, Siddharth Choudhary, Brandon Smith, Pratik Chaudhari, James Gee

Abstract: We propose SplatArmor, a novel approach for recovering detailed and animatable human models by `armoring' a parameterized body model with 3D Gaussians. Our approach represents the human as a set of 3D Gaussians within a canonical space, whose articulation is defined by extending the skinning of the underlying SMPL geometry to arbitrary locations in the canonical space. To account for pose-dependen… ▽ More We propose SplatArmor, a novel approach for recovering detailed and animatable human models by `armoring' a parameterized body model with 3D Gaussians. Our approach represents the human as a set of 3D Gaussians within a canonical space, whose articulation is defined by extending the skinning of the underlying SMPL geometry to arbitrary locations in the canonical space. To account for pose-dependent effects, we introduce a SE(3) field, which allows us to capture both the location and anisotropy of the Gaussians. Furthermore, we propose the use of a neural color field to provide color regularization and 3D supervision for the precise positioning of these Gaussians. We show that Gaussian splatting provides an interesting alternative to neural rendering based methods by leverging a rasterization primitive without facing any of the non-differentiability and optimization challenges typically faced in such approaches. The rasterization paradigms allows us to leverage forward skinning, and does not suffer from the ambiguities associated with inverse skinning and war**. We show compelling results on the ZJU MoCap and People Snapshot datasets, which underscore the effectiveness of our method for controllable human synthesis. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2310.09892 [pdf, other]

Active Perception using Neural Radiance Fields

Authors: Siming He, Christopher D. Hsu, Dexter Ong, Yifei Simon Shao, Pratik Chaudhari

Abstract: We study active perception from first principles to argue that an autonomous agent performing active perception should maximize the mutual information that past observations posses about future ones. Doing so requires (a) a representation of the scene that summarizes past observations and the ability to update this representation to incorporate new observations (state estimation and map**), (b)… ▽ More We study active perception from first principles to argue that an autonomous agent performing active perception should maximize the mutual information that past observations posses about future ones. Doing so requires (a) a representation of the scene that summarizes past observations and the ability to update this representation to incorporate new observations (state estimation and map**), (b) the ability to synthesize new observations of the scene (a generative model), and (c) the ability to select control trajectories that maximize predictive information (planning). This motivates a neural radiance field (NeRF)-like representation which captures photometric, geometric and semantic properties of the scene grounded. This representation is well-suited to synthesizing new observations from different viewpoints. And thereby, a sampling-based planner can be used to calculate the predictive information from synthetic observations along dynamically-feasible trajectories. We use active perception for exploring cluttered indoor environments and employ a notion of semantic uncertainty to check for the successful completion of an exploration task. We demonstrate these ideas via simulation in realistic 3D indoor environments. △ Less

Submitted 30 March, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

Report number: Proc. of the American Control Conference (ACC) 2024

arXiv:2310.02162 [pdf, other]

TreeScope: An Agricultural Robotics Dataset for LiDAR-Based Map** of Trees in Forests and Orchards

Authors: Derek Cheng, Fernando Cladera Ojeda, Ankit Prabhu, Xu Liu, Alan Zhu, Patrick Corey Green, Reza Ehsani, Pratik Chaudhari, Vijay Kumar

Abstract: Data collection for forestry, timber, and agriculture currently relies on manual techniques which are labor-intensive and time-consuming. We seek to demonstrate that robotics offers improvements over these techniques and accelerate agricultural research, beginning with semantic segmentation and diameter estimation of trees in forests and orchards. We present TreeScope v1.0, the first robotics data… ▽ More Data collection for forestry, timber, and agriculture currently relies on manual techniques which are labor-intensive and time-consuming. We seek to demonstrate that robotics offers improvements over these techniques and accelerate agricultural research, beginning with semantic segmentation and diameter estimation of trees in forests and orchards. We present TreeScope v1.0, the first robotics dataset for precision agriculture and forestry addressing the counting and map** of trees in forestry and orchards. TreeScope provides LiDAR data from agricultural environments collected with robotics platforms, such as UAV and mobile robot platforms carried by vehicles and human operators. In the first release of this dataset, we provide ground-truth data with over 1,800 manually annotated semantic labels for tree stems and field-measured tree diameters. We share benchmark scripts for these tasks that researchers may use to evaluate the accuracy of their algorithms. Finally, we run our open-source diameter estimation and off-the-shelf semantic segmentation algorithms and share our baseline results. △ Less

Submitted 3 October, 2023; originally announced October 2023.

Comments: Submitted to 2024 IEEE International Conference on Robotics and Automation (ICRA 2024) for review

arXiv:2309.14254 [pdf, other]

End-to-end deep learning inference with CMSSW via ONNX using docker

Authors: Purva Chaudhari, Shravan Chaudhari, Ruchi Chudasama, Sergei Gleyzer

Abstract: Deep learning techniques have been proven to provide excellent performance for a variety of high-energy physics applications, such as particle identification, event reconstruction and trigger operations. Recently, we developed an end-to-end deep learning approach to identify various particles using low-level detector information from high-energy collisions. These models will be incorporated in the… ▽ More Deep learning techniques have been proven to provide excellent performance for a variety of high-energy physics applications, such as particle identification, event reconstruction and trigger operations. Recently, we developed an end-to-end deep learning approach to identify various particles using low-level detector information from high-energy collisions. These models will be incorporated in the CMS software framework (CMSSW) to enable their use for particle reconstruction or for trigger operation in real-time. Incorporating these computational tools in the experimental framework presents new challenges. This paper reports an implementation of the end-to-end deep learning inference with the CMS software framework. The inference has been implemented on GPU for faster computation using ONNX. We have benchmarked the ONNX inference with GPU and CPU using NERSCs Perlmutter cluster by building a docker image of the CMS software framework. △ Less

Submitted 25 September, 2023; originally announced September 2023.

Comments: 9 pages, 7 figures, CHEP2023 proceedings, submitted to EPJ Web of Conferences

Report number: CMS CR-2023/161

arXiv:2309.13720 [pdf, other]

Design and Evaluation of Motion Planners for Quadrotors in Environments with Varying Complexities

Authors: Yifei Simon Shao, Yuwei Wu, Laura Jarin-Lipschitz, Pratik Chaudhari, Vijay Kumar

Abstract: Motion planning techniques for quadrotors have advanced significantly over the past decade. Most successful planners have two stages: a front-end that determines a path that incorporates geometric (or kinematic or input) constraints and specifies the homotopy class of the trajectory, and a back-end that optimizes this path to respect dynamics and input constraints. While there are many different c… ▽ More Motion planning techniques for quadrotors have advanced significantly over the past decade. Most successful planners have two stages: a front-end that determines a path that incorporates geometric (or kinematic or input) constraints and specifies the homotopy class of the trajectory, and a back-end that optimizes this path to respect dynamics and input constraints. While there are many different choices for each stage, the eventual performance depends critically not only on these choices, but also on the environment. Given a new environment, it is difficult to decide a priori how one should design a motion planner. In this work, we develop (i) a procedure to construct parametrized environments, (ii) metrics that characterize the difficulty of motion planning in these environments, and (iii) an open-source software stack that can be used to combine a wide variety of two-stage planners seamlessly. We perform experiments in simulations and a real platform. We find, somewhat conveniently, that geometric front-ends are sufficient for environments with varying complexities if combined with dynamics-aware backends. The metrics we designed faithfully capture the planning difficulty in a given environment. All code is available at https://github.com/KumarRobotics/kr_mp_design △ Less

Submitted 7 March, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

arXiv:2309.09165 [pdf]

Analog Content-Addressable Memory from Complementary FeFETs

Authors: Xiwen Liu, Keshava Katti, Yunfei He, Paul Jacob, Claudia Richter, Uwe Schroeder, Santosh Kurinec, Pratik Chaudhari, Deep Jariwala

Abstract: To address the increasing computational demands of artificial intelligence (AI) and big data, compute-in-memory (CIM) integrates memory and processing units into the same physical location, reducing the time and energy overhead of the system. Despite advancements in non-volatile memory (NVM) for matrix multiplication, other critical data-intensive operations, like parallel search, have been overlo… ▽ More To address the increasing computational demands of artificial intelligence (AI) and big data, compute-in-memory (CIM) integrates memory and processing units into the same physical location, reducing the time and energy overhead of the system. Despite advancements in non-volatile memory (NVM) for matrix multiplication, other critical data-intensive operations, like parallel search, have been overlooked. Current parallel search architectures, namely content-addressable memory (CAM), often use binary, which restricts density and functionality. We present an analog CAM (ACAM) cell, built on two complementary ferroelectric field-effect transistors (FeFETs), that performs parallel search in the analog domain with over 40 distinct match windows. We then deploy it to calculate similarity between vectors, a building block in the following two machine learning problems. ACAM outperforms ternary CAM (TCAM) when applied to similarity search for few-shot learning on the Omniglot dataset, yielding projected simulation results with improved inference accuracy by 5%, 3x denser memory architecture, and more than 100x faster speed compared to central processing unit (CPU) and graphics processing unit (GPU) per similarity search on scaled CMOS nodes. We also demonstrate 1-step inference on a kernel regression model by combining non-linear kernel computation and matrix multiplication in ACAM, with simulation estimates indicating 1,000x faster inference than CPU and GPU. △ Less

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2308.03175 [pdf, other]

Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience

Authors: Rongguang Wang, Guray Erus, Pratik Chaudhari, Christos Davatzikos

Abstract: Machine learning (ML) has shown great promise for revolutionizing a number of areas, including healthcare. However, it is also facing a reproducibility crisis, especially in medicine. ML models that are carefully constructed from and evaluated on a training set might not generalize well on data from different patient populations or acquisition instrument settings and protocols. We tackle this prob… ▽ More Machine learning (ML) has shown great promise for revolutionizing a number of areas, including healthcare. However, it is also facing a reproducibility crisis, especially in medicine. ML models that are carefully constructed from and evaluated on a training set might not generalize well on data from different patient populations or acquisition instrument settings and protocols. We tackle this problem in the context of neuroimaging of Alzheimer's disease (AD), schizophrenia (SZ) and brain aging. We develop a weighted empirical risk minimization approach that optimally combines data from a source group, e.g., subjects are stratified by attributes such as sex, age group, race and clinical cohort to make predictions on a target group, e.g., other sex, age group, etc. using a small fraction (10%) of data from the target group. We apply this method to multi-source data of 15,363 individuals from 20 neuroimaging studies to build ML models for diagnosis of AD and SZ, and estimation of brain age. We found that this approach achieves substantially better accuracy than existing domain adaptation techniques: it obtains area under curve greater than 0.95 for AD classification, area under curve greater than 0.7 for SZ classification and mean absolute error less than 5 years for brain age prediction on all target groups, achieving robustness to variations of scanners, protocols, and demographic or clinical characteristics. In some cases, it is even better than training on all data from the target group, because it leverages the diversity and size of a larger training set. We also demonstrate the utility of our models for prognostic tasks such as predicting disease progression in individuals with mild cognitive impairment. Critically, our brain age prediction models lead to new clinical insights regarding correlations with neurophysiological tests. △ Less

Submitted 6 August, 2023; originally announced August 2023.

arXiv:2307.06328 [pdf, other]

Budgeting Counterfactual for Offline RL

Authors: Yao Liu, Pratik Chaudhari, Rasool Fakoor

Abstract: The main challenge of offline reinforcement learning, where data is limited, arises from a sequence of counterfactual reasoning dilemmas within the realm of potential actions: What if we were to choose a different course of action? These circumstances frequently give rise to extrapolation errors, which tend to accumulate exponentially with the problem horizon. Hence, it becomes crucial to acknowle… ▽ More The main challenge of offline reinforcement learning, where data is limited, arises from a sequence of counterfactual reasoning dilemmas within the realm of potential actions: What if we were to choose a different course of action? These circumstances frequently give rise to extrapolation errors, which tend to accumulate exponentially with the problem horizon. Hence, it becomes crucial to acknowledge that not all decision steps are equally important to the final outcome, and to budget the number of counterfactual decisions a policy make in order to control the extrapolation. Contrary to existing approaches that use regularization on either the policy or value function, we propose an approach to explicitly bound the amount of out-of-distribution actions during training. Specifically, our method utilizes dynamic programming to decide where to extrapolate and where not to, with an upper bound on the decisions different from behavior policy. It balances between the potential for improvement from taking out-of-distribution actions and the risk of making errors due to extrapolation. Theoretically, we justify our method by the constrained optimality of the fixed point solution to our $Q$ updating rules. Empirically, we show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely-used D4RL benchmarks. △ Less

Submitted 21 May, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: Published at NeurIPS 2023

arXiv:2305.18449 [pdf, other]

Taming AI Bots: Controllability of Neural States in Large Language Models

Authors: Stefano Soatto, Paulo Tabuada, Pratik Chaudhari, Tian Yu Liu

Abstract: We tackle the question of whether an agent can, by suitable choice of prompts, control an AI bot to any state. To that end, we first introduce a formal definition of ``meaning'' that is amenable to analysis. Then, we characterize ``meaningful data'' on which large language models (LLMs) are ostensibly trained, and ``well-trained LLMs'' through conditions that are largely met by today's LLMs. While… ▽ More We tackle the question of whether an agent can, by suitable choice of prompts, control an AI bot to any state. To that end, we first introduce a formal definition of ``meaning'' that is amenable to analysis. Then, we characterize ``meaningful data'' on which large language models (LLMs) are ostensibly trained, and ``well-trained LLMs'' through conditions that are largely met by today's LLMs. While a well-trained LLM constructs an embedding space of meanings that is Euclidean, meanings themselves do not form a vector (linear) subspace, but rather a quotient space within. We then characterize the subset of meanings that can be reached by the state of the LLMs for some input prompt, and show that a well-trained bot can reach any meaning albeit with small probability. We then introduce a stronger notion of controllability as {\em almost certain reachability}, and show that, when restricted to the space of meanings, an AI bot is controllable. We do so after introducing a functional characterization of attentive AI bots, and finally derive necessary and sufficient conditions for controllability. The fact that AI bots are controllable means that an adversary could steer them towards any state. However, the sampling process can be designed to counteract adverse actions and avoid reaching undesirable regions of state space before their boundary is crossed. △ Less

Submitted 28 May, 2023; originally announced May 2023.

Comments: TLDR: AI Bots are stochastic dynamical systems whose mental state can be controlled by both the user and the designer. The space of meanings, defined as equivalence classes of sentences, is learned during fine-tuning with human supervision, and safeguarding can be designed into the bot by establishing controls both at its input and output

arXiv:2305.17332 [pdf, other]

Learning Capacity: A Measure of the Effective Dimensionality of a Model

Authors: Daiwei Chen, Weikai Chang, Pratik Chaudhari

Abstract: We exploit a formal correspondence between thermodynamics and inference, where the number of samples can be thought of as the inverse temperature, to define a "learning capacity'' which is a measure of the effective dimensionality of a model. We show that the learning capacity is a tiny fraction of the number of parameters for many deep networks trained on typical datasets, depends upon the number… ▽ More We exploit a formal correspondence between thermodynamics and inference, where the number of samples can be thought of as the inverse temperature, to define a "learning capacity'' which is a measure of the effective dimensionality of a model. We show that the learning capacity is a tiny fraction of the number of parameters for many deep networks trained on typical datasets, depends upon the number of samples used for training, and is numerically consistent with notions of capacity obtained from the PAC-Bayesian framework. The test error as a function of the learning capacity does not exhibit double descent. We show that the learning capacity of a model saturates at very small and very large sample sizes; this provides guidelines, as to whether one should procure more data or whether one should search for new architectures, to improve performance. We show how the learning capacity can be used to understand the effective dimensionality, even for non-parametric models such as random forests and $k$-nearest neighbor classifiers. △ Less

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.01604 [pdf, other]

doi 10.1073/pnas.2310002121

The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold

Authors: Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari

Abstract: We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization technique… ▽ More We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold. △ Less

Submitted 19 March, 2024; v1 submitted 2 May, 2023; originally announced May 2023.

Journal ref: Proceedings of the National Academy of Sciences 121.12 (2024)

arXiv:2304.11446 [pdf, other]

Fast Diffusion Probabilistic Model Sampling through the lens of Backward Error Analysis

Authors: Yansong Gao, Zhihong Pan, Xin Zhou, Le Kang, Pratik Chaudhari

Abstract: Denoising diffusion probabilistic models (DDPMs) are a class of powerful generative models. The past few years have witnessed the great success of DDPMs in generating high-fidelity samples. A significant limitation of the DDPMs is the slow sampling procedure. DDPMs generally need hundreds or thousands of sequential function evaluations (steps) of neural networks to generate a sample. This paper ai… ▽ More Denoising diffusion probabilistic models (DDPMs) are a class of powerful generative models. The past few years have witnessed the great success of DDPMs in generating high-fidelity samples. A significant limitation of the DDPMs is the slow sampling procedure. DDPMs generally need hundreds or thousands of sequential function evaluations (steps) of neural networks to generate a sample. This paper aims to develop a fast sampling method for DDPMs requiring much fewer steps while retaining high sample quality. The inference process of DDPMs approximates solving the corresponding diffusion ordinary differential equations (diffusion ODEs) in the continuous limit. This work analyzes how the backward error affects the diffusion ODEs and the sample quality in DDPMs. We propose fast sampling through the \textbf{Restricting Backward Error schedule (RBE schedule)} based on dynamically moderating the long-time backward error. Our method accelerates DDPMs without any further training. Our experiments show that sampling with an RBE schedule generates high-quality samples within only 8 to 20 function evaluations on various benchmark datasets. We achieved 12.01 FID in 8 function evaluations on the ImageNet $128\times128$, and a $20\times$ speedup compared with previous baseline samplers. △ Less

Submitted 22 April, 2023; originally announced April 2023.

Comments: arXiv admin note: text overlap with arXiv:2101.12176 by other authors

arXiv:2303.08808 [pdf, other]

Mesh Strikes Back: Fast and Efficient Human Reconstruction from RGB videos

Authors: Rohit Jena, Pratik Chaudhari, James Gee, Ganesh Iyer, Siddharth Choudhary, Brandon M. Smith

Abstract: Human reconstruction and synthesis from monocular RGB videos is a challenging problem due to clothing, occlusion, texture discontinuities and sharpness, and framespecific pose changes. Many methods employ deferred rendering, NeRFs and implicit methods to represent clothed humans, on the premise that mesh-based representations cannot capture complex clothing and textures from RGB, silhouettes, and… ▽ More Human reconstruction and synthesis from monocular RGB videos is a challenging problem due to clothing, occlusion, texture discontinuities and sharpness, and framespecific pose changes. Many methods employ deferred rendering, NeRFs and implicit methods to represent clothed humans, on the premise that mesh-based representations cannot capture complex clothing and textures from RGB, silhouettes, and keypoints alone. We provide a counter viewpoint to this fundamental premise by optimizing a SMPL+D mesh and an efficient, multi-resolution texture representation using only RGB images, binary silhouettes and sparse 2D keypoints. Experimental results demonstrate that our approach is more capable of capturing geometric details compared to visual hull, mesh-based methods. We show competitive novel view synthesis and improvements in novel pose synthesis compared to NeRF-based methods, which introduce noticeable, unwanted artifacts. By restricting the solution space to the SMPL+D model combined with differentiable rendering, we obtain dramatic speedups in compute, training times (up to 24x) and inference times (up to 192x). Our method therefore can be used as is or as a fast initialization to NeRF-based methods. △ Less

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2210.17011 [pdf, other]

A picture of the space of typical learnable tasks

Authors: Rahul Ramesh, Jialin Mao, Itay Griniasty, Rubing Yang, Han Kheng Teoh, Mark Transtrum, James P. Sethna, Pratik Chaudhari

Abstract: We develop information geometric techniques to understand the representations learned by deep networks when they are trained on different tasks using supervised, meta-, semi-supervised and contrastive learning. We shed light on the following phenomena that relate to the structure of the space of tasks: (1) the manifold of probabilistic models trained on different tasks using different representati… ▽ More We develop information geometric techniques to understand the representations learned by deep networks when they are trained on different tasks using supervised, meta-, semi-supervised and contrastive learning. We shed light on the following phenomena that relate to the structure of the space of tasks: (1) the manifold of probabilistic models trained on different tasks using different representation learning methods is effectively low-dimensional; (2) supervised learning on one task results in a surprising amount of progress even on seemingly dissimilar tasks; progress on other tasks is larger if the training task has diverse classes; (3) the structure of the space of tasks indicated by our analysis is consistent with parts of the Wordnet phylogenetic tree; (4) episodic meta-learning algorithms and supervised learning traverse different trajectories during training but they fit similar models eventually; (5) contrastive and semi-supervised learning methods traverse trajectories similar to those of supervised learning. We use classification tasks constructed from the CIFAR-10 and Imagenet datasets to study these phenomena. △ Less

Submitted 21 July, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

arXiv:2210.01422 [pdf, other]

Time-Varying Propensity Score to Bridge the Gap between the Past and Present

Authors: Rasool Fakoor, Jonas Mueller, Zachary C. Lipton, Pratik Chaudhari, Alexander J. Smola

Abstract: Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the… ▽ More Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes. △ Less

Submitted 2 May, 2024; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: Published at ICLR 2024

arXiv:2208.10967 [pdf, other]

The Value of Out-of-Distribution Data

Authors: Ashwin De Silva, Rahul Ramesh, Carey E. Priebe, Pratik Chaudhari, Joshua T. Vogelstein

Abstract: We expect the generalization error to improve with more samples from a similar task, and to deteriorate with more samples from an out-of-distribution (OOD) task. In this work, we show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the number of OOD samples. As the number of OOD samples increases, the generalization error on the target task imp… ▽ More We expect the generalization error to improve with more samples from a similar task, and to deteriorate with more samples from an out-of-distribution (OOD) task. In this work, we show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the number of OOD samples. As the number of OOD samples increases, the generalization error on the target task improves before deteriorating beyond a threshold. In other words, there is value in training on small amounts of OOD data. We use Fisher's Linear Discriminant on synthetic datasets and deep networks on computer vision benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS and DomainNet to demonstrate and analyze this phenomenon. In the idealistic setting where we know which samples are OOD, we show that these non-monotonic trends can be exploited using an appropriately weighted objective of the target and OOD empirical risk. While its practical utility is limited, this does suggest that if we can detect OOD samples, then there may be ways to benefit from them. When we do not know which samples are OOD, we show how a number of go-to strategies such as data-augmentation, hyper-parameter optimization, and pre-training are not enough to ensure that the target generalization error does not deteriorate with the number of OOD samples in the dataset. △ Less

Submitted 13 July, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Previous versions of this work have been presented at the Out-of-Distribution Generalization in Computer Vision (OOD-CV) Workshop (ECCV 2022) and the Workshop on Distribution Shifts (NeurIPS 2022)

Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:7366-7389, 2023

arXiv:2208.01430 [pdf, other]

A Model for Multi-Agent Heterogeneous Interaction Problems

Authors: Christopher D. Hsu, Mulugeta A. Haile, Pratik Chaudhari

Abstract: We introduce a model for multi-agent interaction problems to understand how a heterogeneous team of agents should organize its resources to tackle a heterogeneous team of attackers. This model is inspired by how the human immune system tackles a diverse set of pathogens. The key property of this model is a "cross-reactivity" kernel which enables a particular defender type to respond strongly to so… ▽ More We introduce a model for multi-agent interaction problems to understand how a heterogeneous team of agents should organize its resources to tackle a heterogeneous team of attackers. This model is inspired by how the human immune system tackles a diverse set of pathogens. The key property of this model is a "cross-reactivity" kernel which enables a particular defender type to respond strongly to some attacker types but weakly to a few different types of attackers. We show how due to such cross-reactivity, the defender team can optimally counteract a heterogeneous attacker team using very few types of defender agents, and thereby minimize its resources. We study this model in different settings to characterize a set of guiding principles for control problems with heterogeneous teams of agents, e.g., sensitivity of the harm to sub-optimal defender distributions, and competition between defenders gives near-optimal behavior using decentralized computation of the control. We also compare this model with existing approaches including reinforcement-learned policies, perimeter defense, and coverage control. △ Less

Submitted 15 October, 2023; v1 submitted 2 August, 2022; originally announced August 2022.

Comments: 8 pages, 7 figures

Report number: Proc. of the American Control Conference (ACC), 2024

arXiv:2207.01614 [pdf, other]

Beyond mAP: Towards better evaluation of instance segmentation

Authors: Rohit Jena, Lukas Zhornyak, Nehal Doiphode, Pratik Chaudhari, Vivek Buch, James Gee, Jianbo Shi

Abstract: Correctness of instance segmentation constitutes counting the number of objects, correctly localizing all predictions and classifying each localized prediction. Average Precision is the de-facto metric used to measure all these constituents of segmentation. However, this metric does not penalize duplicate predictions in the high-recall range, and cannot distinguish instances that are localized cor… ▽ More Correctness of instance segmentation constitutes counting the number of objects, correctly localizing all predictions and classifying each localized prediction. Average Precision is the de-facto metric used to measure all these constituents of segmentation. However, this metric does not penalize duplicate predictions in the high-recall range, and cannot distinguish instances that are localized correctly but categorized incorrectly. This weakness has inadvertently led to network designs that achieve significant gains in AP but also introduce a large number of false positives. We therefore cannot rely on AP to choose a model that provides an optimal tradeoff between false positives and high recall. To resolve this dilemma, we review alternative metrics in the literature and propose two new measures to explicitly measure the amount of both spatial and categorical duplicate predictions. We also propose a Semantic Sorting and NMS module to remove these duplicates based on a pixel occupancy matching scheme. Experiments show that modern segmentation networks have significant gains in AP, but also contain a considerable amount of duplicates. Our Semantic Sorting and NMS can be added as a plug-and-play module to mitigate hedged predictions and preserve AP. △ Less

Submitted 20 March, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Accepted at CVPR 2023

arXiv:2206.05575 [pdf]

MammoFL: Mammographic Breast Density Estimation using Federated Learning

Authors: Ramya Muthukrishnan, Angelina Heyler, Keshava Katti, Sarthak Pati, Walter Mankowski, Aprupa Alahari, Michael Sanborn, Emily F. Conant, Christopher Scott, Stacey Winham, Celine Vachon, Pratik Chaudhari, Despina Kontos, Spyridon Bakas

Abstract: In this study, we automate quantitative mammographic breast density estimation with neural networks and show that this tool is a strong use case for federated learning on multi-institutional datasets. Our dataset included bilateral CC-view and MLO-view mammographic images from two separate institutions. Two U-Nets were separately trained on algorithm-generated labels to perform segmentation of the… ▽ More In this study, we automate quantitative mammographic breast density estimation with neural networks and show that this tool is a strong use case for federated learning on multi-institutional datasets. Our dataset included bilateral CC-view and MLO-view mammographic images from two separate institutions. Two U-Nets were separately trained on algorithm-generated labels to perform segmentation of the breast and dense tissue from these images and subsequently calculate breast percent density (PD). The networks were trained with federated learning and compared to three non-federated baselines, one trained on each single-institution dataset and one trained on the aggregated multi-institution dataset. We demonstrate that training on multi-institution datasets is critical to algorithm generalizability. We further show that federated learning on multi-institutional datasets improves model generalization to unseen data at nearly the same level as centralized training on multi-institutional datasets, indicating that federated learning can be applied to our method to improve algorithm generalizability while maintaining patient privacy. △ Less

Submitted 13 December, 2023; v1 submitted 11 June, 2022; originally announced June 2022.

Comments: Deep learning, federated learning, mammography, breast density, risk assessment

arXiv:2205.13421 [pdf, other]

doi 10.1073/pnas.2211613120

Bias in Machine Learning Models Can Be Significantly Mitigated by Careful Training: Evidence from Neuroimaging Studies

Authors: Rongguang Wang, Pratik Chaudhari, Christos Davatzikos

Abstract: Despite the great promise that machine learning has offered in many fields of medicine, it has also raised concerns about potential biases and poor generalization across genders, age distributions, races and ethnicities, hospitals, and data acquisition equipment and protocols. In the current study, and in the context of three brain diseases, we provide evidence which suggests that when properly tr… ▽ More Despite the great promise that machine learning has offered in many fields of medicine, it has also raised concerns about potential biases and poor generalization across genders, age distributions, races and ethnicities, hospitals, and data acquisition equipment and protocols. In the current study, and in the context of three brain diseases, we provide evidence which suggests that when properly trained, machine learning models can generalize well across diverse conditions and do not necessarily suffer from bias. Specifically, by using multi-study magnetic resonance imaging consortia for diagnosing Alzheimer's disease, schizophrenia, and autism spectrum disorder, we find that well-trained models have a high area-under-the-curve (AUC) on subjects across different subgroups pertaining to attributes such as gender, age, racial groups, and different clinical studies and are unbiased under multiple fairness metrics such as demographic parity difference, equalized odds difference, equal opportunity difference etc. We find that models that incorporate multi-source data from demographic, clinical, genetic factors and cognitive scores are also unbiased. These models have better predictive AUC across subgroups than those trained only with imaging features but there are also situations when these additional features do not help. △ Less

Submitted 30 January, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Journal ref: Proceedings of the National Academy of Sciences 120.6 (2023)

arXiv:2202.12482 [pdf, other]

Sparse Neural Additive Model: Interpretable Deep Learning with Feature Selection via Group Sparsity

Authors: Shiyun Xu, Zhiqi Bu, Pratik Chaudhari, Ian J. Barnett

Abstract: Interpretable machine learning has demonstrated impressive performance while preserving explainability. In particular, neural additive models (NAM) offer the interpretability to the black-box deep learning and achieve state-of-the-art accuracy among the large family of generalized additive models. In order to empower NAM with feature selection and improve the generalization, we propose the sparse… ▽ More Interpretable machine learning has demonstrated impressive performance while preserving explainability. In particular, neural additive models (NAM) offer the interpretability to the black-box deep learning and achieve state-of-the-art accuracy among the large family of generalized additive models. In order to empower NAM with feature selection and improve the generalization, we propose the sparse neural additive models (SNAM) that employ the group sparsity regularization (e.g. Group LASSO), where each feature is learned by a sub-network whose trainable parameters are clustered as a group. We study the theoretical properties for SNAM with novel techniques to tackle the non-parametric truth, thus extending from classical sparse linear models such as the LASSO, which only works on the parametric truth. Specifically, we show that SNAM with subgradient and proximal gradient descents provably converges to zero training loss as $t\to\infty$, and that the estimation error of SNAM vanishes asymptotically as $n\to\infty$. We also prove that SNAM, similar to LASSO, can have exact support recovery, i.e. perfect feature selection, with appropriate regularization. Moreover, we show that the SNAM can generalize well and preserve the `identifiability', recovering each feature's effect. We validate our theories via extensive experiments and further testify to the good accuracy and efficiency of SNAM. △ Less

Submitted 24 February, 2022; originally announced February 2022.

arXiv:2202.00187 [pdf, other]

Deep Reference Priors: What is the best way to pretrain a model?

Authors: Yansong Gao, Rahul Ramesh, Pratik Chaudhari

Abstract: What is the best way to exploit extra data -- be it unlabeled data from the same task, or labeled data from a related task -- to learn a given task? This paper formalizes the question using the theory of reference priors. Reference priors are objective, uninformative Bayesian priors that maximize the mutual information between the task and the weights of the model. Such priors enable the task to m… ▽ More What is the best way to exploit extra data -- be it unlabeled data from the same task, or labeled data from a related task -- to learn a given task? This paper formalizes the question using the theory of reference priors. Reference priors are objective, uninformative Bayesian priors that maximize the mutual information between the task and the weights of the model. Such priors enable the task to maximally affect the Bayesian posterior, e.g., reference priors depend upon the number of samples available for learning the task and for very small sample sizes, the prior puts more probability mass on low-complexity models in the hypothesis space. This paper presents the first demonstration of reference priors for medium-scale deep networks and image-based data. We develop generalizations of reference priors and demonstrate applications to two problems. First, by using unlabeled data to compute the reference prior, we develop new Bayesian semi-supervised learning methods that remain effective even with very few samples per class. Second, by using labeled data from the source task to compute the reference prior, we develop a new pretraining method for transfer learning that allows data from the target task to maximally affect the Bayesian posterior. Empirical validation of these methods is conducted on image classification datasets. Code is available at https://github.com/grasp-lyrl/deep_reference_priors. △ Less

Submitted 15 June, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

Comments: 24 pages

arXiv:2201.07372 [pdf, other]

Prospective Learning: Principled Extrapolation to the Future

Authors: Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J. Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J. How, Justus M Kebschull, John W. Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M. Shin, Soledad Villar, Ian Phillips, Carey E. Priebe, Thomas Hartung, Michael I. Miller , et al. (18 additional authors not shown)

Abstract: Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenari… ▽ More Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenarios evolve over multiple spatiotemporal scales with partially predictable dynamics. Here we reformulate the learning problem to one that centers around this idea of dynamic futures that are partially learnable. We conjecture that certain sequences of tasks are not retrospectively learnable (in which the data distribution is fixed), but are prospectively learnable (in which distributions may be dynamic), suggesting that prospective learning is more difficult in kind than retrospective learning. We argue that prospective learning more accurately characterizes many real world problems that (1) currently stymie existing artificial intelligence solutions and/or (2) lack adequate explanations for how natural intelligences solve them. Thus, studying prospective learning will lead to deeper insights and solutions to currently vexing challenges in both natural and artificial intelligences. △ Less

Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

Comments: Accepted at the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

arXiv:2110.14163 [pdf, other]

Does the Data Induce Capacity Control in Deep Learning?

Authors: Rubing Yang, Jialin Mao, Pratik Chaudhari

Abstract: We show that the input correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range. This structure is mirrored in a network trained on this data: we show that the Hessian and the Fisher Information Matrix (FIM) have eigenvalues that are spread uniformly… ▽ More We show that the input correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range. This structure is mirrored in a network trained on this data: we show that the Hessian and the Fisher Information Matrix (FIM) have eigenvalues that are spread uniformly over exponentially large ranges. We call such eigenspectra "sloppy" because sets of weights corresponding to small eigenvalues can be changed by large magnitudes without affecting the loss. Networks trained on atypical datasets with non-sloppy inputs do not share these traits and deep networks trained on such datasets generalize poorly. Inspired by this, we study the hypothesis that sloppiness of inputs aids generalization in deep networks. We show that if the Hessian is sloppy, we can compute non-vacuous PAC-Bayes generalization bounds analytically. By exploiting our empirical observation that training predominantly takes place in the non-sloppy subspace of the FIM, we develop data-distribution dependent PAC-Bayes priors that lead to accurate generalization bounds using numerical optimization. △ Less

Submitted 22 June, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

arXiv:2106.06845 [pdf, other]

Harmonization with Flow-based Causal Inference

Authors: Rongguang Wang, Pratik Chaudhari, Christos Davatzikos

Abstract: Heterogeneity in medical data, e.g., from data collected at different sites and with different protocols in a clinical study, is a fundamental hurdle for accurate prediction using machine learning models, as such models often fail to generalize well. This paper leverages a recently proposed normalizing-flow-based method to perform counterfactual inference upon a structural causal model (SCM), in o… ▽ More Heterogeneity in medical data, e.g., from data collected at different sites and with different protocols in a clinical study, is a fundamental hurdle for accurate prediction using machine learning models, as such models often fail to generalize well. This paper leverages a recently proposed normalizing-flow-based method to perform counterfactual inference upon a structural causal model (SCM), in order to achieve harmonization of such data. A causal model is used to model observed effects (brain magnetic resonance imaging data) that result from known confounders (site, gender and age) and exogenous noise variables. Our formulation exploits the bijection induced by flow for the purpose of harmonization. We infer the posterior of exogenous variables, intervene on observations, and draw samples from the resultant SCM to obtain counterfactuals. This approach is evaluated extensively on multiple, large, real-world medical datasets and displayed better cross-domain generalization compared to state-of-the-art algorithms. Further experiments that evaluate the quality of confounder-independent data generated by our model using regression and classification tasks are provided. △ Less

Submitted 10 July, 2021; v1 submitted 12 June, 2021; originally announced June 2021.

arXiv:2106.03027 [pdf, other]

Model Zoo: A Growing "Brain" That Learns Continually

Authors: Rahul Ramesh, Pratik Chaudhari

Abstract: This paper argues that continual learning methods can benefit by splitting the capacity of the learner across multiple models. We use statistical learning theory and experimental analysis to show how multiple tasks can interact with each other in a non-trivial fashion when a single model is trained on them. The generalization error on a particular task can improve when it is trained with synergist… ▽ More This paper argues that continual learning methods can benefit by splitting the capacity of the learner across multiple models. We use statistical learning theory and experimental analysis to show how multiple tasks can interact with each other in a non-trivial fashion when a single model is trained on them. The generalization error on a particular task can improve when it is trained with synergistic tasks, but can also deteriorate when trained with competing tasks. This theory motivates our method named Model Zoo which, inspired from the boosting literature, grows an ensemble of small models, each of which is trained during one episode of continual learning. We demonstrate that Model Zoo obtains large gains in accuracy on a variety of continual learning benchmark problems. Code is available at https://github.com/grasp-lyrl/modelzoo_continual. △ Less

Submitted 15 June, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

Report number: Proc. of the International Conference of Learning Representations (ICLR) 2022

arXiv:2103.16172 [pdf, other]

doi 10.1088/1742-5468/ac8e5d

Zeno crossovers in the entanglement speed of spin chains with noisy impurities

Authors: Abhijit P. Chaudhari, Shane P. Kelly, Riccardo J. Valencia-Tortora, Jamir Marino

Abstract: We use a noisy signal with finite correlation time to drive a spin (dissipative impurity) in the quantum XY spin chain and calculate the dynamics of entanglement entropy of a bipartition of spins, for a stochastic quantum trajectory. We compute the noise averaged entanglement entropy of a bipartition of spins and observe that its speed of spreading decreases at strong dissipation, as a result of t… ▽ More We use a noisy signal with finite correlation time to drive a spin (dissipative impurity) in the quantum XY spin chain and calculate the dynamics of entanglement entropy of a bipartition of spins, for a stochastic quantum trajectory. We compute the noise averaged entanglement entropy of a bipartition of spins and observe that its speed of spreading decreases at strong dissipation, as a result of the Zeno effect. We recover the Zeno crossover and show that noise averaged entanglement entropy can be used as a proxy for the heating and Zeno regimes. Upon increasing the correlation time of the noise, the location of the Zeno crossover shifts at stronger dissipation, extending the heating regime. △ Less

Submitted 5 October, 2022; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: 14 pages, 5 figures

Journal ref: J. Stat. Mech. (2022) 103101

arXiv:2103.14184 [pdf, other]

Deformable Linear Object Prediction Using Locally Linear Latent Dynamics

Authors: Wenbo Zhang, Karl Schmeckpeper, Pratik Chaudhari, Kostas Daniilidis

Abstract: We propose a framework for deformable linear object prediction. Prediction of deformable objects (e.g., rope) is challenging due to their non-linear dynamics and infinite-dimensional configuration spaces. By map** the dynamics from a non-linear space to a linear space, we can use the good properties of linear dynamics for easier learning and more efficient prediction. We learn a locally linear,… ▽ More We propose a framework for deformable linear object prediction. Prediction of deformable objects (e.g., rope) is challenging due to their non-linear dynamics and infinite-dimensional configuration spaces. By map** the dynamics from a non-linear space to a linear space, we can use the good properties of linear dynamics for easier learning and more efficient prediction. We learn a locally linear, action-conditioned dynamics model that can be used to predict future latent states. Then, we decode the predicted latent state into the predicted state. We also apply a sampling-based optimization algorithm to select the optimal control action. We empirically demonstrate that our approach can predict the rope state accurately up to ten steps into the future and that our algorithm can find the optimal action given an initial state and a goal state. △ Less

Submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.12857 [pdf, other]

doi 10.1016/j.media.2021.102309

Embracing the Disharmony in Medical Imaging: A Simple and Effective Framework for Domain Adaptation

Authors: Rongguang Wang, Pratik Chaudhari, Christos Davatzikos

Abstract: Domain shift, the mismatch between training and testing data characteristics, causes significant degradation in the predictive performance in multi-source imaging scenarios. In medical imaging, the heterogeneity of population, scanners and acquisition protocols at different sites presents a significant domain shift challenge and has limited the widespread clinical adoption of machine learning mode… ▽ More Domain shift, the mismatch between training and testing data characteristics, causes significant degradation in the predictive performance in multi-source imaging scenarios. In medical imaging, the heterogeneity of population, scanners and acquisition protocols at different sites presents a significant domain shift challenge and has limited the widespread clinical adoption of machine learning models. Harmonization methods which aim to learn a representation of data invariant to these differences are the prevalent tools to address domain shift, but they typically result in degradation of predictive accuracy. This paper takes a different perspective of the problem: we embrace this disharmony in data and design a simple but effective framework for tackling domain shift. The key idea, based on our theoretical arguments, is to build a pretrained classifier on the source data and adapt this model to new data. The classifier can be fine-tuned for intra-site domain adaptation. We can also tackle situations where we do not have access to ground-truth labels on target data; we show how one can use auxiliary tasks for adaptation; these tasks employ covariates such as age, gender and race which are easy to obtain but nevertheless correlated to the main task. We demonstrate substantial improvements in both intra-site domain adaptation and inter-site domain generalization on large-scale real-world 3D brain MRI datasets for classifying Alzheimer's disease and schizophrenia. △ Less

Submitted 20 December, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Journal ref: Medical Image Analysis 76:102309 (2022)

arXiv:2102.09225 [pdf, other]

Continuous Doubly Constrained Batch Reinforcement Learning

Authors: Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola

Abstract: Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produc… ▽ More Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected. △ Less

Submitted 6 December, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: NeurIPS 2021 conference paper

arXiv:2011.08055 [pdf, other]

Scalable Reinforcement Learning Policies for Multi-Agent Control

Authors: Christopher D. Hsu, Hee** Jeong, George J. Pappas, Pratik Chaudhari

Abstract: We develop a Multi-Agent Reinforcement Learning (MARL) method to learn scalable control policies for target tracking. Our method can handle an arbitrary number of pursuers and targets; we show results for tasks consisting up to 1000 pursuers tracking 1000 targets. We use a decentralized, partially-observable Markov Decision Process framework to model pursuers as agents receiving partial observatio… ▽ More We develop a Multi-Agent Reinforcement Learning (MARL) method to learn scalable control policies for target tracking. Our method can handle an arbitrary number of pursuers and targets; we show results for tasks consisting up to 1000 pursuers tracking 1000 targets. We use a decentralized, partially-observable Markov Decision Process framework to model pursuers as agents receiving partial observations (range and bearing) about targets which move using fixed, unknown policies. An attention mechanism is used to parameterize the value function of the agents; this mechanism allows us to handle an arbitrary number of targets. Entropy-regularized off-policy RL methods are used to train a stochastic policy, and we discuss how it enables a hedging behavior between pursuers that leads to a weak form of cooperation in spite of completely decentralized control execution. We further develop a masking heuristic that allows training on smaller problems with few pursuers-targets and execution on much larger problems. Thorough simulation experiments, ablation studies, and comparisons to state of the art algorithms are performed to study the scalability of the approach and robustness of performance to varying numbers of agents and targets. △ Less

Submitted 10 November, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

Comments: 8 pages, 10 figures, contributed paper at IROS 2021

arXiv:2011.00613 [pdf, other]

An Information-Geometric Distance on the Space of Tasks

Authors: Yansong Gao, Pratik Chaudhari

Abstract: This paper prescribes a distance between learning tasks modeled as joint distributions on data and labels. Using tools in information geometry, the distance is defined to be the length of the shortest weight trajectory on a Riemannian manifold as a classifier is fitted on an interpolated task. The interpolated task evolves from the source to the target task using an optimal transport formulation.… ▽ More This paper prescribes a distance between learning tasks modeled as joint distributions on data and labels. Using tools in information geometry, the distance is defined to be the length of the shortest weight trajectory on a Riemannian manifold as a classifier is fitted on an interpolated task. The interpolated task evolves from the source to the target task using an optimal transport formulation. This distance, which we call the "coupled transfer distance" can be compared across different classifier architectures. We develop an algorithm to compute the distance which iteratively transports the marginal on the data of the source task to that of the target task while updating the weights of the classifier to track this evolving data distribution. We develop theory to show that our distance captures the intuitive idea that a good transfer trajectory is the one that keeps the generalization gap small during transfer, in particular at the end on the target task. We perform thorough empirical validation and analysis across diverse image classification datasets to show that the coupled transfer distance correlates strongly with the difficulty of fine-tuning. △ Less

Submitted 24 February, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

Report number: Proc. of the International Conference of Machine Learning (ICML) 2021

arXiv:2008.07081 [pdf, other]

MIDAS: Multi-agent Interaction-aware Decision-making with Adaptive Strategies for Urban Autonomous Navigation

Authors: Xiaoyi Chen, Pratik Chaudhari

Abstract: Autonomous navigation in crowded, complex urban environments requires interacting with other agents on the road. A common solution to this problem is to use a prediction model to guess the likely future actions of other agents. While this is reasonable, it leads to overly conservative plans because it does not explicitly model the mutual influence of the actions of interacting agents. This paper b… ▽ More Autonomous navigation in crowded, complex urban environments requires interacting with other agents on the road. A common solution to this problem is to use a prediction model to guess the likely future actions of other agents. While this is reasonable, it leads to overly conservative plans because it does not explicitly model the mutual influence of the actions of interacting agents. This paper builds a reinforcement learning-based method named MIDAS where an ego-agent learns to affect the control actions of other cars in urban driving scenarios. MIDAS uses an attention-mechanism to handle an arbitrary number of other agents and includes a "driver-type" parameter to learn a single policy that works across different planning objectives. We build a simulation environment that enables diverse interaction experiments with a large number of agents and methods for quantitatively studying the safety, efficiency, and interaction among vehicles. MIDAS is validated using extensive experiments and we show that it (i) can work across different road geometries, (ii) results in an adaptive ego policy that can be tuned easily to satisfy performance criteria such as aggressive or cautious driving, (iii) is robust to changes in the driving policies of external agents, and (iv) is more efficient and safer than existing approaches to interaction-aware decision-making. △ Less

Submitted 23 March, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

Comments: Code available at https://github.com/sherrychen1120/MIDAS. To be presented at IEEE International Conference on Robotics and Automation (ICRA), 2021

arXiv:2008.00759 [pdf, other]

Proximal Deterministic Policy Gradient

Authors: Marco Maggipinto, Gian Antonio Susto, Pratik Chaudhari

Abstract: This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration. The target network plays the role of the variable of optimization and the value network computes the proximal operator. Second, we exploits the two value functions commonly employed in state-of-the-art off-policy algor… ▽ More This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration. The target network plays the role of the variable of optimization and the value network computes the proximal operator. Second, we exploits the two value functions commonly employed in state-of-the-art off-policy algorithms to provide an improved action value estimate through bootstrap** with limited increase of computational resources. Further, we demonstrate significant performance improvement over state-of-the-art algorithms on standard continuous-control RL benchmarks. △ Less

Submitted 3 August, 2020; originally announced August 2020.

arXiv:2006.15199 [pdf, other]

DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

Authors: Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

Abstract: This paper prescribes a suite of techniques for off-policy Reinforcement Learning (RL) that simplify the training process and reduce the sample complexity. First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled. This is contrast to existing literature which creates sophisticated off-policy techniques. Second, we pinpoint trai… ▽ More This paper prescribes a suite of techniques for off-policy Reinforcement Learning (RL) that simplify the training process and reduce the sample complexity. First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled. This is contrast to existing literature which creates sophisticated off-policy techniques. Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step; existing solutions such as delayed policy updates do not mitigate this issue. Third, we show that ideas in the propensity estimation literature can be used to importance-sample transitions from the replay buffer and selectively update the policy to prevent deterioration of performance. We make these claims using extensive experimentation on a set of challenging MuJoCo tasks. A short video of our results can be seen at https://tinyurl.com/scs6p5m . △ Less

Submitted 26 June, 2020; originally announced June 2020.

arXiv:2006.14284 [pdf, other]

Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation

Authors: Rasool Fakoor, Jonas Mueller, Nick Erickson, Pratik Chaudhari, Alexander J. Smola

Abstract: Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators. While highly accurate, the resulting predictors are large, slow, and opaque as compared to their constituents. To improve the deployment of AutoML on tabular data, we propose FAST-DAD to distill arbitrarily com… ▽ More Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators. While highly accurate, the resulting predictors are large, slow, and opaque as compared to their constituents. To improve the deployment of AutoML on tabular data, we propose FAST-DAD to distill arbitrarily complex ensemble predictors into individual models like boosted trees, random forests, and deep networks. At the heart of our approach is a data augmentation strategy based on Gibbs sampling from a self-attention pseudolikelihood estimator. Across 30 datasets spanning regression and binary/multiclass classification tasks, FAST-DAD distillation produces significantly better individual models than one obtains through standard training on the original data. Our individual distilled models are over 10x faster and more accurate than ensemble predictors produced by AutoML tools like H2O/AutoSklearn. △ Less

Submitted 25 June, 2020; originally announced June 2020.

Journal ref: NeurIPS 2020

arXiv:2005.04755 [pdf, other]

BayesRace: Learning to race autonomously using prior experience

Authors: Achin Jain, Matthew O'Kelly, Pratik Chaudhari, Manfred Morari

Abstract: Autonomous race cars require perception, estimation, planning, and control modules which work together asynchronously while driving at the limit of a vehicle's handling capability. A fundamental challenge encountered in designing these software components lies in predicting the vehicle's future state (e.g. position, orientation, and speed) with high accuracy. The root cause is the difficulty in id… ▽ More Autonomous race cars require perception, estimation, planning, and control modules which work together asynchronously while driving at the limit of a vehicle's handling capability. A fundamental challenge encountered in designing these software components lies in predicting the vehicle's future state (e.g. position, orientation, and speed) with high accuracy. The root cause is the difficulty in identifying vehicle model parameters that capture the effects of lateral tire slip. We present a model-based planning and control framework for autonomous racing that significantly reduces the effort required in system identification and control design. Our approach alleviates the gap induced by simulation-based controller design by learning from on-board sensor measurements. A major focus of this work is empirical, thus, we demonstrate our contributions by experiments on validated 1:43 and 1:10 scale autonomous racing simulations. △ Less

Submitted 15 November, 2020; v1 submitted 10 May, 2020; originally announced May 2020.

Journal ref: 4th Conference on Robot Learning (CoRL 2020)

arXiv:2004.02441 [pdf, other]

TraDE: Transformers for Density Estimation

Authors: Rasool Fakoor, Pratik Chaudhari, Jonas Mueller, Alexander J. Smola

Abstract: We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data. Our model is trained using a penalized maximum likelihood objective, which ensures that samples from the density estimate resemble the training data distribution. The use of self-attention means that the model need not retain conditional sufficient statistics durin… ▽ More We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data. Our model is trained using a penalized maximum likelihood objective, which ensures that samples from the density estimate resemble the training data distribution. The use of self-attention means that the model need not retain conditional sufficient statistics during the auto-regressive process beyond what is needed for each covariate. On standard tabular and image data benchmarks, TraDE produces significantly better density estimates than existing approaches such as normalizing flow estimators and recurrent auto-regressive models. However log-likelihood on held-out data only partially reflects how useful these estimates are in real-world applications. In order to systematically evaluate density estimators, we present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data and demonstrate that TraDE works well in these scenarios. △ Less

Submitted 14 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

arXiv:2002.12406 [pdf, other]

A Free-Energy Principle for Representation Learning

Authors: Yansong Gao, Pratik Chaudhari

Abstract: This paper employs a formal connection of machine learning with thermodynamics to characterize the quality of learnt representations for transfer learning. We discuss how information-theoretic functional such as rate, distortion and classification loss of a model lie on a convex, so-called equilibrium surface.We prescribe dynamical processes to traverse this surface under constraints, e.g., an iso… ▽ More This paper employs a formal connection of machine learning with thermodynamics to characterize the quality of learnt representations for transfer learning. We discuss how information-theoretic functional such as rate, distortion and classification loss of a model lie on a convex, so-called equilibrium surface.We prescribe dynamical processes to traverse this surface under constraints, e.g., an iso-classification process that trades off rate and distortion to keep the classification loss unchanged. We demonstrate how this process can be used for transferring representations from a source dataset to a target dataset while kee** the classification loss constant. Experimental validation of the theoretical results is provided on standard image-classification datasets. △ Less

Submitted 27 February, 2020; originally announced February 2020.

Comments: 21 pages, 14 figures

arXiv:2002.11770 [pdf, other]

Rethinking the Hyperparameters for Fine-tuning

Authors: Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

Abstract: Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks. Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyperparameters and kee** them fixed to values normally used for training from scratch. This paper re-examines several common practices of setting hyperparameters for fine-tuning. Our findings are based… ▽ More Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks. Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyperparameters and kee** them fixed to values normally used for training from scratch. This paper re-examines several common practices of setting hyperparameters for fine-tuning. Our findings are based on extensive empirical evaluation for fine-tuning on various transfer learning benchmarks. (1) While prior works have thoroughly investigated learning rate and batch size, momentum for fine-tuning is a relatively unexplored parameter. We find that the value of momentum also affects fine-tuning performance and connect it with previous theoretical findings. (2) Optimal hyperparameters for fine-tuning, in particular, the effective learning rate, are not only dataset dependent but also sensitive to the similarity between the source domain and target domain. This is in contrast to hyperparameters for training from scratch. (3) Reference-based regularization that keeps models close to the initial model does not necessarily apply for "dissimilar" datasets. Our findings challenge common practices of fine-tuning and encourages deep learning practitioners to rethink the hyperparameters for fine-tuning. △ Less

Submitted 19 February, 2020; originally announced February 2020.

Comments: Published as a conference paper at ICLR 2020

Showing 1–50 of 67 results for author: Chaudhari, P