Search | arXiv e-print repository

Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks

Authors: Alex Quach, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus

Abstract: Simulators are powerful tools for autonomous robot learning as they offer scalable data generation, flexible design, and optimization of trajectories. However, transferring behavior learned from simulation data into the real world proves to be difficult, usually mitigated with compute-heavy domain randomization methods or further model fine-tuning. We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks. To this end, we first build a simulator by integrating Gaussian Splatting with quadrotor flight dynamics, and then train robust navigation policies using Liquid neural networks. In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, crafty programming of expert demonstration training data, and the task understanding capabilities of Liquid networks. Through a series of quantitative flight tests, we demonstrate the robust transfer of navigation skills learned in a single simulation scene directly to the real world. We further show the ability to maintain performance beyond the training environment under drastic distribution and physical environment changes. Our learned Liquid policies, trained on single target manoeuvres curated from a photorealistic simulated indoor flight only, generalize to multi-step hikes onboard a real hardware platform outdoors.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2405.06147

State-Free Inference of State-Space Models: The Transfer Function Approach

Authors: Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli

Abstract: We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of the proposed frequency domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers -- parametrized in time-domain -- on the Long Range Arena benchmark, while delivering state-of-the-art downstream performances over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, by simply introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.

Submitted 1 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Resubmission 02/06/2024: Fixed minor typo of recurrent form RTF
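
The frequency-domain idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation (see their repository for that): it assumes a rational transfer function given by numerator coefficients `b` and denominator coefficients `a` in powers of z^-1, evaluates it at the roots of unity with FFTs of the zero-padded coefficients, and inverts the resulting spectrum to obtain a convolution kernel. The names `fft` and `kernel_from_transfer_function` are hypothetical, and the recovered kernel is the time-aliased impulse response, which is close to the true one only when the response has decayed within `length` steps.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def kernel_from_transfer_function(b, a, length):
    """Kernel of H(z) = B(z^-1) / A(z^-1), sampled at `length` roots of unity.

    The cost depends on `length` and the number of coefficients only; no
    state vector is ever materialized.
    """
    def pad(coeffs):
        return [complex(c) for c in coeffs] + [0j] * (length - len(coeffs))

    numerator = fft(pad(b))      # B evaluated at the roots of unity
    denominator = fft(pad(a))    # A evaluated at the roots of unity
    spectrum = [n / d for n, d in zip(numerator, denominator)]
    kernel = fft(spectrum, inverse=True)
    return [value.real / length for value in kernel]


# H(z) = 1 / (1 - 0.5 z^-1) has impulse response h[k] = 0.5 ** k.
kernel = kernel_from_transfer_function([1.0], [1.0, -0.5], 64)
```

For this stable first-order example the aliasing error is on the order of 0.5 ** 64, so the first few kernel entries match 0.5 ** k to within floating-point precision.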

arXiv:2404.01750

Exploring Latent Pathways: Enhancing the Interpretability of Autonomous Driving with a Variational Autoencoder

Authors: Anass Bairouk, Mirjana Maras, Simon Herlin, Alexander Amini, Marc Blanchon, Ramin Hasani, Patrick Chareyre, Daniela Rus

Abstract: Autonomous driving presents a complex challenge, which is usually addressed with artificial intelligence models that are end-to-end or modular in nature. Within the landscape of modular approaches, a bio-inspired neural circuit policy model has emerged as an innovative control module, offering a compact and inherently interpretable system to infer a steering wheel command from abstract visual features. Here, we take a leap forward by integrating a variational autoencoder with the neural circuit policy controller, forming a solution that directly generates steering commands from input camera images. By substituting the traditional convolutional neural network approach to feature extraction with a variational autoencoder, we enhance the system's interpretability, enabling a more transparent and understandable decision-making process. In addition to the architectural shift toward a variational autoencoder, this study introduces the automatic latent perturbation tool, a novel contribution designed to probe and elucidate the latent features within the variational autoencoder. The automatic latent perturbation tool automates the interpretability process, offering granular insights into how specific latent variables influence the overall model's behavior. Through a series of numerical experiments, we demonstrate the interpretative power of the variational autoencoder-neural circuit policy model and the utility of the automatic latent perturbation tool in making the inner workings of autonomous driving systems more transparent.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

arXiv:2401.08602

Learning with Chemical versus Electrical Synapses -- Does it Make a Difference?

Authors: Mónika Farsang, Mathias Lechner, David Lung, Ramin Hasani, Daniela Rus, Radu Grosu

Abstract: Bio-inspired neural networks have the potential to advance our understanding of neural computation and improve the state-of-the-art of AI systems. Bio-electrical synapses directly transmit neural signals, by enabling fast current flow between neurons. In contrast, bio-chemical synapses transmit neural signals indirectly, through neurotransmitters. Prior work showed that interpretable dynamics for complex robotic control can be achieved by using chemical synapses within a sparse, bio-inspired architecture called Neural Circuit Policies (NCPs). However, a comparison of these two synaptic models, within the same architecture, remains an unexplored area. In this work we aim to determine the impact of using chemical synapses compared to electrical synapses, in both sparse and all-to-all connected networks. We conduct experiments with autonomous lane-keeping through a photorealistic autonomous driving simulator to evaluate their performance under diverse conditions and in the presence of noise. The experiments highlight the substantial influence of the architectural and synaptic-model choices, respectively. Our results show that employing chemical synapses yields noticeable improvements compared to electrical synapses, and that NCPs lead to better results in both synaptic models.

Submitted 21 November, 2023; originally announced January 2024.

arXiv:2310.03915

Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control

Authors: Neehal Tumma, Mathias Lechner, Noel Loo, Ramin Hasani, Daniela Rus

Abstract: Developing autonomous agents that can interact with changing environments is an open challenge in machine learning. Robustness is particularly important in these settings as agents are often fit offline on expert demonstrations but deployed online where they must generalize to the closed feedback loop within the environment. In this work, we explore the application of recurrent neural networks to tasks of this nature and understand how a parameterization of their recurrent connectivity influences robustness in closed-loop settings. Specifically, we represent the recurrent connectivity as a function of rank and sparsity and show both theoretically and empirically that modulating these two variables has desirable effects on network dynamics. The proposed low-rank, sparse connectivity induces an interpretable prior on the network that proves to be most amenable for a class of models known as closed-form continuous-time neural networks (CfCs). We find that CfCs with fewer parameters can outperform their full-rank, fully-connected counterparts in the online setting under distribution shift. This yields memory-efficient and robust agents while opening a new perspective on how we can modulate network dynamics through connectivity.

Submitted 30 November, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

arXiv:2305.14113

On the Size and Approximation Error of Distilled Sets

Authors: Alaa Maalouf, Murad Tukan, Noel Loo, Ramin Hasani, Mathias Lechner, Daniela Rus

Abstract: Dataset Distillation is the task of synthesizing small datasets from large ones while still retaining comparable predictive accuracy to the original uncompressed dataset. Despite significant empirical progress in recent years, there is little understanding of the theoretical limitations/guarantees of dataset distillation, specifically, what excess risk is achieved by distillation compared to the original dataset, and how large are distilled datasets? In this work, we take a theoretical view on kernel ridge regression (KRR) based methods of dataset distillation such as Kernel Inducing Points. By transforming ridge regression in random Fourier features (RFF) space, we provide the first proof of the existence of small (size) distilled datasets and their corresponding excess risk for shift-invariant kernels. We prove that a small set of instances exists in the original input space such that its solution in the RFF space coincides with the solution of the original data. We further show that a KRR solution can be generated using this distilled set of instances which gives an approximation towards the KRR solution optimized on the full input data. The size of this set is linear in the dimension of the RFF space of the input set or alternatively near linear in the number of effective degrees of freedom, which is a function of the kernel, number of datapoints, and the regularization parameter $λ$. The error bound of this distilled set is also a function of $λ$. We verify our bounds analytically and empirically.

Submitted 23 May, 2023; originally announced May 2023.
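
The random Fourier features (RFF) space mentioned in the abstract above can be sketched as follows. This is only the standard RFF construction for a Gaussian (shift-invariant) kernel, not the paper's distillation procedure; the helper names `make_rff` and `gaussian_kernel`, the bandwidth `sigma`, and the feature count are assumptions chosen for illustration.

```python
import math
import random


def make_rff(dim, num_features, sigma, seed=0):
    """Return a feature map z(x) with z(x) . z(y) ~ exp(-|x - y|^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density, N(0, sigma^-2 I).
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, phases)]

    return features


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel value, for comparison with the RFF estimate."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


phi = make_rff(dim=2, num_features=5000, sigma=1.0)
x, y = [0.3, -0.2], [0.5, 0.4]
approx = sum(p * q for p, q in zip(phi(x), phi(y)))
exact = gaussian_kernel(x, y, sigma=1.0)
```

The inner product `approx` is a Monte Carlo estimate of `exact`, with error shrinking roughly as one over the square root of `num_features`. Ridge regression on `phi(x)` is then an ordinary finite-dimensional linear problem, which is the setting the abstract analyzes.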

arXiv:2304.02733

Learning Stability Attention in Vision-based End-to-end Driving Policies

Authors: Tsun-Hsuan Wang, Wei Xiao, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus

Abstract: Modern end-to-end learning systems can learn to explicitly infer control from perception. However, it is difficult to guarantee stability and robustness for these systems since they are often exposed to unstructured, high-dimensional, and complex observation spaces (e.g., autonomous driving from a stream of pixel inputs). We propose to leverage control Lyapunov functions (CLFs) to equip end-to-end vision-based policies with stability properties and introduce stability attention in CLFs (att-CLFs) to tackle environmental changes and improve learning flexibility. We also present an uncertainty propagation technique that is tightly integrated into att-CLFs. We demonstrate the effectiveness of att-CLFs via comparison with classical CLFs, model predictive control, and vanilla end-to-end learning in a photo-realistic simulator and on a real full-scale autonomous vehicle.

Submitted 5 April, 2023; originally announced April 2023.

Comments: First two authors contributed equally; L4DC 2023

arXiv:2303.12224

doi 10.1109/ICRA48891.2023.10161536

Infrastructure-based End-to-End Learning and Prevention of Driver Failure

Authors: Noam Buckman, Shiva Sreeram, Mathias Lechner, Yutong Ban, Ramin Hasani, Sertac Karaman, Daniela Rus

Abstract: Intelligent intersection managers can improve safety by detecting dangerous drivers or failure modes in autonomous vehicles, warning oncoming vehicles as they approach an intersection. In this work, we present FailureNet, a recurrent neural network trained end-to-end on trajectories of both nominal and reckless drivers in a scaled miniature city. FailureNet observes the poses of vehicles as they approach an intersection and detects whether a failure is present in the autonomy stack, warning cross-traffic of potentially dangerous drivers. FailureNet can accurately identify control failures, upstream perception errors, and speeding drivers, distinguishing them from nominal driving. The network is trained and deployed with autonomous vehicles in the MiniCity. Compared to speed or frequency-based predictors, FailureNet's recurrent neural network structure provides improved predictive power, yielding upwards of 84% accuracy when deployed on hardware.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 8 pages. Accepted to ICRA 2023

arXiv:2302.06755

Dataset Distillation with Convexified Implicit Gradients

Authors: Noel Loo, Ramin Hasani, Mathias Lechner, Daniela Rus

Abstract: We propose a new dataset distillation algorithm using reparameterization and convexification of implicit gradients (RCIG), that substantially improves the state-of-the-art. To this end, we first formulate dataset distillation as a bi-level optimization problem. Then, we show how implicit gradients can be effectively used to compute meta-gradient updates. We further equip the algorithm with a convexified approximation that corresponds to learning on top of a frozen finite-width neural tangent kernel. Finally, we improve bias in implicit gradients by parameterizing the neural network to enable analytical computation of final-layer parameters given the body parameters. RCIG establishes the new state-of-the-art on a diverse series of dataset distillation tasks. Notably, with one image per class, on resized ImageNet, RCIG sees on average a 108% improvement over the previous state-of-the-art distillation algorithm. Similarly, we observed a 66% gain over SOTA on Tiny-ImageNet and 37% on CIFAR-100.

Submitted 9 November, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

arXiv:2302.01428

Understanding Reconstruction Attacks with the Neural Tangent Kernel and Dataset Distillation

Authors: Noel Loo, Ramin Hasani, Mathias Lechner, Alexander Amini, Daniela Rus

Abstract: Modern deep learning requires large volumes of data, which could contain sensitive or private information that cannot be leaked. Recent work has shown that for homogeneous neural networks a large portion of this training data could be reconstructed with only access to the trained network parameters. While the attack was shown to work empirically, there exists little formal understanding of its effective regime and of which datapoints are susceptible to reconstruction. In this work, we first build a stronger version of the dataset reconstruction attack and show how it can provably recover the entire training set in the infinite width regime. We then empirically study the characteristics of this attack on two-layer networks and reveal that its success heavily depends on deviations from the frozen infinite-width Neural Tangent Kernel limit. Next, we study the nature of easily-reconstructed images. We show that both theoretically and empirically, reconstructed images tend to be "outliers" in the dataset, and that these reconstruction attacks can be used for dataset distillation, that is, we can retrain on reconstructed images and obtain high predictive accuracy.

Submitted 9 November, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2212.11084

Towards Cooperative Flight Control Using Visual-Attention

Authors: Lianhao Yin, Makram Chahine, Tsun-Hsuan Wang, Tim Seyde, Chao Liu, Mathias Lechner, Ramin Hasani, Daniela Rus

Abstract: The cooperation of a human pilot with an autonomous agent during flight control realizes parallel autonomy. We propose an air-guardian system that facilitates cooperation between a pilot with eye tracking and a parallel end-to-end neural control system. Our vision-based air-guardian system combines a causal continuous-depth neural network model with a cooperation layer to enable parallel autonomy between a pilot and a control system based on perceived differences in their attention profiles. The attention profiles for neural networks are obtained by computing the networks' saliency maps (feature importance) through the VisualBackProp algorithm, while the attention profiles for humans are either obtained by eye tracking of human pilots or saliency maps of networks trained to imitate human pilots. When the attention profile of the pilot and guardian agents align, the pilot makes control decisions. Otherwise, the air-guardian makes interventions and takes over the control of the aircraft. We show that our attention-based air-guardian system can balance the trade-off between its level of involvement in the flight and the pilot's expertise and attention. The guardian system is particularly effective in situations where the pilot was distracted due to information overload. We demonstrate the effectiveness of our method for navigating flight scenarios in simulation with a fixed-wing aircraft and on hardware with a quadrotor platform.

Submitted 20 September, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

arXiv:2210.12067

Efficient Dataset Distillation Using Random Feature Approximation

Authors: Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus

Abstract: Dataset distillation compresses large datasets into smaller synthetic coresets which retain performance with the aim of reducing the storage and computational burden of processing the entire dataset. Today's best-performing algorithm, Kernel Inducing Points (KIP), which makes use of the correspondence between infinite-width neural networks and kernel-ridge regression, is prohibitively slow due to the exact computation of the neural tangent kernel matrix, scaling $O(|S|^2)$, with $|S|$ being the coreset size. To improve this, we propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel, which reduces the kernel matrix computation to $O(|S|)$. Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU. Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets, both in kernel regression and finite-width network training. We demonstrate the effectiveness of our approach on tasks involving model interpretability and privacy preservation.

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted to the Conference on the Advances in Neural Information Processing Systems (NeurIPS) 2022

arXiv:2210.12030

Evolution of Neural Tangent Kernels under Benign and Adversarial Training

Authors: Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus

Abstract: Two key challenges facing modern deep learning are mitigating deep networks' vulnerability to adversarial attacks and understanding deep learning's generalization capabilities. Towards the first issue, many defense strategies have been developed, with the most common being Adversarial Training (AT). Towards the second challenge, one of the dominant theories that has emerged is the Neural Tangent Kernel (NTK) -- a characterization of neural network behavior in the infinite-width limit. In this limit, the kernel is frozen, and the underlying feature map is fixed. In finite widths, however, there is evidence that feature learning happens at the earlier stages of the training (kernel learning) before a second phase where the kernel remains fixed (lazy training). While prior work has aimed at studying adversarial vulnerability through the lens of the frozen infinite-width NTK, there is no work that studies the adversarial robustness of the empirical/finite NTK during training. In this work, we perform an empirical study of the evolution of the empirical NTK under standard and adversarial training, aiming to disambiguate the effect of adversarial training on kernel learning and lazy training. We find under adversarial training, the empirical NTK rapidly converges to a different kernel (and feature map) than standard training. This new kernel provides adversarial robustness, even when non-robust training is performed on top of it. Furthermore, we find that adversarial training on top of a fixed kernel can yield a classifier with $76.1\%$ robust accuracy under PGD attacks with $\varepsilon = 4/255$ on CIFAR-10.

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted to the Conference on Advances in Neural Information Processing Systems (NeurIPS) 2022

arXiv:2210.11114

Pruning by Active Attention Manipulation

Authors: Zahra Babaiee, Lucas Liebenwein, Ramin Hasani, Daniela Rus, Radu Grosu

Abstract: Filter pruning of a CNN is typically achieved by applying discrete masks on the CNN's filter weights or activation maps, post-training. Here, we present a new filter-importance-scoring concept named pruning by active attention manipulation (PAAM), that sparsifies the CNN's set of filters through a particular attention mechanism, during-training. PAAM learns analog filter scores from the filter weights by optimizing a cost function regularized by an additive term in the scores. As the filters are not independent, we use attention to dynamically learn their correlations. Moreover, by training the pruning scores of all layers simultaneously, PAAM can account for layer inter-dependencies, which is essential to finding a performant sparse sub-network. PAAM can also train and generate a pruned network from scratch in a straightforward, one-stage training process without requiring a pre-trained network. Finally, PAAM does not need layer-specific hyperparameters and pre-defined layer budgets, since it can implicitly determine the appropriate number of filters in each layer. Our experimental results on different network architectures suggest that PAAM outperforms state-of-the-art structured-pruning methods (SOTA). On the CIFAR-10 dataset, without requiring a pre-trained baseline network, we obtain 1.02% and 1.19% accuracy gain and 52.3% and 54% parameters reduction, on ResNet56 and ResNet110, respectively. Similarly, on the ImageNet dataset, PAAM achieves 1.06% accuracy gain while pruning 51.1% of the parameters on ResNet50. For CIFAR-10, this is better than the SOTA with a margin of 9.5% and 6.6%, respectively, and on ImageNet with a margin of 11%.

Submitted 20 October, 2022; originally announced October 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2204.07412

arXiv:2210.06650

Interpreting Neural Policies with Disentangled Tree Representations

Authors: Tsun-Hsuan Wang, Wei Xiao, Tim Seyde, Ramin Hasani, Daniela Rus

Abstract: The advancement of robots, particularly those functioning in complex human-centric environments, relies on control solutions that are driven by machine learning. Understanding how learning-based controllers make decisions is crucial since robots are often safety-critical systems. This urges a formal and quantitative understanding of the explanatory factors in the interpretability of robot learning. In this paper, we aim to study interpretability of compact neural policies through the lens of disentangled representation. We leverage decision trees to obtain factors of variation [1] for disentanglement in robot learning; these encapsulate skills, behaviors, or strategies toward solving tasks. To assess how well networks uncover the underlying task dynamics, we introduce interpretability metrics that measure disentanglement of learned neural dynamics from a concentration of decisions, mutual information and modularity perspective. We showcase the effectiveness of the connection between interpretability and disentanglement consistently across extensive experimental analysis.

Submitted 12 November, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

arXiv:2210.04763

On the Forward Invariance of Neural ODEs

Authors: Wei Xiao, Tsun-Hsuan Wang, Ramin Hasani, Mathias Lechner, Yutong Ban, Chuang Gan, Daniela Rus

Abstract: We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications by using invariance set propagation. Our approach uses a class of control barrier functions to transform output specifications into constraints on the parameters and inputs of the learning system. This setup allows us to achieve output specification guarantees simply by changing the constrained parameters/inputs both during training and inference. Moreover, we demonstrate that our invariance set propagation through data-controlled neural ODEs not only maintains generalization performance but also creates an additional degree of robustness by enabling causal manipulation of the system's parameters/inputs. We test our method on a series of representation learning tasks, including modeling physical dynamics and convexity portraits, as well as safe collision avoidance for autonomous vehicles.

Submitted 31 May, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: 25 pages, accepted in ICML2023, website: https://weixy21.github.io/invariance/
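
The control barrier function (CBF) mechanism that the abstract above builds on can be shown on a toy system. The sketch below is a generic textbook CBF filter, not the paper's invariance set propagation for neural ODEs: it assumes scalar dynamics dx/dt = u, the output specification x <= x_max encoded as h(x) = x_max - x >= 0, and a linear class-K function with gain `alpha`. The condition dh/dt >= -alpha * h(x) then becomes the input constraint u <= alpha * (x_max - x). The names `cbf_filter` and `simulate` are hypothetical.

```python
def cbf_filter(x, u_desired, x_max, alpha):
    """Clip the desired input so that dh/dt >= -alpha * h holds for h = x_max - x."""
    return min(u_desired, alpha * (x_max - x))


def simulate(x0, u_desired, x_max=1.0, alpha=1.0, dt=0.1, steps=200):
    """Forward-Euler rollout of dx/dt = u under the filtered input."""
    x, trajectory = x0, [x0]
    for _ in range(steps):
        u = cbf_filter(x, u_desired, x_max, alpha)
        x += dt * u
        trajectory.append(x)
    return trajectory


# The desired input would push x past x_max; the filter keeps it inside the safe set.
trajectory = simulate(x0=0.0, u_desired=5.0)
```

With `alpha * dt <= 1` the discrete rollout stays inside the set x <= x_max: the state approaches the boundary geometrically but never crosses it, which is the forward invariance property in its simplest form.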

arXiv:2210.04728 [pdf, other]

PyHopper -- Hyperparameter optimization

Authors: Mathias Lechner, Ramin Hasani, Philipp Neubauer, Sophie Neubauer, Daniela Rus

Abstract: Hyperparameter tuning is a fundamental aspect of machine learning research. Setting up the infrastructure for systematic optimization of hyperparameters can take a significant amount of time. Here, we present PyHopper, a black-box optimization platform designed to streamline the hyperparameter tuning workflow of machine learning researchers. PyHopper's goal is to integrate with existing code with minimal effort and run the optimization process with minimal necessary manual oversight. With simplicity as the primary theme, PyHopper is powered by a single robust Markov-chain Monte-Carlo optimization algorithm that scales to millions of dimensions. Compared to existing tuning packages, focusing on a single algorithm frees the user from having to decide between several algorithms and makes PyHopper easily customizable. PyHopper is publicly available under the Apache-2.0 license at https://github.com/PyHopper/PyHopper.

Submitted 10 October, 2022; originally announced October 2022.

arXiv:2210.04303 [pdf, other]

Are All Vision Models Created Equal? A Study of the Open-Loop to Closed-Loop Causality Gap

Authors: Mathias Lechner, Ramin Hasani, Alexander Amini, Tsun-Hsuan Wang, Thomas A. Henzinger, Daniela Rus

Abstract: There is an ever-growing zoo of modern neural network models that can efficiently learn end-to-end control from visual observations. These advanced deep models, ranging from convolutional to patch-based networks, have been extensively tested on offline image classification and regression tasks. In this paper, we study these vision architectures with respect to the open-loop to closed-loop causality gap, i.e., offline training followed by an online closed-loop deployment. This causality gap typically emerges in robotics applications such as autonomous driving, where a network is trained to imitate the control commands of a human. In this setting, two situations arise: 1) Closed-loop testing in-distribution, where the test environment shares properties with those of offline training data. 2) Closed-loop testing under distribution shifts and out-of-distribution. Contrary to recently reported results, we show that under proper training guidelines, all vision models perform indistinguishably well on in-distribution deployment, resolving the causality gap. In situation 2, we observe that the causality gap disrupts performance regardless of the choice of the model architecture. Our results imply that the causality gap can be solved in situation 1 with our proposed training guideline with any modern network architecture, whereas achieving out-of-distribution generalization (situation 2) requires further investigations, for instance, on data diversity rather than the model architecture.

Submitted 9 October, 2022; originally announced October 2022.

arXiv:2209.12951 [pdf, other]

Liquid Structural State-Space Models

Authors: Ramin Hasani, Mathias Lechner, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, Daniela Rus

Abstract: A proper parametrization of state transition matrices of linear state-space models (SSMs) followed by standard nonlinearities enables them to efficiently learn representations from sequential data, establishing the state-of-the-art on a large series of long-range sequence modeling benchmarks. In this paper, we show that we can improve further when the structural SSM such as S4 is given by a linear liquid time-constant (LTC) state-space model. LTC neural networks are causal continuous-time neural networks with an input-dependent state transition module, which makes them learn to adapt to incoming inputs at inference. We show that by using a diagonal plus low-rank decomposition of the state transition matrix introduced in S4, and a few simplifications, the LTC-based structural state-space model, dubbed Liquid-S4, achieves the new state-of-the-art generalization across sequence modeling tasks with long-term dependencies such as image, text, audio, and medical time-series, with an average performance of 87.32% on the Long-Range Arena benchmark. On the full raw Speech Command recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter counts compared to S4. The additional gain in performance is the direct result of Liquid-S4's kernel structure, which takes into account the similarities of the input sequence samples during training and inference.

Submitted 26 September, 2022; originally announced September 2022.

arXiv:2206.01261 [pdf, other]

Entangled Residual Mappings

Authors: Mathias Lechner, Ramin Hasani, Zahra Babaiee, Radu Grosu, Daniela Rus, Thomas A. Henzinger, Sepp Hochreiter

Abstract: Residual mappings have been shown to perform representation learning in the first layers and iterative feature refinement in higher layers. This interplay, combined with their stabilizing effect on the gradient norms, enables them to train very deep networks. In this paper, we take a step further and introduce entangled residual mappings to generalize the structure of the residual connections and evaluate their role in iterative learning representations. An entangled residual mapping replaces the identity skip connections with specialized entangled mappings such as orthogonal, sparse, and structural correlation matrices that share key attributes (eigenvalues, structure, and Jacobian norm) with identity mappings. We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks differently than attention-based models and recurrent neural networks. In general, we find that for CNNs and Vision Transformers entangled sparse mapping can help generalization while orthogonal mappings hurt performance. For recurrent networks, orthogonal residual mappings form an inductive bias for time-variant sequences, which degrades accuracy on time-invariant tasks.

Submitted 2 June, 2022; originally announced June 2022.

Comments: 21 Pages

arXiv:2204.07412 [pdf, other]

End-to-End Sensitivity-Based Filter Pruning

Authors: Zahra Babaiee, Lucas Liebenwein, Ramin Hasani, Daniela Rus, Radu Grosu

Abstract: In this paper, we present a novel sensitivity-based filter pruning algorithm (SbF-Pruner) to learn the importance scores of filters of each layer end-to-end. Our method learns the scores from the filter weights, enabling it to account for the correlations between the filters of each layer. Moreover, by training the pruning scores of all layers simultaneously our method can account for layer interdependencies, which is essential to find a performant sparse sub-network. Our proposed method can train and generate a pruned network from scratch in a straightforward, one-stage training process without requiring a pretrained network. Ultimately, we do not need layer-specific hyperparameters and pre-defined layer budgets, since SbF-Pruner can implicitly determine the appropriate number of channels in each layer. Our experimental results on different network architectures suggest that SbF-Pruner outperforms advanced pruning methods. Notably, on CIFAR-10, without requiring a pretrained baseline network, we obtain 1.02% and 1.19% accuracy gain on ResNet56 and ResNet110, compared to the baseline reported for state-of-the-art pruning algorithms. This is while SbF-Pruner reduces parameter-count by 52.3% (for ResNet56) and 54% (for ResNet101), which is better than the state-of-the-art pruning algorithms with a high margin of 9.5% and 6.6%.

Submitted 15 April, 2022; originally announced April 2022.

arXiv:2203.02401 [pdf, other]

Differentiable Control Barrier Functions for Vision-based End-to-End Autonomous Driving

Authors: Wei Xiao, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus

Abstract: Guaranteeing safety of perception-based learning systems is challenging due to the absence of ground-truth state information unlike in state-aware control scenarios. In this paper, we introduce a safety guaranteed learning framework for vision-based end-to-end autonomous driving. To this end, we design a learning system equipped with differentiable control barrier functions (dCBFs) that is trained end-to-end by gradient descent. Our models are composed of conventional neural network architectures and dCBFs. They are interpretable at scale, achieve great test performance under limited training data, and are safety guaranteed in a series of autonomous driving scenarios such as lane keeping and obstacle avoidance. We evaluated our framework in a sim-to-real environment, and tested on a real autonomous car, achieving safe lane following and obstacle avoidance via Augmented Reality (AR) and real parked vehicles.

Submitted 4 March, 2022; originally announced March 2022.

Comments: 11 pages, Wei Xiao and Tsun-Hsuan Wang are with equal contributions

arXiv:2111.11277 [pdf, other]

BarrierNet: A Safety-Guaranteed Layer for Neural Networks

Authors: Wei Xiao, Ramin Hasani, Xiao Li, Daniela Rus

Abstract: This paper introduces differentiable higher-order control barrier functions (CBF) that are end-to-end trainable together with learning systems. CBFs are usually overly conservative, while guaranteeing safety. Here, we address their conservativeness by softening their definitions using environmental dependencies without losing safety guarantees, and embed them into differentiable quadratic programs. These novel safety layers, termed a BarrierNet, can be used in conjunction with any neural network-based controller, and can be trained by gradient descent. BarrierNet allows the safety constraints of a neural controller to be adaptable to changing environments. We evaluate them on a series of control problems such as traffic merging and robot navigation in 2D and 3D space, and demonstrate their effectiveness compared to state-of-the-art approaches.
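
As a rough illustration of what a control barrier function constraint does, the sketch below applies a first-order CBF to a one-dimensional single integrator x' = u with safe set h(x) = x_max - x >= 0. In this toy case the usual minimum-deviation quadratic program has a closed-form solution (a clip from above), so no QP solver is needed. This is not the BarrierNet layer itself: the higher-order, differentiable, learned version described in the abstract is not reproduced here, and the function names, dynamics, and parameter values are all illustrative assumptions.

```python
def cbf_filter(u_nom: float, x: float, x_max: float, alpha: float) -> float:
    """Closed-form safety filter for x' = u with barrier h(x) = x_max - x.

    The CBF condition dh/dt + alpha * h >= 0 reduces to u <= alpha * h(x),
    so the minimally modified safe input is a clip of u_nom from above.
    (Toy first-order case; names and dynamics are assumptions.)
    """
    h = x_max - x
    return min(u_nom, alpha * h)


def simulate(x0=0.0, x_max=1.0, u_nom=2.0, alpha=5.0, dt=0.01, steps=500):
    """Forward-Euler rollout of the filtered system; returns the state trajectory."""
    xs = [x0]
    for _ in range(steps):
        u = cbf_filter(u_nom, xs[-1], x_max, alpha)
        xs.append(xs[-1] + dt * u)
    return xs


trajectory = simulate()
```

The nominal input pushes the state toward the boundary at constant speed; the filter leaves it untouched far from the boundary and slows it down as h(x) shrinks, so the state approaches x_max without crossing it (for the discrete rollout this relies on alpha * dt <= 1).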

Submitted 22 November, 2021; originally announced November 2021.

Comments: 23 pages

arXiv:2110.07667 [pdf, other]

Interactive Analysis of CNN Robustness

Authors: Stefan Sietzen, Mathias Lechner, Judy Borowski, Ramin Hasani, Manuela Waldner

Abstract: While convolutional neural networks (CNNs) have found wide adoption as state-of-the-art models for image-related tasks, their predictions are often highly sensitive to small input perturbations, which the human vision is robust against. This paper presents Perturber, a web-based application that allows users to instantaneously explore how CNN activations and predictions evolve when a 3D input scene is interactively perturbed. Perturber offers a large variety of scene modifications, such as camera controls, lighting and shading effects, background modifications, object morphing, as well as adversarial attacks, to facilitate the discovery of potential vulnerabilities. Fine-tuned model versions can be directly compared for qualitative evaluation of their robustness. Case studies with machine learning experts have shown that Perturber helps users to quickly generate hypotheses about model vulnerabilities and to qualitatively compare model behavior. Using quantitative analyses, we could replicate users' insights with other CNN architectures and input images, yielding new insights about the vulnerability of adversarially trained models.

Submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted at Pacific Graphics 2021

arXiv:2107.08467 [pdf, other]

GoTube: Scalable Stochastic Verification of Continuous-Depth Models

Authors: Sophie Gruenbacher, Mathias Lechner, Ramin Hasani, Daniela Rus, Thomas A. Henzinger, Scott Smolka, Radu Grosu

Abstract: We introduce a new stochastic verification algorithm that formally quantifies the behavioral robustness of any time-continuous process formulated as a continuous-depth model. Our algorithm solves a set of global optimization (Go) problems over a given time horizon to construct a tight enclosure (Tube) of the set of all process executions starting from a ball of initial states. We call our algorithm GoTube. Through its construction, GoTube ensures that the bounding tube is conservative up to a desired probability and up to a desired tightness. GoTube is implemented in JAX and optimized to scale to complex continuous-depth neural network models. Compared to advanced reachability analysis tools for time-continuous neural networks, GoTube does not accumulate overapproximation errors between time steps and avoids the infamous wrapping effect inherent in symbolic techniques. We show that GoTube substantially outperforms state-of-the-art verification tools in terms of the size of the initial ball, speed, time-horizon, task completion, and scalability on a large set of experiments. GoTube is stable and sets the state-of-the-art in terms of its ability to scale to time horizons well beyond what has been previously possible.

Submitted 2 December, 2021; v1 submitted 18 July, 2021; originally announced July 2021.

Comments: Accepted to the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

arXiv:2106.13898 [pdf, other]

doi 10.1038/s42256-022-00556-7

Closed-form Continuous-time Neural Models

Authors: Ramin Hasani, Mathias Lechner, Alexander Amini, Lucas Liebenwein, Aaron Ray, Max Tschaikowski, Gerald Teschl, Daniela Rus

Abstract: Continuous-time neural processes are performant sequential decision-makers that are built by differential equations (DE). However, their expressive power when they are deployed on computers is bottlenecked by numerical DE solvers. This limitation has significantly slowed down the scaling and understanding of numerous natural physical phenomena such as the dynamics of nervous systems. Ideally, we would circumvent this bottleneck by solving the given dynamical system in closed form. This is known to be intractable in general. Here, we show it is possible to closely approximate the interaction between neurons and synapses -- the building blocks of natural and artificial neural networks -- constructed by liquid time-constant networks (LTCs) efficiently in closed-form. To this end, we compute a tightly-bounded approximation of the solution of an integral appearing in LTCs' dynamics, that has had no known closed-form solution so far. This closed-form solution substantially impacts the design of continuous-time and continuous-depth neural models; for instance, since time appears explicitly in closed-form, the formulation relaxes the need for complex numerical solvers. Consequently, we obtain models that are between one and five orders of magnitude faster in training and inference compared to differential equation-based counterparts. More importantly, in contrast to ODE-based continuous networks, closed-form networks can scale remarkably well compared to other deep learning instances. Lastly, as these models are derived from liquid networks, they show remarkable performance in time series modeling, compared to advanced recurrent models.

Submitted 2 March, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: 40 pages

Journal ref: Nature Machine Intelligence 4, 992--1003 (2022)

arXiv:2106.12718 [pdf, other]

Sparse Flows: Pruning Continuous-depth Models

Authors: Lucas Liebenwein, Ramin Hasani, Alexander Amini, Daniela Rus

Abstract: Continuous deep learning architectures enable learning of flexible probabilistic models for predictive modeling as neural ordinary differential equations (ODEs), and for generative modeling as continuous normalizing flows. In this work, we design a framework to decipher the internal dynamics of these continuous depth models by pruning their network architectures. Our empirical results suggest that pruning improves generalization for neural ODEs in generative modeling. We empirically show that the improvement is because pruning helps avoid mode-collapse and flatten the loss surface. Moreover, pruning finds efficient neural ODE representations with up to 98% fewer parameters compared to the original network, without loss of accuracy. We hope our results will invigorate further research into the performance-size trade-offs of modern continuous-depth models.

Submitted 18 November, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021

arXiv:2106.08314 [pdf, other]

Causal Navigation by Continuous-time Neural Networks

Authors: Charles Vorbach, Ramin Hasani, Alexander Amini, Mathias Lechner, Daniela Rus

Abstract: Imitation learning enables high-fidelity, vision-based learning of policies within rich, photorealistic environments. However, such techniques often rely on traditional discrete-time neural models and face difficulties in generalizing to domain shifts by failing to account for the causal relationships between the agent and the environment. In this paper, we propose a theoretical and experimental framework for learning causal representations using continuous-time neural networks, specifically over their discrete-time counterparts. We evaluate our method in the context of visual-control learning of drones over a series of complex tasks, ranging from short- and long-term navigation, to chasing static and dynamic objects through photorealistic environments. Our results demonstrate that causal continuous-time deep models can perform robust navigation tasks, where advanced recurrent models fail. These models learn complex causal control representations directly from raw visual inputs and scale to solve a variety of tasks using imitation learning.

Submitted 16 August, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

Comments: 24 Pages

arXiv:2106.07091 [pdf, other]

On-Off Center-Surround Receptive Fields for Accurate and Robust Image Classification

Authors: Zahra Babaiee, Ramin Hasani, Mathias Lechner, Daniela Rus, Radu Grosu

Abstract: Robustness to variations in lighting conditions is a key objective for any deep vision system. To this end, our paper extends the receptive field of convolutional neural networks with two residual components, ubiquitous in the visual processing system of vertebrates: On-center and off-center pathways, with excitatory center and inhibitory surround; OOCS for short. The on-center pathway is excited by the presence of a light stimulus in its center but not in its surround, whereas the off-center one is excited by the absence of a light stimulus in its center but not in its surround. We design OOCS pathways via a difference of Gaussians, with their variance computed analytically from the size of the receptive fields. OOCS pathways complement each other in their response to light stimuli, ensuring this way a strong edge-detection capability, and as a result, an accurate and robust inference under challenging lighting conditions. We provide extensive empirical evidence showing that networks supplied with the OOCS edge representation gain accuracy and illumination-robustness compared to standard deep models.
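
The difference-of-Gaussians construction mentioned in the abstract can be sketched generically as follows. The on-center kernel is a narrow Gaussian minus a wide one, and the off-center kernel is its negation. The abstract states that the variances are computed analytically from the receptive-field size; that formula is not given in this listing, so the kernel size and the two sigma values below are placeholder assumptions, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a 2D Gaussian kernel of shape (size, size), normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()


def on_off_kernels(size: int = 5, sigma_center: float = 1.0, sigma_surround: float = 2.0):
    """Build on-center and off-center difference-of-Gaussians kernels.

    The sigma values are illustrative assumptions, not the analytically
    derived variances referred to in the abstract.
    """
    on = gaussian_kernel(size, sigma_center) - gaussian_kernel(size, sigma_surround)
    off = -on  # off-center: inhibitory center, excitatory surround
    return on, off


on, off = on_off_kernels()
```

Because both Gaussians are normalized, each kernel sums to approximately zero, so a uniformly lit patch produces almost no response, while a bright spot on a dark surround excites the on-center kernel and the inverse pattern excites the off-center one.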

Submitted 13 June, 2021; originally announced June 2021.

Comments: 21 Pages. Accepted for publication in the proceedings of the 38th International Conference on Machine Learning (ICML) 2021

arXiv:2103.08187 [pdf, other]

Adversarial Training is Not Ready for Robot Learning

Authors: Mathias Lechner, Ramin Hasani, Radu Grosu, Daniela Rus, Thomas A. Henzinger

Abstract: Adversarial training is an effective method to train deep learning models that are resilient to norm-bounded perturbations, at the cost of a drop in nominal performance. While adversarial training appears to enhance the robustness and safety of a deep model deployed in open-world decision-critical applications, counterintuitively, it induces undesired behaviors in robot learning settings. In this paper, we show theoretically and experimentally that neural controllers obtained via adversarial training are subject to three types of defects, namely transient, systematic, and conditional errors. We first generalize adversarial training to a safety-domain optimization scheme allowing for more generic specifications. We then prove that such a learning process tends to cause certain error profiles. We support our theoretical results by a thorough experimental safety analysis in a robot-learning task. Our results suggest that adversarial training is not yet ready for robot learning.

Submitted 15 March, 2021; originally announced March 2021.

Comments: Accepted at the IEEE International Conference on Robotics and Automation (ICRA) 2021

arXiv:2103.04909 [pdf, other]

Latent Imagination Facilitates Zero-Shot Transfer in Autonomous Racing

Authors: Axel Brunnbauer, Luigi Berducci, Andreas Brandstätter, Mathias Lechner, Ramin Hasani, Daniela Rus, Radu Grosu

Abstract: World models learn behaviors in a latent imagination space to enhance the sample-efficiency of deep reinforcement learning (RL) algorithms. While learning world models for high-dimensional observations (e.g., pixel inputs) has become practicable on standard RL benchmarks and some games, their effectiveness in real-world robotics applications has not been explored. In this paper, we investigate how such agents generalize to real-world autonomous vehicle control tasks, where advanced model-free deep RL algorithms fail. In particular, we set up a series of time-lap tasks for an F1TENTH racing robot, equipped with a high-dimensional LiDAR sensor, on a set of test tracks with a gradual increase in their complexity. In this continuous-control setting, we show that model-based agents capable of learning in imagination substantially outperform model-free agents with respect to performance, sample efficiency, successful task completion, and generalization. Moreover, we show that the generalization ability of model-based agents strongly depends on the choice of their observation model. We provide extensive empirical evidence for the effectiveness of world models provided with long enough memory horizons in sim2real tasks.

Submitted 28 February, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: This paper is accepted for presentation at the International Conference on Robotics and Automation (ICRA), 2022

arXiv:2012.08863 [pdf, other]

On The Verification of Neural ODEs with Stochastic Guarantees

Authors: Sophie Gruenbacher, Ramin Hasani, Mathias Lechner, Jacek Cyranka, Scott A. Smolka, Radu Grosu

Abstract: We show that Neural ODEs, an emerging class of time-continuous neural networks, can be verified by solving a set of global-optimization problems. For this purpose, we introduce Stochastic Lagrangian Reachability (SLR), an abstraction-based technique for constructing a tight Reachtube (an over-approximation of the set of reachable states over a given time-horizon), and provide stochastic guarantees in the form of confidence intervals for the Reachtube bounds. SLR inherently avoids the infamous wrapping effect (accumulation of over-approximation errors) by performing local optimization steps to expand safe regions instead of repeatedly forward-propagating them as is done by deterministic reachability methods. To enable fast local optimizations, we introduce a novel forward-mode adjoint sensitivity method to compute gradients without the need for backpropagation. Finally, we establish asymptotic and non-asymptotic convergence rates for SLR.

Submitted 16 December, 2020; originally announced December 2020.

Comments: 12 pages, 2 figures

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 2021, pages 11525-11535

arXiv:2006.04439 [pdf, other]

Liquid Time-constant Networks

Authors: Ramin Hasani, Mathias Lechner, Alexander Amini, Daniela Rus, Radu Grosu

Abstract: We introduce a new class of time-continuous recurrent neural network models. Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems modulated via nonlinear interlinked gates. The resulting models represent dynamical systems with varying (i.e., liquid) time-constants coupled to their hidden state, with outputs being computed by numerical differential equation solvers. These neural networks exhibit stable and bounded behavior, yield superior expressivity within the family of neural ordinary differential equations, and give rise to improved performance on time-series prediction tasks. To demonstrate these properties, we first take a theoretical approach to find bounds over their dynamics and compute their expressive power by the trajectory length measure in latent trajectory space. We then conduct a series of time-series prediction experiments to manifest the approximation capability of Liquid Time-Constant Networks (LTCs) compared to classical and modern RNNs. Code and data are available at https://github.com/raminmh/liquid_time_constant_networks
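
To make the "linear first-order dynamical systems modulated via nonlinear gates" concrete, the sketch below steps a small LTC-style cell. It assumes the state equation in the form commonly given for LTCs, dx/dt = -(1/tau + f) * x + f * A with a gate f that depends on the state and the input, and advances it with a fused semi-implicit Euler update. The sigmoid gate, the layer sizes, the random weights, and all names are illustrative assumptions; the authors' repository linked in the abstract is the reference implementation.

```python
import numpy as np


def ltc_fused_step(x, inp, W, U, b, tau, A, dt):
    """One fused (semi-implicit) Euler step of dx/dt = -(1/tau + f) * x + f * A.

    f is a sigmoid gate of the current state and input (an assumption here);
    because f multiplies x, the effective time constant varies with the input.
    """
    f = 1.0 / (1.0 + np.exp(-(W @ x + U @ inp + b)))  # gate values in (0, 1)
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))


rng = np.random.default_rng(0)
n_hidden, n_in = 4, 2
W = rng.normal(size=(n_hidden, n_hidden))  # recurrent weights (random, untrained)
U = rng.normal(size=(n_hidden, n_in))      # input weights (random, untrained)
b = np.zeros(n_hidden)
tau = np.ones(n_hidden)                    # base time constants
A = rng.normal(size=n_hidden)              # per-neuron bias/reversal vector

x = np.zeros(n_hidden)
states = []
for _ in range(200):
    x = ltc_fused_step(x, rng.normal(size=n_in), W, U, b, tau, A, dt=0.1)
    states.append(x)
states = np.array(states)
```

With a positive gate and a state that starts at zero, this update keeps each hidden unit between 0 and its entry of A regardless of the input sequence, which gives a small numerical illustration of the bounded behavior the abstract refers to.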

Submitted 14 December, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: Accepted to the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
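
For orientation, the dynamics summarized in the abstract above (linear first-order systems modulated by a nonlinear gate, giving a state-dependent time-constant) can be written as a single ODE. This listing does not give the symbols, so the notation below is an assumption: hidden state x, input I, base time-constant \tau, gate f with parameters \theta, and bias vector A. Read it as a sketch of the commonly cited LTC form rather than a quotation from this page.

```latex
\frac{d\mathbf{x}(t)}{dt}
  = -\left[\frac{1}{\tau} + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big)\right]\mathbf{x}(t)
    + f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big)\, A,
\qquad
\tau_{\mathrm{sys}} = \frac{\tau}{1 + \tau\, f\big(\mathbf{x}(t), \mathbf{I}(t), t, \theta\big)}
```

Because f depends on the state and input, the effective time-constant \tau_{\mathrm{sys}} varies over time, which is the sense in which the time-constant is "liquid".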

arXiv:2006.04418 [pdf, other]

Learning Long-Term Dependencies in Irregularly-Sampled Time Series

Authors: Mathias Lechner, Ramin Hasani

Abstract: Recurrent neural networks (RNNs) with continuous-time hidden states are a natural fit for modeling irregularly-sampled time series. These models, however, face difficulties when the input data possess long-term dependencies. We prove that similar to standard RNNs, the underlying reason for this issue is the vanishing or exploding of the gradient during training. This phenomenon is expressed by the ordinary differential equation (ODE) representation of the hidden state, regardless of the ODE solver's choice. We provide a solution by designing a new algorithm based on the long short-term memory (LSTM) that separates its memory from its time-continuous state. This way, we encode a continuous-time dynamical flow within the RNN, allowing it to respond to inputs arriving at arbitrary time-lags while ensuring a constant error propagation through the memory path. We call these RNN models ODE-LSTMs. We experimentally show that ODE-LSTMs outperform advanced RNN-based counterparts on non-uniformly sampled data with long-term dependencies. All code and data is available at https://github.com/mlech26l/ode-lstms.

Submitted 4 December, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

arXiv:1811.00321 [pdf, ps, other]

Liquid Time-constant Recurrent Neural Networks as Universal Approximators

Authors: Ramin M. Hasani, Mathias Lechner, Alexander Amini, Daniela Rus, Radu Grosu

Abstract: In this paper, we introduce the notion of liquid time-constant (LTC) recurrent neural networks (RNNs), a subclass of continuous-time RNNs, with varying neuronal time-constant realized by their nonlinear synaptic transmission model. This feature is inspired by the communication principles in the nervous system of small species. It enables the model to approximate continuous mapping with a small number of computational units. We show that any finite trajectory of an $n$-dimensional continuous dynamical system can be approximated by the internal state of the hidden units and $n$ output units of an LTC network. Here, we also theoretically find bounds on their neuronal states and varying time-constant.

Submitted 1 November, 2018; originally announced November 2018.

Comments: This short report introduces the universal approximation capabilities of liquid time-constant (LTC) recurrent neural networks, and provides theoretical bounds for its dynamics

arXiv:1809.04423 [pdf, other]

Can a Compact Neuronal Circuit Policy be Re-purposed to Learn Simple Robotic Control?

Authors: Ramin Hasani, Mathias Lechner, Alexander Amini, Daniela Rus, Radu Grosu

Abstract: We propose a neural information processing system which is obtained by re-purposing the function of a biological neural circuit model, to govern simulated and real-world control tasks. Inspired by the structure of the nervous system of the soil-worm, C. elegans, we introduce Neuronal Circuit Policies (NCPs), defined as the model of biological neural circuits reparameterized for the control of an alternative task. We learn instances of NCPs to control a series of robotic tasks, including the autonomous parking of a real-world rover robot. For reconfiguration of the purpose of the neural circuit, we adopt a search-based optimization algorithm. Neuronal circuit policies perform on par with, and in some cases surpass, the performance of contemporary deep learning models, with the advantage of leveraging significantly fewer learnable parameters and realizing interpretable dynamics at the cell level.

Submitted 16 November, 2019; v1 submitted 11 September, 2018; originally announced September 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1803.08554

arXiv:1809.03864 [pdf, other]

Response Characterization for Auditing Cell Dynamics in Long Short-term Memory Networks

Authors: Ramin M. Hasani, Alexander Amini, Mathias Lechner, Felix Naser, Radu Grosu, Daniela Rus

Abstract: In this paper, we introduce a novel method to interpret recurrent neural networks (RNNs), particularly long short-term memory networks (LSTMs) at the cellular level. We propose a systematic pipeline for interpreting individual hidden state dynamics within the network using response characterization methods. The ranked contribution of individual cells to the network's output is computed by analyzing a set of interpretable metrics of their decoupled step and sinusoidal responses. As a result, our method is able to uniquely identify neurons with insightful dynamics, quantify relationships between dynamical properties and test accuracy through ablation analysis, and interpret the impact of network capacity on a network's dynamical distribution. Finally, we demonstrate generalizability and scalability of our method by evaluating a series of different benchmark sequential datasets.

Submitted 11 September, 2018; originally announced September 2018.

arXiv:1803.08554 [pdf, other]

Neuronal Circuit Policies

Authors: Mathias Lechner, Ramin M. Hasani, Radu Grosu

Abstract: We propose an effective way to create interpretable control agents, by re-purposing the function of a biological neural circuit model, to govern simulated and real world reinforcement learning (RL) test-beds. We model the tap-withdrawal (TW) neural circuit of the nematode, C. elegans, a circuit responsible for the worm's reflexive response to external mechanical touch stimulations, and learn its synaptic and neuronal parameters as a policy for controlling basic RL tasks. We also autonomously park a real rover robot on a pre-defined trajectory, by deploying such neuronal circuit policies learned in a simulated environment. For reconfiguration of the purpose of the TW neural circuit, we adopt a search-based RL algorithm. We show that our neuronal policies perform as well as deep neural network policies with the advantage of realizing interpretable dynamics at the cell level.

Submitted 22 March, 2018; originally announced March 2018.

arXiv:1711.03467 [pdf, other]

Worm-level Control through Search-based Reinforcement Learning

Authors: Mathias Lechner, Radu Grosu, Ramin M. Hasani

Abstract: Through natural evolution, nervous systems of organisms formed near-optimal structures to express behavior. Here, we propose an effective way to create control agents, by re-purposing the function of biological neural circuit models, to govern similar real world applications. We model the tap-withdrawal (TW) neural circuit of the nematode, C. elegans, a circuit responsible for the worm's reflexive response to external mechanical touch stimulations, and learn its synaptic and neural parameters as a policy for controlling the inverted pendulum problem. For reconfiguration of the purpose of the TW neural circuit, we use a search-based reinforcement learning algorithm. We show that our neural policy performs as well as existing traditional control theory and machine learning approaches. A video demonstration of the performance of our method can be accessed at https://youtu.be/o-Ia5IVyff8.

Submitted 9 November, 2017; originally announced November 2017.

arXiv:1711.01436 [pdf, other]

Searching for Biophysically Realistic Parameters for Dynamic Neuron Models by Genetic Algorithms from Calcium Imaging Recording

Authors: Magdalena Fuchs, Manuel Zimmer, Radu Grosu, Ramin M. Hasani

Abstract: Individual neurons in the nervous systems exploit various dynamics. To capture these dynamics for single neurons, we tune the parameters of an electrophysiological model of nerve cells, to fit experimental data obtained by calcium imaging. A search for the biophysical parameters of this model is performed by means of a genetic algorithm, where the model neuron is exposed to a predefined input current representing overall inputs from other parts of the nervous system. The algorithm is then constrained for keeping the ion-channel currents within reasonable ranges, while producing the best fit to a calcium imaging time series of the AVA interneuron, from the brain of the soil-worm, C. elegans. Our settings enable us to project a set of biophysical parameters to the neuron kinetics observed in neuronal imaging.

Submitted 4 November, 2017; originally announced November 2017.

arXiv:1703.06272 [pdf, other]

An Automated Auto-encoder Correlation-based Health-Monitoring and Prognostic Method for Machine Bearings

Authors: Ramin M. Hasani, Guodong Wang, Radu Grosu

Abstract: This paper studies an intelligent ultimate technique for health-monitoring and prognostic of common rotary machine components, particularly bearings. During a run-to-failure experiment, rich unsupervised features from vibration sensory data are extracted by a trained sparse auto-encoder. Then, the correlation of the extracted attributes of the initial samples (presumably healthy at the beginning of the test) with the succeeding samples is calculated and passed through a moving-average filter. The normalized output is named the auto-encoder correlation-based (AEC) rate, which stands for an informative attribute of the system depicting its health status and precisely identifying the degradation starting point. We show that the AEC technique generalizes well in several run-to-failure tests. AEC collects rich unsupervised features from the vibration data fully autonomously. We demonstrate the superiority of the AEC over many other state-of-the-art approaches for the health monitoring and prognostic of machine bearings.

Submitted 18 March, 2017; originally announced March 2017.
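
The AEC pipeline described in the abstract above (correlate each sample's features with the initial, presumably healthy samples, smooth with a moving average, then normalize) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the use of Pearson correlation, the trailing moving-average window, the min-max normalization, and the parameters `n_reference` and `window` are all assumptions introduced here. The auto-encoder feature extraction is omitted and its output is taken as given.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def moving_average(values, window):
    """Trailing moving average; early entries use as many points as are available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def aec_rate(features, n_reference=2, window=2):
    """Hypothetical AEC-style health rate.

    features: list of per-sample feature vectors (assumed to come from a
    trained sparse auto-encoder, which is not implemented here).
    """
    n_feat = len(features[0])
    # Assumption: the healthy reference is the mean of the first n_reference samples.
    reference = [
        sum(row[j] for row in features[:n_reference]) / n_reference
        for j in range(n_feat)
    ]
    corr = [pearson(reference, row) for row in features]
    smooth = moving_average(corr, window)
    lo, hi = min(smooth), max(smooth)
    if hi == lo:
        return [1.0] * len(smooth)
    # Assumption: min-max normalization to [0, 1].
    return [(v - lo) / (hi - lo) for v in smooth]


if __name__ == "__main__":
    toy = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [3, 2, 1], [3, 2, 1]]
    print(aec_rate(toy))
```

In this toy input the last two samples are anti-correlated with the reference, so the rate falls from its maximum toward zero. In the abstract's terms, such a fall would mark the start of degradation.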

arXiv:1703.06270 [pdf, other]

SIM-CE: An Advanced Simulink Platform for Studying the Brain of Caenorhabditis elegans

Authors: Ramin M. Hasani, Victoria Beneder, Magdalena Fuchs, David Lung, Radu Grosu

Abstract: We introduce SIM-CE, an advanced, user-friendly modeling and simulation environment in Simulink for performing multi-scale behavioral analysis of the nervous system of Caenorhabditis elegans (C. elegans). SIM-CE contains an implementation of the mathematical models of C. elegans's neurons and synapses, in Simulink, which can be easily extended and particularized by the user. The Simulink model is able to capture both complex dynamics of ion channels and additional biophysical detail such as intracellular calcium concentration. We demonstrate the performance of SIM-CE by carrying out neuronal, synaptic and neural-circuit-level behavioral simulations. Such an environment enables the user to capture unknown properties of the neural circuits, test hypotheses and determine the origin of many behavioral plasticities exhibited by the worm.

Submitted 25 March, 2017; v1 submitted 18 March, 2017; originally announced March 2017.

arXiv:1703.06264 [pdf, other]

Non-Associative Learning Representation in the Nervous System of the Nematode Caenorhabditis elegans

Authors: Ramin M. Hasani, Magdalena Fuchs, Victoria Beneder, Radu Grosu

Abstract: Caenorhabditis elegans (C. elegans) illustrated remarkable behavioral plasticities including complex non-associative and associative learning representations. Understanding the principles of such mechanisms presumably leads to constructive inspirations for the design of efficient learning algorithms. In the present study, we postulate a novel approach on modeling single neurons and synapses to study the mechanisms underlying learning in the C. elegans nervous system. In this regard, we construct a precise mathematical model of sensory neurons where we include multi-scale details from genes, ion channels and ion pumps, together with a dynamic model of synapses comprised of neurotransmitters and receptors kinetics. We recapitulate mechanosensory habituation mechanism, a non-associative learning process, in which elements of the neural network tune their parameters as a result of repeated input stimuli. Accordingly, we quantitatively demonstrate the roots of such plasticity in the neuronal and synaptic-level representations. Our findings can potentially give rise to the development of new bio-inspired learning algorithms.

Submitted 25 March, 2017; v1 submitted 18 March, 2017; originally announced March 2017.

arXiv:1702.07426 [pdf, other]

Control of the Correlation of Spontaneous Neuron Activity in Biological and Noise-activated CMOS Artificial Neural Microcircuits

Authors: Ramin M. Hasani, Giorgio Ferrari, Hideaki Yamamoto, Sho Kono, Koji Ishihara, Soya Fujimori, Takashi Tanii, Enrico Prati

Abstract: There are several indications that the brain is organized not on a basis of individual unreliable neurons, but on a micro-circuital scale providing Lego blocks employed to create complex architectures. At such an intermediate scale, the firing activity in the microcircuits is governed by collective effects emerging by the background noise soliciting spontaneous firing, the degree of mutual connections between the neurons, and the topology of the connections. We compare spontaneous firing activity of small populations of neurons adhering to an engineered scaffold with simulations of biologically plausible CMOS artificial neuron populations whose spontaneous activity is ignited by tailored background noise. We provide a full set of flexible and low-power consuming silicon blocks including neurons, excitatory and inhibitory synapses, and both white and pink noise generators for spontaneous firing activation. We achieve a comparable degree of correlation of the firing activity of the biological neurons by controlling the kind and the number of connections among the silicon neurons. The correlation between groups of neurons, organized as a ring of four distinct populations connected by the equivalent of interneurons, is triggered more effectively by adding multiple synapses to the connections than increasing the number of independent point-to-point connections. The comparison between the biological and the artificial systems suggests that a considerable number of synapses is active also in biological populations adhering to engineered scaffolds.

Submitted 23 February, 2017; originally announced February 2017.

Showing 1–44 of 44 results for author: Hasani, R