Search | arXiv e-print repository

BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks

Authors: Amrutha Varshini Ramesh, Vignesh Ganapathiraman, Issam H. Laradji, Mark Schmidt

Abstract: Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents significant memory challenges, often requiring a prohibitive amount of GPU memory that may not be readily available. Existing methods such as low-rank adaptation (LoRA)… ▽ More Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents significant memory challenges, often requiring a prohibitive amount of GPU memory that may not be readily available. Existing methods such as low-rank adaptation (LoRA) add trainable low-rank matrix factorizations, altering the training dynamics and limiting the model's parameter search to a low-rank subspace. GaLore, a more recent method, employs Gradient Low-Rank Projection to reduce the memory footprint, in the full parameter training setting. However GaLore can only be applied to a subset of the LLM layers that satisfy the "reversibility" property, thus limiting their applicability. In response to these challenges, we introduce BlockLLM, an approach inspired by block coordinate descent. Our method carefully selects and updates a very small subset of the trainable parameters without altering any part of its architecture and training procedure. BlockLLM achieves state-of-the-art performance in both finetuning and pretraining tasks, while reducing the memory footprint of the underlying optimization process. Our experiments demonstrate that fine-tuning with only less than 5% of the parameters, BlockLLM achieves state-of-the-art perplexity scores on the GLUE benchmarks. On Llama model pretrained on C4 dataset, BlockLLM is able to train with significantly less memory than the state-of-the-art, while still maintaining competitive performance. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 16 pages, 7 figures

arXiv:2405.17030 [pdf]

SCaRL- A Synthetic Multi-Modal Dataset for Autonomous Driving

Authors: Avinash Nittur Ramesh, Aitor Correas-Serrano, María González-Huici

Abstract: We present a novel synthetically generated multi-modal dataset, SCaRL, to enable the training and validation of autonomous driving solutions. Multi-modal datasets are essential to attain the robustness and high accuracy required by autonomous systems in applications such as autonomous driving. As deep learning-based solutions are becoming more prevalent for object detection, classification, and tr… ▽ More We present a novel synthetically generated multi-modal dataset, SCaRL, to enable the training and validation of autonomous driving solutions. Multi-modal datasets are essential to attain the robustness and high accuracy required by autonomous systems in applications such as autonomous driving. As deep learning-based solutions are becoming more prevalent for object detection, classification, and tracking tasks, there is great demand for datasets combining camera, lidar, and radar sensors. Existing real/synthetic datasets for autonomous driving lack synchronized data collection from a complete sensor suite. SCaRL provides synchronized Synthetic data from RGB, semantic/instance, and depth Cameras; Range-Doppler-Azimuth/Elevation maps and raw data from Radar; and 3D point clouds/2D maps of semantic, depth and Doppler data from coherent Lidar. SCaRL is a large dataset based on the CARLA Simulator, which provides data for diverse, dynamic scenarios and traffic conditions. SCaRL is the first dataset to include synthetic synchronized data from coherent Lidar and MIMO radar sensors. The dataset can be accessed here: https://fhr-ihs-sva.pages.fraunhofer.de/asp/scarl/ △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Accepted in International Conference on Microwaves for Intelligent Mobility - 16.&17. April 2024 - Boppard near Koblenz, Germany

arXiv:2405.03878 [pdf, other]

Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning

Authors: Aditya A. Ramesh, Kenny Young, Louis Kirsch, Jürgen Schmidhuber

Abstract: Temporal credit assignment in reinforcement learning is challenging due to delayed and stochastic outcomes. Monte Carlo targets can bridge long delays between action and consequence but lead to high-variance targets due to stochasticity. Temporal difference (TD) learning uses bootstrap** to overcome variance but introduces a bias that can only be corrected through many iterations. TD($λ$) provid… ▽ More Temporal credit assignment in reinforcement learning is challenging due to delayed and stochastic outcomes. Monte Carlo targets can bridge long delays between action and consequence but lead to high-variance targets due to stochasticity. Temporal difference (TD) learning uses bootstrap** to overcome variance but introduces a bias that can only be corrected through many iterations. TD($λ$) provides a mechanism to navigate this bias-variance tradeoff smoothly. Appropriately selecting $λ$ can significantly improve performance. Here, we propose Chunked-TD, which uses predicted probabilities of transitions from a model for computing $λ$-return targets. Unlike other model-based solutions to credit assignment, Chunked-TD is less vulnerable to model inaccuracies. Our approach is motivated by the principle of history compression and 'chunks' trajectories for conventional TD learning. Chunking with learned world models compresses near-deterministic regions of the environment-policy interaction to speed up credit assignment while still bootstrap** when necessary. We propose algorithms that can be implemented online and show that they solve some problems much faster than conventional TD($λ$). △ Less

Submitted 4 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: ICML 2024 version

arXiv:2312.03858 [pdf, other]

Stop Hiding The Sharp Knives: The WebAssembly Linux Interface

Authors: Arjun Ramesh, Tianshu Huang, Ben L. Titzer, Anthony Rowe

Abstract: WebAssembly is gaining popularity as a portable binary format targetable from many programming languages. With a well-specified low-level virtual instruction set, minimal memory footprint and many high-performance implementations, it has been successfully adopted for lightweight in-process memory sandboxing in many contexts. Despite these advantages, WebAssembly lacks many standard system interfac… ▽ More WebAssembly is gaining popularity as a portable binary format targetable from many programming languages. With a well-specified low-level virtual instruction set, minimal memory footprint and many high-performance implementations, it has been successfully adopted for lightweight in-process memory sandboxing in many contexts. Despite these advantages, WebAssembly lacks many standard system interfaces, making it difficult to reuse existing applications. This paper proposes WALI: The WebAssembly Linux Interface, a thin layer over Linux's userspace system calls, creating a new class of virtualization where WebAssembly seamlessly interacts with native processes and the underlying operating system. By virtualizing the lowest level of userspace, WALI offers application portability with little effort and reuses existing compiler backends. With WebAssembly's control flow integrity guarantees, these modules gain an additional level of protection against remote code injection attacks. Furthermore, capability-based APIs can themselves be virtualized and implemented in terms of WALI, improving reuse and robustness through better layering. We present an implementation of WALI in a modern WebAssembly engine and evaluate its performance on a number of applications which we can now compile with mostly trivial effort. △ Less

Submitted 6 December, 2023; originally announced December 2023.

Comments: 12 pages, 8 figures

arXiv:2309.11820 [pdf, other]

Automatic Endoscopic Ultrasound Station Recognition with Limited Data

Authors: Abhijit Ramesh, Anantha Nandanan, Nikhil Boggavarapu, Priya Nair MD, Gilad Gressel

Abstract: Pancreatic cancer is a lethal form of cancer that significantly contributes to cancer-related deaths worldwide. Early detection is essential to improve patient prognosis and survival rates. Despite advances in medical imaging techniques, pancreatic cancer remains a challenging disease to detect. Endoscopic ultrasound (EUS) is the most effective diagnostic tool for detecting pancreatic cancer. Howe… ▽ More Pancreatic cancer is a lethal form of cancer that significantly contributes to cancer-related deaths worldwide. Early detection is essential to improve patient prognosis and survival rates. Despite advances in medical imaging techniques, pancreatic cancer remains a challenging disease to detect. Endoscopic ultrasound (EUS) is the most effective diagnostic tool for detecting pancreatic cancer. However, it requires expert interpretation of complex ultrasound images to complete a reliable patient scan. To obtain complete imaging of the pancreas, practitioners must learn to guide the endoscope into multiple "EUS stations" (anatomical locations), which provide different views of the pancreas. This is a difficult skill to learn, involving over 225 proctored procedures with the support of an experienced doctor. We build an AI-assisted tool that utilizes deep learning techniques to identify these stations of the stomach in real time during EUS procedures. This computer-assisted diagnostic (CAD) will help train doctors more efficiently. Historically, the challenge faced in develo** such a tool has been the amount of retrospective labeling required by trained clinicians. To solve this, we developed an open-source user-friendly labeling web app that streamlines the process of annotating stations during the EUS procedure with minimal effort from the clinicians. Our research shows that employing only 43 procedures with no hyperparameter fine-tuning obtained a balanced accuracy of 89%, comparable to the current state of the art. In addition, we employ Grad-CAM, a visualization technology that provides clinicians with interpretable and explainable visualizations. △ Less

Submitted 28 December, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

arXiv:2307.01169 [pdf, other]

Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

Authors: Amrutha Varshini Ramesh, Aaron Mishkin, Mark Schmidt, Yihan Zhou, Jonathan Wilder Lavington, Jennifer She

Abstract: We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Lojasiewicz assumption that is faster than random selection and independent of the problem di… ▽ More We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Lojasiewicz assumption that is faster than random selection and independent of the problem dimension $n$. We then consider minimizing with both a summation constraint and bound constraints, as arises in the support vector machine dual problem. Existing greedy rules for this setting either guarantee trivial progress only or require $O(n^2)$ time to compute. We show that bound- and summation-constrained steepest descent in the L1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time. △ Less

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2306.10142 [pdf, other]

Enhancing Visual Domain Adaptation with Source Preparation

Authors: Anirudha Ramesh, Anurag Ghosh, Christoph Mertz, Jeff Schneider

Abstract: Robotic Perception in diverse domains such as low-light scenarios, where new modalities like thermal imaging and specialized night-vision sensors are increasingly employed, remains a challenge. Largely, this is due to the limited availability of labeled data. Existing Domain Adaptation (DA) techniques, while promising to leverage labels from existing well-lit RGB images, fail to consider the chara… ▽ More Robotic Perception in diverse domains such as low-light scenarios, where new modalities like thermal imaging and specialized night-vision sensors are increasingly employed, remains a challenge. Largely, this is due to the limited availability of labeled data. Existing Domain Adaptation (DA) techniques, while promising to leverage labels from existing well-lit RGB images, fail to consider the characteristics of the source domain itself. We holistically account for this factor by proposing Source Preparation (SP), a method to mitigate source domain biases. Our Almost Unsupervised Domain Adaptation (AUDA) framework, a label-efficient semi-supervised approach for robotic scenarios -- employs Source Preparation (SP), Unsupervised Domain Adaptation (UDA) and Supervised Alignment (SA) from limited labeled data. We introduce CityIntensified, a novel dataset comprising temporally aligned image pairs captured from a high-sensitivity camera and an intensifier camera for semantic segmentation and object detection in low-light settings. We demonstrate the effectiveness of our method in semantic segmentation, with experiments showing that SP enhances UDA across a range of visual domains, with improvements up to 40.64% in mIoU over baseline, while making target models more robust to real-world shifts within the target domain. We show that AUDA is a label-efficient framework for effective DA, significantly improving target domain performance with only tens of labeled samples from the target domain. △ Less

Submitted 16 June, 2023; originally announced June 2023.

ACM Class: I.4; I.5; I.2

arXiv:2306.01570 [pdf]

Spatio-Temporal Deep Learning-Assisted Reduced Security-Constrained Unit Commitment

Authors: Arun Venkatesh Ramesh, Xingpeng Li

Abstract: Security-constrained unit commitment (SCUC) is a computationally complex process utilized in power system day-ahead scheduling and market clearing. SCUC is run daily and requires state-of-the-art algorithms to speed up the process. The constraints and data associated with SCUC are both geographically and temporally correlated to ensure the reliability of the solution, which further increases the c… ▽ More Security-constrained unit commitment (SCUC) is a computationally complex process utilized in power system day-ahead scheduling and market clearing. SCUC is run daily and requires state-of-the-art algorithms to speed up the process. The constraints and data associated with SCUC are both geographically and temporally correlated to ensure the reliability of the solution, which further increases the complexity. In this paper, an advanced machine learning (ML) model is used to study the patterns in power system historical data, which inherently considers both spatial and temporal (ST) correlations in constraints. The ST-correlated ML model is trained to understand spatial correlation by considering graph neural networks (GNN) whereas temporal sequences are studied using long short-term memory (LSTM) networks. The proposed approach is validated on several test systems namely, IEEE 24-Bus system, IEEE-73 Bus system, IEEE 118-Bus system, and synthetic South-Carolina (SC) 500-Bus system. Moreover, B-θ and power transfer distribution factor (PTDF) based SCUC formulations were considered in this research. Simulation results demonstrate that the ST approach can effectively predict generator commitment schedule and classify critical and non-critical lines in the system which are utilized for model reduction of SCUC to obtain computational enhancement without loss in solution quality △ Less

Submitted 2 June, 2023; originally announced June 2023.

Comments: 8 Figures, 5 Tables, 1 Algorithm

arXiv:2305.17066 [pdf, other]

Mindstorms in Natural Language-Based Societies of Mind

Authors: Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, **jie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-** Fan, Bernard Ghanem , et al. (1 additional authors not shown)

Abstract: Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overco… ▽ More Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents -- all communicating through the same universal symbolic language -- are easily added in a modular fashion. To demonstrate the power of NLSOMs, we assemble and experiment with several of them (having up to 129 members), leveraging mindstorms in them to solve some practical AI tasks: visual question answering, image captioning, text-to-image synthesis, 3D generation, egocentric retrieval, embodied AI, and general language-based task solving. We view this as a starting point towards much larger NLSOMs with billions of agents-some of which may be humans. And with this emergence of great societies of heterogeneous minds, many new research questions have suddenly become paramount to the future of artificial intelligence. What should be the social structure of an NLSOM? What would be the (dis)advantages of having a monarchical rather than a democratic structure? How can principles of NN economies be used to maximize the total reward of a reinforcement learning NLSOM? In this work, we identify, discuss, and try to answer some of these questions. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: 9 pages in main text + 7 pages of references + 38 pages of appendices, 14 figures in main text + 13 in appendices, 7 tables in appendices

MSC Class: 68T07 ACM Class: I.2.6; I.2.11

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo… ▽ More We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was develo** infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4. △ Less

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2303.06776 [pdf, other]

Robot Health Indicator: A Visual Cue to Improve Level of Autonomy Switching Systems

Authors: Aniketh Ramesh, Madeleine Englund, Andreas Theodorou, Rustam Stolkin, Manolis Chiou

Abstract: Using different Levels of Autonomy (LoA), a human operator can vary the extent of control they have over a robot's actions. LoAs enable operators to mitigate a robot's performance degradation or limitations in the its autonomous capabilities. However, LoA regulation and other tasks may often overload an operator's cognitive abilities. Inspired by video game user interfaces, we study if adding a 'R… ▽ More Using different Levels of Autonomy (LoA), a human operator can vary the extent of control they have over a robot's actions. LoAs enable operators to mitigate a robot's performance degradation or limitations in the its autonomous capabilities. However, LoA regulation and other tasks may often overload an operator's cognitive abilities. Inspired by video game user interfaces, we study if adding a 'Robot Health Bar' to the robot control UI can reduce the cognitive demand and perceptual effort required for LoA regulation while promoting trust and transparency. This Health Bar uses the robot vitals and robot health framework to quantify and present runtime performance degradation in robots. Results from our pilot study indicate that when using a health bar, operators used to manual control more to minimise the risk of robot failure during high performance degradation. It also gave us insights and lessons to inform subsequent experiments on human-robot teaming. △ Less

Submitted 12 March, 2023; originally announced March 2023.

Comments: Accepted for Variable Autonomy for human-robot Teaming (VAT) workshop at ACM/IEEE HRI 2023

ACM Class: I.2.9

arXiv:2212.02179 [pdf, other]

Physics-Informed Model-Based Reinforcement Learning

Authors: Adithya Ramesh, Balaraman Ravindran

Abstract: We apply reinforcement learning (RL) to robotics tasks. One of the drawbacks of traditional RL algorithms has been their poor sample efficiency. One approach to improve the sample efficiency is model-based RL. In our model-based RL algorithm, we learn a model of the environment, essentially its transition dynamics and reward function, use it to generate imaginary trajectories and backpropagate thr… ▽ More We apply reinforcement learning (RL) to robotics tasks. One of the drawbacks of traditional RL algorithms has been their poor sample efficiency. One approach to improve the sample efficiency is model-based RL. In our model-based RL algorithm, we learn a model of the environment, essentially its transition dynamics and reward function, use it to generate imaginary trajectories and backpropagate through them to update the policy, exploiting the differentiability of the model. Intuitively, learning more accurate models should lead to better model-based RL performance. Recently, there has been growing interest in develo** better deep neural network based dynamics models for physical systems, by utilizing the structure of the underlying physics. We focus on robotic systems undergoing rigid body motion without contacts. We compare two versions of our model-based RL algorithm, one which uses a standard deep neural network based dynamics model and the other which uses a much more accurate, physics-informed neural network based dynamics model. We show that, in model-based RL, model accuracy mainly matters in environments that are sensitive to initial conditions, where numerical errors accumulate fast. In these environments, the physics-informed version of our algorithm achieves significantly better average-return and sample efficiency. In environments that are not sensitive to initial conditions, both versions of our algorithm achieve similar average-return, while the physics-informed version achieves better sample efficiency. We also show that, in challenging environments, physics-informed model-based RL achieves better average-return than state-of-the-art model-free RL algorithms such as Soft Actor-Critic, as it computes the policy-gradient analytically, while the latter estimates it through sampling. △ Less

Submitted 14 May, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

arXiv:2211.14095 [pdf, other]

A Hierarchical Variable Autonomy Mixed-Initiative Framework for Human-Robot Teaming in Mobile Robotics

Authors: Dimitris Panagopoulos, Giannis Petousakis, Aniketh Ramesh, Tianshu Ruan, Grigoris Nikolaou, Rustam Stolkin, Manolis Chiou

Abstract: This paper presents a Mixed-Initiative (MI) framework for addressing the problem of control authority transfer between a remote human operator and an AI agent when cooperatively controlling a mobile robot. Our Hierarchical Expert-guided Mixed-Initiative Control Switcher (HierEMICS) leverages information on the human operator's state and intent. The control switching policies are based on a critica… ▽ More This paper presents a Mixed-Initiative (MI) framework for addressing the problem of control authority transfer between a remote human operator and an AI agent when cooperatively controlling a mobile robot. Our Hierarchical Expert-guided Mixed-Initiative Control Switcher (HierEMICS) leverages information on the human operator's state and intent. The control switching policies are based on a criticality hierarchy. An experimental evaluation was conducted in a high-fidelity simulated disaster response and remote inspection scenario, comparing HierEMICS with a state-of-the-art Expert-guided Mixed-Initiative Control Switcher (EMICS) in the context of mobile robot navigation. Results suggest that HierEMICS reduces conflicts for control between the human and the AI agent, which is a fundamental challenge in both the MI control paradigm and also in the related shared control paradigm. Additionally, we provide statistically significant evidence of improved, navigational safety (i.e., fewer collisions), LOA switching efficiency, and conflict for control reduction. △ Less

Submitted 25 November, 2022; originally announced November 2022.

Comments: 6 pages, 4 figures, ICHMS 2022, First two Authors contributed equally

arXiv:2211.10282 [pdf, other]

Exploring through Random Curiosity with General Value Functions

Authors: Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber

Abstract: Efficient exploration in reinforcement learning is a challenging problem commonly addressed through intrinsic rewards. Recent prominent approaches are based on state novelty or variants of artificial curiosity. However, directly applying them to partially observable environments can be ineffective and lead to premature dissipation of intrinsic rewards. Here we propose random curiosity with general… ▽ More Efficient exploration in reinforcement learning is a challenging problem commonly addressed through intrinsic rewards. Recent prominent approaches are based on state novelty or variants of artificial curiosity. However, directly applying them to partially observable environments can be ineffective and lead to premature dissipation of intrinsic rewards. Here we propose random curiosity with general value functions (RC-GVF), a novel intrinsic reward function that draws upon connections between these distinct approaches. Instead of using only the current observation's novelty or a curiosity bonus for failing to predict precise environment dynamics, RC-GVF derives intrinsic rewards through predicting temporally extended general value functions. We demonstrate that this improves exploration in a hard-exploration diabolical lock problem. Furthermore, RC-GVF significantly outperforms previous methods in the absence of ground-truth episodic counts in the partially observable MiniGrid environments. Panoramic observations on MiniGrid further boost RC-GVF's performance such that it is competitive to baselines exploiting privileged information in form of episodic counts. △ Less

Submitted 18 November, 2022; originally announced November 2022.

Comments: Accepted to NeurIPS 2022

arXiv:2211.02222 [pdf, other]

The Benefits of Model-Based Generalization in Reinforcement Learning

Authors: Kenny Young, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

Abstract: Model-Based Reinforcement Learning (RL) is widely believed to have the potential to improve sample efficiency by allowing an agent to synthesize large amounts of imagined experience. Experience Replay (ER) can be considered a simple kind of model, which has proved effective at improving the stability and efficiency of deep RL. In principle, a learned parametric model could improve on ER by general… ▽ More Model-Based Reinforcement Learning (RL) is widely believed to have the potential to improve sample efficiency by allowing an agent to synthesize large amounts of imagined experience. Experience Replay (ER) can be considered a simple kind of model, which has proved effective at improving the stability and efficiency of deep RL. In principle, a learned parametric model could improve on ER by generalizing from real experience to augment the dataset with additional plausible experience. However, given that learned value functions can also generalize, it is not immediately obvious why model generalization should be better. Here, we provide theoretical and empirical insight into when, and how, we can expect data generated by a learned model to be useful. First, we provide a simple theorem motivating how learning a model as an intermediate step can narrow down the set of possible value functions more than learning a value function directly from data using the Bellman equation. Second, we provide an illustrative example showing empirically how a similar effect occurs in a more concrete setting with neural network function approximation. Finally, we provide extensive experiments showing the benefit of model-based learning for online RL in environments with combinatorial complexity, but factored structure that allows a learned model to generalize. In these experiments, we take care to control for other factors in order to isolate, insofar as possible, the benefit of using experience generated by a learned model relative to ER alone. △ Less

Submitted 10 July, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: Update to ICML version

arXiv:2208.06742 [pdf]

Feasibility Layer Aided Machine Learning Approach for Day-Ahead Operations

Authors: Arun Venkatesh Ramesh, Xingpeng Li

Abstract: Day-ahead operations involves a complex and computationally intensive optimization process to determine the generator commitment schedule and dispatch. The optimization process is a mixed-integer linear program (MILP) also known as security-constrained unit commitment (SCUC). Independent system operators (ISOs) run SCUC daily and require state-of-the-art algorithms to speed up the process. Existin… ▽ More Day-ahead operations involves a complex and computationally intensive optimization process to determine the generator commitment schedule and dispatch. The optimization process is a mixed-integer linear program (MILP) also known as security-constrained unit commitment (SCUC). Independent system operators (ISOs) run SCUC daily and require state-of-the-art algorithms to speed up the process. Existing patterns in historical information can be leveraged for model reduction of SCUC, which can provide significant time savings. In this paper, machine learning (ML) based classification approaches, namely logistic regression, neural networks, random forest and K-nearest neighbor, were studied for model reduction of SCUC. The ML was then aided with a feasibility layer (FL) and post-process technique to ensure high-quality solutions. The proposed approach is validated on several test systems namely, IEEE 24-Bus system, IEEE-73 Bus system, IEEE 118-Bus system, 500-Bus system, and Polish 2383-Bus system. Moreover, model reduction of a stochastic SCUC (SSCUC) was demonstrated utilizing a modified IEEE 24-Bus system with renewable generation. Simulation results demonstrate a high training accuracy to identify commitment schedule while FL and post-process ensure ML predictions do not lead to infeasible solutions with minimal loss in solution quality. △ Less

Submitted 13 August, 2022; originally announced August 2022.

Comments: 10 pages, 9 figures, 8 tables

arXiv:2207.01684 [pdf, other]

Robot Vitals and Robot Health: Towards Systematically Quantifying Runtime Performance Degradation in Robots Under Adverse Conditions

Authors: Aniketh Ramesh, Rustam Stolkin, Manolis Chiou

Abstract: This paper addresses the problem of automatically detecting and quantifying performance degradation in remote mobile robots during task execution. A robot may encounter a variety of uncertainties and adversities during task execution, which can impair its ability to carry out tasks effectively and cause its performance to degrade. Such situations can be mitigated or averted by timely detection and… ▽ More This paper addresses the problem of automatically detecting and quantifying performance degradation in remote mobile robots during task execution. A robot may encounter a variety of uncertainties and adversities during task execution, which can impair its ability to carry out tasks effectively and cause its performance to degrade. Such situations can be mitigated or averted by timely detection and intervention (e.g., by a remote human supervisor taking over control in teleoperation mode). Inspired by patient triaging systems in hospitals, we introduce the framework of "robot vitals" for estimating overall "robot health". A robot's vitals are a set of indicators that estimate the extent of performance degradation faced by a robot at a given point in time. Robot health is a metric that combines robot vitals into a single scalar value estimate of performance degradation. Experiments, both in simulation and on a real mobile robot, demonstrate that the proposed robot vitals and robot health can be used effectively to estimate robot performance degradation during runtime. △ Less

Submitted 4 July, 2022; originally announced July 2022.

Comments: 8 Pages

MSC Class: 68T40

arXiv:2207.01570 [pdf, other]

Goal-Conditioned Generators of Deep Policies

Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

Abstract: Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a… ▽ More Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks and policy embeddings scales our method to generate deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance. Our code is public. △ Less

Submitted 4 July, 2022; originally announced July 2022.

Comments: Preprint. Under Review

arXiv:2207.01566 [pdf, other]

General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

Authors: Francesco Faccio, Aditya Ramesh, Vincent Herrmann, Jean Harb, Jürgen Schmidhuber

Abstract: Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value function for many policies. Here we combine the actor-critic architecture of Parameter-Based Value Functions and the policy embedding of Policy Evaluation Netw… ▽ More Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value function for many policies. Here we combine the actor-critic architecture of Parameter-Based Value Functions and the policy embedding of Policy Evaluation Networks to learn a single value function for evaluating (and thus hel** to improve) any policy represented by a deep neural network (NN). The method yields competitive experimental results. In continuous control problems with infinitely many states, our value function minimizes its prediction error by simultaneously learning a small set of `probing states' and a map** from actions produced in probing states to the policy's return. The method extracts crucial abstract knowledge about the environment in form of very few states sufficient to fully specify the behavior of many policies. A policy improves solely by changing actions in probing states, following the gradient of the value function's predictions. Surprisingly, it is possible to clone the behavior of a near-optimal policy in Swimmer-v3 and Hopper-v3 environments only by knowing how to act in 3 and 5 such learned states, respectively. Remarkably, our value function trained to evaluate NN policies is also invariant to changes of the policy architecture: we show that it allows for zero-shot learning of linear policies competitive with the best policy seen during training. Our code is public. △ Less

Submitted 4 July, 2022; originally announced July 2022.

Comments: Preprint. Under review

arXiv:2204.06125 [pdf, other]

Hierarchical Text-Conditional Image Generation with CLIP Latents

Authors: Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen

Abstract: Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image repre… ▽ More Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples. △ Less

Submitted 12 April, 2022; originally announced April 2022.

arXiv:2112.10741 [pdf, other]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Authors: Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen

Abstract: Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators f… ▽ More Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples. Samples from a 3.5 billion parameter text-conditional diffusion model using classifier-free guidance are favored by human evaluators to those from DALL-E, even when the latter uses expensive CLIP reranking. Additionally, we find that our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing. We train a smaller model on a filtered dataset and release the code and weights at https://github.com/openai/glide-text2im. △ Less

Submitted 8 March, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: 20 pages, 18 figures

arXiv:2111.09824 [pdf]

Machine Learning Assisted Approach for Security-Constrained Unit Commitment

Authors: Arun Venkatesh Ramesh, Xingpeng Li

Abstract: Security-constrained unit commitment (SCUC) is solved for power system day-ahead generation scheduling, which is a large-scale mixed-integer linear programming problem and is very computationally intensive. Model reduction of SCUC may bring significant time savings. In this work, a novel approach is proposed to effectively utilize machine learning (ML) to reduce the problem size of SCUC. An ML mod… ▽ More Security-constrained unit commitment (SCUC) is solved for power system day-ahead generation scheduling, which is a large-scale mixed-integer linear programming problem and is very computationally intensive. Model reduction of SCUC may bring significant time savings. In this work, a novel approach is proposed to effectively utilize machine learning (ML) to reduce the problem size of SCUC. An ML model using logistic regression (LR) algorithm is proposed and trained with historical nodal demand profiles and the respective commitment schedules. The ML outputs are processed and analyzed to reduce variables and constraints in SCUC. The proposed approach is validated on several standard test systems including IEEE 24-bus system, IEEE 73-bus system, IEEE 118-bus system, synthetic South Carolina 500-bus system and Polish 2383-bus system. Simulation results demonstrate that the use of the prediction from the proposed LR model in SCUC model reduction can substantially reduce the computing time while maintaining solution quality. △ Less

Submitted 12 July, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: 6 Pages, 5 Figures, 3 tables, 1 algorithm

arXiv:2110.05448 [pdf, other]

Unsupervised Neural Machine Translation with Generative Language Models Only

Authors: Jesse Michael Han, Igor Babuschkin, Harrison Edwards, Arvind Neelakantan, Tao Xu, Stanislas Polu, Alex Ray, Pranav Shyam, Aditya Ramesh, Alec Radford, Ilya Sutskever

Abstract: We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models. Our method consists of three steps: few-shot amplification, distillation, and backtranslation. We first use the zero-shot translation ability of large pre-trained language models to generate translations for a small set of unlabeled sentences. We then amplify these… ▽ More We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models. Our method consists of three steps: few-shot amplification, distillation, and backtranslation. We first use the zero-shot translation ability of large pre-trained language models to generate translations for a small set of unlabeled sentences. We then amplify these zero-shot translations by using them as few-shot demonstrations for sampling a larger synthetic dataset. This dataset is distilled by discarding the few-shot demonstrations and then fine-tuning. During backtranslation, we repeatedly generate translations for a set of inputs and then fine-tune a single language model on both directions of the translation task at once, ensuring cycle-consistency by swap** the roles of gold monotext and generated translations when fine-tuning. By using our method to leverage GPT-3's zero-shot translation capability, we achieve a new state-of-the-art in unsupervised translation on the WMT14 English-French benchmark, attaining a BLEU score of 42.1. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: 10 pages

arXiv:2103.13321 [pdf]

Network Reconfiguration Impact on Renewable Energy System and Energy Storage System in Day-Ahead Scheduling

Authors: Arun Venkatesh Ramesh, Xingpeng Li

Abstract: Renewable energy sources (RES) has gained significant interest in recent years. However, due to favourable weather conditions, the RES is installed in remote locations with limited transmission capacity. As a result, it can lead to major curtailments of the free resource when the network is congested. Therefore, energy storage system (ESS) is considered as a viable solution to store energy and add… ▽ More Renewable energy sources (RES) has gained significant interest in recent years. However, due to favourable weather conditions, the RES is installed in remote locations with limited transmission capacity. As a result, it can lead to major curtailments of the free resource when the network is congested. Therefore, energy storage system (ESS) is considered as a viable solution to store energy and address the intermittent nature of RES though ESS is often distributed and may not be geographically close to RES. Therefore, ESS may also suffer from limited transmission capacity due to network congestion. Currently, grid operators overlook network flexibility as a congestion management tool in day-ahead scheduling. This paper addresses these issues and studies the benefits of introducing network reconfiguration (NR) as a preventive and corrective action for transmission flexibility in day-ahead stochastic security-constrained unit-commitment (SSCUC-PC) while considering a multi-scenario RES output. Simulation results demonstrate that NR can lower total system cost, reduce RES curtailments and utilize ESS for better impact by alleviating network congestion in both base-case and post-contingency networks. △ Less

Submitted 11 January, 2021; originally announced March 2021.

arXiv:2103.00020 [pdf, other]

Learning Transferable Visual Models From Natural Language Supervision

Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

Abstract: State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstr… ▽ More State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP. △ Less

Submitted 26 February, 2021; originally announced March 2021.

arXiv:2102.12092 [pdf, other]

Zero-Shot Text-to-Image Generation

Authors: Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

Abstract: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and… ▽ More Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion. △ Less

Submitted 26 February, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

arXiv:2011.07613 [pdf, other]

BirdSLAM: Monocular Multibody SLAM in Bird's-Eye View

Authors: Swapnil Daga, Gokul B. Nair, Anirudha Ramesh, Rahul Sajnani, Junaid Ahmed Ansari, K. Madhava Krishna

Abstract: In this paper, we present BirdSLAM, a novel simultaneous localization and map** (SLAM) system for the challenging scenario of autonomous driving platforms equipped with only a monocular camera. BirdSLAM tackles challenges faced by other monocular SLAM systems (such as scale ambiguity in monocular reconstruction, dynamic object localization, and uncertainty in feature representation) by using an… ▽ More In this paper, we present BirdSLAM, a novel simultaneous localization and map** (SLAM) system for the challenging scenario of autonomous driving platforms equipped with only a monocular camera. BirdSLAM tackles challenges faced by other monocular SLAM systems (such as scale ambiguity in monocular reconstruction, dynamic object localization, and uncertainty in feature representation) by using an orthographic (bird's-eye) view as the configuration space in which localization and map** are performed. By assuming only the height of the ego-camera above the ground, BirdSLAM leverages single-view metrology cues to accurately localize the ego-vehicle and all other traffic participants in bird's-eye view. We demonstrate that our system outperforms prior work that uses strictly greater information, and highlight the relevance of each design decision via an ablation analysis. △ Less

Submitted 15 November, 2020; originally announced November 2020.

Comments: Accepted in VISIGRAPP (VISAPP) 2021

arXiv:2010.15674 [pdf, other]

Analyzing Societal Impact of COVID-19: A Study During the Early Days of the Pandemic

Authors: Swaroop Gowdra Shanthakumar, Anand Seetharam, Arti Ramesh

Abstract: In this paper, we collect and study Twitter communications to understand the societal impact of COVID-19 in the United States during the early days of the pandemic. With infections soaring rapidly, users took to Twitter asking people to self isolate and quarantine themselves. Users also demanded closure of schools, bars, and restaurants as well as lockdown of cities and states. We methodically col… ▽ More In this paper, we collect and study Twitter communications to understand the societal impact of COVID-19 in the United States during the early days of the pandemic. With infections soaring rapidly, users took to Twitter asking people to self isolate and quarantine themselves. Users also demanded closure of schools, bars, and restaurants as well as lockdown of cities and states. We methodically collect tweets by identifying and tracking trending COVID-related hashtags. We first manually group the hashtags into six main categories, namely, 1) General COVID, 2) Quarantine, 3) Panic Buying, 4) School Closures, 5) Lockdowns, and 6) Frustration and Hope}, and study the temporal evolution of tweets in these hashtags. We conduct a linguistic analysis of words common to all hashtag groups and specific to each hashtag group and identify the chief concerns of people as the pandemic gripped the nation (e.g., exploring bidets as an alternative to toilet paper). We conduct sentiment analysis and our investigation reveals that people reacted positively to school closures and negatively to the lack of availability of essential goods due to panic buying. We adopt a state-of-the-art semantic role labeling approach to identify the action words and then leverage a LSTM-based dependency parsing model to analyze the context of action words (e.g., verb deal is accompanied by nouns such as anxiety, stress, and crisis). Finally, we develop a scalable seeded topic modeling approach to automatically categorize and isolate tweets into hashtag groups and experimentally validate that our topic model provides a grou** similar to our manual grou**. Our study presents a systematic way to construct an aggregated picture of peoples' response to the pandemic and lays the groundwork for future fine-grained linguistic and behavioral analysis. △ Less

Submitted 27 October, 2020; originally announced October 2020.

Comments: Accepted for publication in IEEE SocialCom 2020. arXiv admin note: substantial text overlap with arXiv:2004.05451

arXiv:2010.14701 [pdf, other]

Scaling Laws for Autoregressive Generative Modeling

Authors: Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish

Abstract: We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depe… ▽ More We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depends on the compute budget through a power-law, with exponents that are nearly universal across all data domains. The cross-entropy loss has an information theoretic interpretation as $S($True$) + D_{\mathrm{KL}}($True$||$Model$)$, and the empirical scaling laws suggest a prediction for both the true data distribution's entropy and the KL divergence between the true and model distributions. With this interpretation, billion-parameter Transformers are nearly perfect models of the YFCC100M image distribution downsampled to an $8\times 8$ resolution, and we can forecast the model size needed to achieve any given reducible loss (ie $D_{\mathrm{KL}}$) in nats/image for other resolutions. We find a number of additional scaling laws in specific domains: (a) we identify a scaling relation for the mutual information between captions and images in multimodal models, and show how to answer the question "Is a picture worth a thousand words?"; (b) in the case of mathematical problem solving, we identify scaling laws for model performance when extrapolating beyond the training distribution; (c) we finetune generative image models for ImageNet classification and find smooth scaling of the classification loss and error rate, even as the generative loss levels off. Taken together, these results strengthen the case that scaling laws have important implications for neural network performance, including on downstream tasks. △ Less

Submitted 5 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

Comments: 20+17 pages, 33 figures; added appendix with additional language results

arXiv:2010.14558 [pdf, other]

Characterizing Human Mobility Patterns During COVID-19 using Cellular Network Data

Authors: Necati A. Ayan, Nilson L. Damasceno, Sushil Chaskar, Peron R. de Sousa, Arti Ramesh, Anand Seetharam, Antonio A. de A. Rocha

Abstract: In this paper, our goal is to analyze and compare cellular network usage data from pre-lockdown, during lockdown, and post-lockdown phases surrounding the COVID-19 pandemic to understand and model human mobility patterns during the pandemic, and evaluate the effect of lockdowns on mobility. To this end, we collaborate with one of the main cellular network providers in Brazil, and collect and analy… ▽ More In this paper, our goal is to analyze and compare cellular network usage data from pre-lockdown, during lockdown, and post-lockdown phases surrounding the COVID-19 pandemic to understand and model human mobility patterns during the pandemic, and evaluate the effect of lockdowns on mobility. To this end, we collaborate with one of the main cellular network providers in Brazil, and collect and analyze cellular network connections from 1400 antennas for all users in the city of Rio de Janeiro and its suburbs from March 1, 2020 to July 1, 2020. Our analysis reveals that the total number of cellular connections decreases to 78% during the lockdown phase and then increases to 85% of the pre-COVID era as the lockdown eases. We observe that as more people work remotely, there is a shift in the antennas incurring top 10% of the total traffic, with the number of connections made to antennas in downtown Rio reducing drastically and antennas at other locations taking their place. We also observe that while nearly 40-45% users connected to only 1 antenna each day during the lockdown phase indicating no mobility, there are around 4% users (i.e., 80K users) who connected to more than 10 antennas, indicating very high mobility. Finally, we design an interactive tool that showcases mobility patterns in different granularities that can potentially help people and government officials understand the mobility of individuals and the number of COVID cases in a particular neighborhood. Our analysis, inferences, and interactive showcasing of mobility patterns based on large-scale data can be extrapolated to other cities of the world and has the potential to help in designing more effective pandemic management measures in the future. △ Less

Submitted 27 October, 2020; originally announced October 2020.

Comments: 12 pages

arXiv:2010.02013 [pdf]

Towards ML Engineering: A Brief History Of TensorFlow Extended (TFX)

Authors: Konstantinos, Katsiapis, Abhijit Karmarkar, Ahmet Altay, Aleksandr Zaks, Neoklis Polyzotis, Anusha Ramesh, Ben Mathes, Gautam Vasudevan, Irene Giannoumis, Jarek Wilkiewicz, Jiri Simsa, Justin Hong, Mitch Trott, Noé Lutz, Pavel A. Dournov, Robert Crowe, Sarah Sirajuddin, Tris Brian Warkentin, Zhitao Li

Abstract: Software Engineering, as a discipline, has matured over the past 5+ decades. The modern world heavily depends on it, so the increased maturity of Software Engineering was an eventuality. Practices like testing and reliable technologies help make Software Engineering reliable enough to build industries upon. Meanwhile, Machine Learning (ML) has also grown over the past 2+ decades. ML is used more a… ▽ More Software Engineering, as a discipline, has matured over the past 5+ decades. The modern world heavily depends on it, so the increased maturity of Software Engineering was an eventuality. Practices like testing and reliable technologies help make Software Engineering reliable enough to build industries upon. Meanwhile, Machine Learning (ML) has also grown over the past 2+ decades. ML is used more and more for research, experimentation and production workloads. ML now commonly powers widely-used products integral to our lives. But ML Engineering, as a discipline, has not widely matured as much as its Software Engineering ancestor. Can we take what we have learned and help the nascent field of applied ML evolve into ML Engineering the way Programming evolved into Software Engineering [1]? In this article we will give a whirlwind tour of Sibyl [2] and TensorFlow Extended (TFX) [3], two successive end-to-end (E2E) ML platforms at Alphabet. We will share the lessons learned from over a decade of applied ML built on these platforms, explain both their similarities and their differences, and expand on the shifts (both mental and technical) that helped us on our journey. In addition, we will highlight some of the capabilities of TFX that help realize several aspects of ML Engineering. We argue that in order to unlock the gains ML can bring, organizations should advance the maturity of their ML teams by investing in robust ML infrastructure and promoting ML Engineering education. We also recommend that before focusing on cutting-edge ML modeling techniques, product leaders should invest more time in adopting interoperable ML platforms for their organizations. In closing, we will also share a glimpse into the future of TFX. △ Less

Submitted 7 October, 2020; v1 submitted 28 September, 2020; originally announced October 2020.

Comments: 16 pages

arXiv:2007.04750 [pdf, other]

doi 10.1162/neco_a_01539

Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits

Authors: Aditya Ramesh, Paulo Rauber, Michelangelo Conserva, Jürgen Schmidhuber

Abstract: An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical… ▽ More An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical context may introduce spurious relationships or lack a convenient representation of crucial information. In order to address these issues, we propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment. This approach relies on a combination of features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling. Our experiments on a diverse selection of contextual and noncontextual nonstationary problems show that our recurrent approach consistently outperforms its feedforward counterpart, which requires handcrafted historical contexts, while being more widely applicable than conventional nonstationary bandit algorithms. Although it is very difficult to provide theoretical performance guarantees for our new approach, we also prove a novel regret bound for linear posterior sampling with measurement error that may serve as a foundation for future theoretical work. △ Less

Submitted 3 November, 2023; v1 submitted 9 July, 2020; originally announced July 2020.

Journal ref: Neural Computation. 2022 Oct 7;34(11):2232-72

arXiv:2006.16621 [pdf, other]

A Simple Domain Shifting Networkfor Generating Low Quality Images

Authors: Guruprasad Hegde, Avinash Nittur Ramesh, Kanchana Vaishnavi Gandikota, Roman Obermaisser, Michael Moeller

Abstract: Deep Learning systems have proven to be extremely successful for image recognition tasks for which significant amounts of training data is available, e.g., on the famous ImageNet dataset. We demonstrate that for robotics applications with cheap camera equipment, the low image quality, however,influences the classification accuracy, and freely available databases cannot be exploited in a straight f… ▽ More Deep Learning systems have proven to be extremely successful for image recognition tasks for which significant amounts of training data is available, e.g., on the famous ImageNet dataset. We demonstrate that for robotics applications with cheap camera equipment, the low image quality, however,influences the classification accuracy, and freely available databases cannot be exploited in a straight forward way to train classifiers to be used on a robot. As a solution we propose to train a network on degrading the quality images in order to mimic specific low quality imaging systems. Numerical experiments demonstrate that classification networks trained by using images produced by our quality degrading network along with the high quality images outperform classification networks trained only on high quality data when used on a real robot system, while being significantly easier to use than competing zero-shot domain adaptation techniques. △ Less

Submitted 30 June, 2020; originally announced June 2020.

Comments: accepted ICPR 2020

arXiv:2006.08003 [pdf, other]

CompressNet: Generative Compression at Extremely Low Bitrates

Authors: Suraj Kiran Raman, Aditya Ramesh, Vijayakrishna Naganoor, Shubham Dash, Giridharan Kumaravelu, Honglak Lee

Abstract: Compressing images at extremely low bitrates (< 0.1 bpp) has always been a challenging task since the quality of reconstruction significantly reduces due to the strong imposed constraint on the number of bits allocated for the compressed data. With the increasing need to transfer large amounts of images with limited bandwidth, compressing images to very low sizes is a crucial task. However, the ex… ▽ More Compressing images at extremely low bitrates (< 0.1 bpp) has always been a challenging task since the quality of reconstruction significantly reduces due to the strong imposed constraint on the number of bits allocated for the compressed data. With the increasing need to transfer large amounts of images with limited bandwidth, compressing images to very low sizes is a crucial task. However, the existing methods are not effective at extremely low bitrates. To address this need, we propose a novel network called CompressNet which augments a Stacked Autoencoder with a Switch Prediction Network (SAE-SPN). This helps in the reconstruction of visually pleasing images at these low bitrates (< 0.1 bpp). We benchmark the performance of our proposed method on the Cityscapes dataset, evaluating over different metrics at extremely low bitrates to show that our method outperforms the other state-of-the-art. In particular, at a bitrate of 0.07, CompressNet achieves 22% lower Perceptual Loss and 55% lower Frechet Inception Distance (FID) compared to the deep learning SOTA methods. △ Less

Submitted 14 June, 2020; originally announced June 2020.

arXiv:2006.00305 [pdf, other]

RelEx: A Model-Agnostic Relational Model Explainer

Authors: Yue Zhang, David Defazio, Arti Ramesh

Abstract: In recent years, considerable progress has been made on improving the interpretability of machine learning models. This is essential, as complex deep learning models with millions of parameters produce state of the art results, but it can be nearly impossible to explain their predictions. While various explainability techniques have achieved impressive results, nearly all of them assume each data… ▽ More In recent years, considerable progress has been made on improving the interpretability of machine learning models. This is essential, as complex deep learning models with millions of parameters produce state of the art results, but it can be nearly impossible to explain their predictions. While various explainability techniques have achieved impressive results, nearly all of them assume each data instance to be independent and identically distributed (iid). This excludes relational models, such as Statistical Relational Learning (SRL), and the recently popular Graph Neural Networks (GNNs), resulting in few options to explain them. While there does exist one work on explaining GNNs, GNN-Explainer, they assume access to the gradients of the model to learn explanations, which is restrictive in terms of its applicability across non-differentiable relational models and practicality. In this work, we develop RelEx, a model-agnostic relational explainer to explain black-box relational models with only access to the outputs of the black-box. RelEx is able to explain any relational model, including SRL models and GNNs. We compare RelEx to the state-of-the-art relational explainer, GNN-Explainer, and relational extensions of iid explanation models and show that RelEx achieves comparable or better performance, while remaining model-agnostic. △ Less

Submitted 30 May, 2020; originally announced June 2020.

arXiv:2005.14165 [pdf, other]

Language Models are Few-Shot Learners

Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess , et al. (6 additional authors not shown)

Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few… ▽ More Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general. △ Less

Submitted 22 July, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

Comments: 40+32 pages

arXiv:2004.05451 [pdf, other]

Understanding the Socio-Economic Disruption in the United States during COVID-19's Early Days

Authors: Swaroop Gowdra Shanthakumar, Anand Seetharam, Arti Ramesh

Abstract: In this paper, we collect and study Twitter communications to understand the socio-economic impact of COVID-19 in the United States during the early days of the pandemic. Our analysis reveals that COVID-19 gripped the nation during this time as is evidenced by the significant number of trending hashtags. With infections soaring rapidly, users took to Twitter asking people to self isolate and quara… ▽ More In this paper, we collect and study Twitter communications to understand the socio-economic impact of COVID-19 in the United States during the early days of the pandemic. Our analysis reveals that COVID-19 gripped the nation during this time as is evidenced by the significant number of trending hashtags. With infections soaring rapidly, users took to Twitter asking people to self isolate and quarantine themselves. Users also demanded closure of schools, bars, and restaurants as well as lockdown of cities and states. The communications reveal the ensuing panic buying and the unavailability of some essential goods, in particular toilet paper. We also observe users express their frustration in their communications as the virus spread continued. We methodically collect a total of 530,206 tweets by identifying and tracking trending COVID-related hashtags. We then group the hashtags into six main categories, namely 1) General COVID, 2) Quarantine, 3) Panic Buying, 4) School Closures, 5) Lockdowns, and 6) Frustration and Hope, and study the temporal evolution of tweets in these hashtags. We conduct a linguistic analysis of words common to all the hashtag groups and specific to each hashtag group. Our preliminary study presents a succinct and aggregated picture of people's response to the pandemic and lays the groundwork for future fine-grained linguistic and behavioral analysis. △ Less

Submitted 11 April, 2020; originally announced April 2020.

arXiv:2002.09523 [pdf, other]

Struct-MMSB: Mixed Membership Stochastic Blockmodels with Interpretable Structured Priors

Authors: Yue Zhang, Arti Ramesh

Abstract: The mixed membership stochastic blockmodel (MMSB) is a popular framework for community detection and network generation. It learns a low-rank mixed membership representation for each node across communities by exploiting the underlying graph structure. MMSB assumes that the membership distributions of the nodes are independently drawn from a Dirichlet distribution, which limits its capability to m… ▽ More The mixed membership stochastic blockmodel (MMSB) is a popular framework for community detection and network generation. It learns a low-rank mixed membership representation for each node across communities by exploiting the underlying graph structure. MMSB assumes that the membership distributions of the nodes are independently drawn from a Dirichlet distribution, which limits its capability to model highly correlated graph structures that exist in real-world networks. In this paper, we present a flexible richly structured MMSB model, \textit{Struct-MMSB}, that uses a recently developed statistical relational learning model, hinge-loss Markov random fields (HL-MRFs), as a structured prior to model complex dependencies among node attributes, multi-relational links, and their relationship with mixed-membership distributions. Our model is specified using a probabilistic programming templating language that uses weighted first-order logic rules, which enhances the model's interpretability. Further, our model is capable of learning latent characteristics in real-world networks via meaningful latent variables encoded as a complex combination of observed features and membership distributions. We present an expectation-maximization based inference algorithm that learns latent variables and parameters iteratively, a scalable stochastic variation of the inference algorithm, and a method to learn the weights of HL-MRF structured priors. We evaluate our model on six datasets across three different types of networks and corresponding modeling scenarios and demonstrate that our models are able to achieve an improvement of 15\% on average in test log-likelihood and faster convergence when compared to state-of-the-art network models. △ Less

Submitted 21 February, 2020; originally announced February 2020.

Comments: ECAI 2020

arXiv:2002.09471 [pdf, other]

Learning Fairness-aware Relational Structures

Authors: Yue Zhang, Arti Ramesh

Abstract: The development of fair machine learning models that effectively avert bias and discrimination is an important problem that has garnered attention in recent years. The necessity of encoding complex relational dependencies among the features and variables for competent predictions require the development of fair, yet expressive relational models. In this work, we introduce Fair-A3SL, a fairness-awa… ▽ More The development of fair machine learning models that effectively avert bias and discrimination is an important problem that has garnered attention in recent years. The necessity of encoding complex relational dependencies among the features and variables for competent predictions require the development of fair, yet expressive relational models. In this work, we introduce Fair-A3SL, a fairness-aware structure learning algorithm for learning relational structures, which incorporates fairness measures while learning relational graphical model structures. Our approach is versatile in being able to encode a wide range of fairness metrics such as statistical parity difference, overestimation, equalized odds, and equal opportunity, including recently proposed relational fairness measures. While existing approaches employ the fairness measures on pre-determined model structures post prediction, Fair-A3SL directly learns the structure while optimizing for the fairness measures and hence is able to remove any structural bias in the model. We demonstrate the effectiveness of our learned model structures when compared with the state-of-the-art fairness models quantitatively and qualitatively on datasets representing three different modeling scenarios: i) a relational dataset, ii) a recidivism prediction dataset widely used in studying discrimination, and iii) a recommender systems dataset. Our results show that Fair-A3SL can learn fair, yet interpretable and expressive structures capable of making accurate predictions. △ Less

Submitted 21 February, 2020; originally announced February 2020.

Comments: Accepted for publication in ECAI 2020

arXiv:2002.03528 [pdf, other]

Multi-object Monocular SLAM for Dynamic Environments

Authors: Gokul B. Nair, Swapnil Daga, Rahul Sajnani, Anirudha Ramesh, Junaid Ahmed Ansari, Krishna Murthy Jatavallabhula, K. Madhava Krishna

Abstract: In this paper, we tackle the problem of multibody SLAM from a monocular camera. The term multibody, implies that we track the motion of the camera, as well as that of other dynamic participants in the scene. The quintessential challenge in dynamic scenes is unobservability: it is not possible to unambiguously triangulate a moving object from a moving monocular camera. Existing approaches solve res… ▽ More In this paper, we tackle the problem of multibody SLAM from a monocular camera. The term multibody, implies that we track the motion of the camera, as well as that of other dynamic participants in the scene. The quintessential challenge in dynamic scenes is unobservability: it is not possible to unambiguously triangulate a moving object from a moving monocular camera. Existing approaches solve restricted variants of the problem, but the solutions suffer relative scale ambiguity (i.e., a family of infinitely many solutions exist for each pair of motions in the scene). We solve this rather intractable problem by leveraging single-view metrology, advances in deep learning, and category-level shape estimation. We propose a multi pose-graph optimization formulation, to resolve the relative and absolute scale factor ambiguities involved. This optimization helps us reduce the average error in trajectories of multiple bodies over real-world datasets, such as KITTI. To the best of our knowledge, our method is the first practical monocular multi-body SLAM system to perform dynamic multi-object and ego localization in a unified framework in metric scale. △ Less

Submitted 11 May, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

Comments: Accepted to IEEE Intelligent Vehicles Symposium 2020 (IV2020)

arXiv:1912.07721 [pdf, ps, other]

Adversarial Model Extraction on Graph Neural Networks

Authors: David DeFazio, Arti Ramesh

Abstract: Along with the advent of deep neural networks came various methods of exploitation, such as fooling the classifier or contaminating its training data. Another such attack is known as model extraction, where provided API access to some black box neural network, the adversary extracts the underlying model. This is done by querying the model in such a way that the underlying neural network provides e… ▽ More Along with the advent of deep neural networks came various methods of exploitation, such as fooling the classifier or contaminating its training data. Another such attack is known as model extraction, where provided API access to some black box neural network, the adversary extracts the underlying model. This is done by querying the model in such a way that the underlying neural network provides enough information to the adversary to be reconstructed. While several works have achieved impressive results with neural network extraction in the propositional domain, this problem has not yet been considered over the relational domain, where data samples are no longer considered to be independent and identically distributed (iid). Graph Neural Networks (GNNs) are a popular deep learning framework to perform machine learning tasks over relational data. In this work, we formalize an instance of GNN extraction, present a solution with preliminary results, and discuss our assumptions and future directions. △ Less

Submitted 16 December, 2019; originally announced December 2019.

Comments: AAAI Workshop on Deep Learning on Graphs: Methodologies and Applications (DLGMA), 2020

arXiv:1908.02282 [pdf, other]

A Weakly-Supervised Attention-based Visualization Tool for Assessing Political Affiliation

Authors: Srijith Rajamohan, Alana Romanella, Amit Ramesh

Abstract: In this work, we seek to finetune a weakly-supervised expert-guided Deep Neural Network (DNN) for the purpose of determining political affiliations. In this context, stance detection is used for determining political affiliation or ideology which is framed in the form of relative proximities between entities in a low-dimensional space. An attention-based mechanism is used to provide model interpre… ▽ More In this work, we seek to finetune a weakly-supervised expert-guided Deep Neural Network (DNN) for the purpose of determining political affiliations. In this context, stance detection is used for determining political affiliation or ideology which is framed in the form of relative proximities between entities in a low-dimensional space. An attention-based mechanism is used to provide model interpretability. A Deep Neural Network for Natural Language Understanding (NLU) using static and contextual embeddings is trained and evaluated. Various techniques to visualize the projections generated from the network are evaluated for visualization efficiency. An overview of the pipeline from data ingestion, processing and generation of visualization is given here. A web-based framework created to faciliate this interaction and exploration is presented here. Preliminary results of this study are summarized and future work is outlined. △ Less

Submitted 5 August, 2019; originally announced August 2019.

Comments: 8 pages

arXiv:1906.08873 [pdf, other]

Learning Discriminative features using Center Loss and Reconstruction as Regularizer for Speech Emotion Recognition

Authors: Suraj Tripathi, Abhiram Ramesh, Abhay Kumar, Chirag Singh, Promod Yenigalla

Abstract: This paper proposes a Convolutional Neural Network (CNN) inspired by Multitask Learning (MTL) and based on speech features trained under the joint supervision of softmax loss and center loss, a powerful metric learning strategy, for the recognition of emotion in speech. Speech features such as Spectrograms and Mel-frequency Cepstral Coefficient s (MFCCs) help retain emotion-related low-level chara… ▽ More This paper proposes a Convolutional Neural Network (CNN) inspired by Multitask Learning (MTL) and based on speech features trained under the joint supervision of softmax loss and center loss, a powerful metric learning strategy, for the recognition of emotion in speech. Speech features such as Spectrograms and Mel-frequency Cepstral Coefficient s (MFCCs) help retain emotion-related low-level characteristics in speech. We experimented with several Deep Neural Network (DNN) architectures that take in speech features as input and trained them under both softmax and center loss, which resulted in highly discriminative features ideal for Speech Emotion Recognition (SER). Our networks also employ a regularizing effect by simultaneously performing the auxiliary task of reconstructing the input speech features. This sharing of representations among related tasks enables our network to better generalize the original task of SER. Some of our proposed networks contain far fewer parameters when compared to state-of-the-art architectures. △ Less

Submitted 31 August, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: 10 pages, Accepted in IJCAI Affective Computing Workshop 2019

arXiv:1906.05682 [pdf]

Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition

Authors: Suraj Tripathi, Abhay Kumar, Abhiram Ramesh, Chirag Singh, Promod Yenigalla

Abstract: This paper proposes a Residual Convolutional Neural Network (ResNet) based on speech features and trained under Focal Loss to recognize emotion in speech. Speech features such as Spectrogram and Mel-frequency Cepstral Coefficients (MFCCs) have shown the ability to characterize emotion better than just plain text. Further Focal Loss, first used in One-Stage Object Detectors, has shown the ability t… ▽ More This paper proposes a Residual Convolutional Neural Network (ResNet) based on speech features and trained under Focal Loss to recognize emotion in speech. Speech features such as Spectrogram and Mel-frequency Cepstral Coefficients (MFCCs) have shown the ability to characterize emotion better than just plain text. Further Focal Loss, first used in One-Stage Object Detectors, has shown the ability to focus the training process more towards hard-examples and down-weight the loss assigned to well-classified examples, thus preventing the model from being overwhelmed by easily classifiable examples. △ Less

Submitted 11 June, 2019; originally announced June 2019.

Comments: Accepted in CICLing 2019

arXiv:1906.05681 [pdf]

Deep Learning based Emotion Recognition System Using Speech Features and Transcriptions

Authors: Suraj Tripathi, Abhay Kumar, Abhiram Ramesh, Chirag Singh, Promod Yenigalla

Abstract: This paper proposes a speech emotion recognition method based on speech features and speech transcriptions (text). Speech features such as Spectrogram and Mel-frequency Cepstral Coefficients (MFCC) help retain emotion-related low-level characteristics in speech whereas text helps capture semantic meaning, both of which help in different aspects of emotion detection. We experimented with several De… ▽ More This paper proposes a speech emotion recognition method based on speech features and speech transcriptions (text). Speech features such as Spectrogram and Mel-frequency Cepstral Coefficients (MFCC) help retain emotion-related low-level characteristics in speech whereas text helps capture semantic meaning, both of which help in different aspects of emotion detection. We experimented with several Deep Neural Network (DNN) architectures, which take in different combinations of speech features and text as inputs. The proposed network architectures achieve higher accuracies when compared to state-of-the-art methods on a benchmark dataset. The combined MFCC-Text Convolutional Neural Network (CNN) model proved to be the most accurate in recognizing emotions in IEMOCAP data. △ Less

Submitted 11 June, 2019; originally announced June 2019.

Comments: Accepted in CICLing 2019

arXiv:1812.01161 [pdf, other]

A Spectral Regularizer for Unsupervised Disentanglement

Authors: Aditya Ramesh, Youngduck Choi, Yann LeCun

Abstract: A generative model with a disentangled representation allows for independent control over different aspects of the output. Learning disentangled representations has been a recent topic of great interest, but it remains poorly understood. We show that even for GANs that do not possess disentangled representations, one can find curved trajectories in latent space over which local disentanglement occ… ▽ More A generative model with a disentangled representation allows for independent control over different aspects of the output. Learning disentangled representations has been a recent topic of great interest, but it remains poorly understood. We show that even for GANs that do not possess disentangled representations, one can find curved trajectories in latent space over which local disentanglement occurs. These trajectories are found by iteratively following the leading right-singular vectors of the Jacobian of the generator with respect to its input. Based on this insight, we describe an efficient regularizer that aligns these vectors with the coordinate axes, and show that it can be used to induce disentangled representations in GANs, in a completely unsupervised manner. △ Less

Submitted 5 February, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

arXiv:1807.08415 [pdf, other]

doi 10.1109/TIV.2020.2973550

Clustering of Driving Encounter Scenarios Using Connected Vehicle Trajectories

Authors: Wenshuo Wang, Aditya Ramesh, Ding Zhao

Abstract: Multi-vehicle interaction behavior classification and analysis offer in-depth knowledge to make an efficient decision for autonomous vehicles. This paper aims to cluster a wide range of driving encounter scenarios based only on multi-vehicle GPS trajectories. Towards this end, we propose a generic unsupervised learning framework comprising two layers: feature representation layer and clustering la… ▽ More Multi-vehicle interaction behavior classification and analysis offer in-depth knowledge to make an efficient decision for autonomous vehicles. This paper aims to cluster a wide range of driving encounter scenarios based only on multi-vehicle GPS trajectories. Towards this end, we propose a generic unsupervised learning framework comprising two layers: feature representation layer and clustering layer. In the layer of feature representation, we combine the deep autoencoders with a distance-based measure to map the sequential observations of driving encounters into a computationally tractable space that allows quantifying the spatiotemporal interaction characteristics of two vehicles. The clustering algorithm is then applied to the extracted representations to gather homogeneous driving encounters into groups. Our proposed generic framework is then evaluated using 2,568 naturalistic driving encounters. Experimental results demonstrate that our proposed generic framework incorporated with unsupervised learning can cluster multi-trajectory data into distinct groups. These clustering results could benefit decision-making policy analysis and design for autonomous vehicles. △ Less

Submitted 15 March, 2019; v1 submitted 22 July, 2018; originally announced July 2018.

Comments: 12 pages, 11 figures

arXiv:1806.00499 [pdf, other]

Backpropagation for Implicit Spectral Densities

Authors: Aditya Ramesh, Yann LeCun

Abstract: Most successful machine intelligence systems rely on gradient-based learning, which is made possible by backpropagation. Some systems are designed to aid us in interpreting data when explicit goals cannot be provided. These unsupervised systems are commonly trained by backpropagating through a likelihood function. We introduce a tool that allows us to do this even when the likelihood is not explic… ▽ More Most successful machine intelligence systems rely on gradient-based learning, which is made possible by backpropagation. Some systems are designed to aid us in interpreting data when explicit goals cannot be provided. These unsupervised systems are commonly trained by backpropagating through a likelihood function. We introduce a tool that allows us to do this even when the likelihood is not explicitly set, by instead using the implicit likelihood of the model. Explicitly defining the likelihood often entails making heavy-handed assumptions that impede our ability to solve challenging tasks. On the other hand, the implicit likelihood of the model is accessible without the need for such assumptions. Our tool, which we call spectral backpropagation, allows us to optimize it in much greater generality than what has been attempted before. GANs can also be viewed as a technique for optimizing implicit likelihoods. We study them using spectral backpropagation in order to demonstrate robustness for high-dimensional problems, and identify two novel properties of the generator G: (1) there exist aberrant, nonsensical outputs to which G assigns very high likelihood, and (2) the eigenvectors of the metric induced by G over latent space correspond to quasi-disentangled explanatory factors. △ Less

Submitted 1 June, 2018; originally announced June 2018.

arXiv:1611.03383 [pdf, other]

Disentangling factors of variation in deep representations using adversarial training

Authors: Michael Mathieu, Junbo Zhao, Pablo Sprechmann, Aditya Ramesh, Yann LeCun

Abstract: We introduce a conditional generative model for learning to disentangle the hidden factors of variation within a set of labeled observations, and separate them into complementary codes. One code summarizes the specified factors of variation associated with the labels. The other summarizes the remaining unspecified variability. During training, the only available source of supervision comes from ou… ▽ More We introduce a conditional generative model for learning to disentangle the hidden factors of variation within a set of labeled observations, and separate them into complementary codes. One code summarizes the specified factors of variation associated with the labels. The other summarizes the remaining unspecified variability. During training, the only available source of supervision comes from our ability to distinguish among different observations belonging to the same class. Examples of such observations include images of a set of labeled objects captured at different viewpoints, or recordings of set of speakers dictating multiple phrases. In both instances, the intra-class diversity is the source of the unspecified factors of variation: each object is observed at multiple viewpoints, and each speaker dictates multiple phrases. Learning to disentangle the specified factors from the unspecified ones becomes easier when strong supervision is possible. Suppose that during training, we have access to pairs of images, where each pair shows two different objects captured from the same viewpoint. This source of alignment allows us to solve our task using existing methods. However, labels for the unspecified factors are usually unavailable in realistic scenarios where data acquisition is not strictly controlled. We address the problem of disentanglement in this more general setting by combining deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies. Experimental results on synthetic and real datasets show that the proposed method is capable of generalizing to unseen classes and intra-class variabilities. △ Less

Submitted 10 November, 2016; originally announced November 2016.

Comments: Conference paper in NIPS 2016

arXiv:1204.4015 [pdf, other]

Human Navigational Performance in a Complex Network with Progressive Disruptions

Authors: Amitash Ramesh, Soumya Ramesh, Sudarshan Iyengar, Vinod Sekhar

Abstract: The current paper is an investigation towards understanding the navigational performance of humans on a network when the "landmark" nodes are blocked. We observe that humans learn to cope up, despite the continued introduction of blockages in the network. The experiment proposed involves the task of navigating on a word network based on a puzzle called the wordmorph. We introduce blockages in the… ▽ More The current paper is an investigation towards understanding the navigational performance of humans on a network when the "landmark" nodes are blocked. We observe that humans learn to cope up, despite the continued introduction of blockages in the network. The experiment proposed involves the task of navigating on a word network based on a puzzle called the wordmorph. We introduce blockages in the network and report an incremental improvement in performance with respect to time. We explain this phenomenon by analyzing the evolution of the knowledge in the human participants of the underlying network as more and more landmarks are removed. We hypothesize that humans learn the bare essentials to navigate unless we introduce blockages in the network which would whence enforce upon them the need to explore newer ways of navigating. We draw a parallel to human problem solving and postulate that obstacles are catalysts for humans to innovate techniques to solve a restricted variant of a familiar problem. △ Less

Submitted 18 April, 2012; originally announced April 2012.

Showing 1–50 of 51 results for author: Ramesh, A