-
Scaling Instructable Agents Across Many Simulated Worlds
Authors:
SIMA Team,
Maria Abi Raad,
Arun Ahuja,
Catarina Barros,
Frederic Besse,
Andrew Bolt,
Adrian Bolton,
Bethanie Brownfield,
Gavin Buttimore,
Max Cant,
Sarah Chakera,
Stephanie C. Y. Chan,
Jeff Clune,
Adrian Collister,
Vikki Copeman,
Alex Cullum,
Ishita Dasgupta,
Dario de Cesare,
Julia Di Trapani,
Yani Donchev,
Emma Dunleavy,
Martin Engelcke,
Ryan Faulkner,
Frankie Garcia,
Charles Gbadamosi
, et al. (68 additional authors not shown)
Abstract:
Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.
Submitted 17 April, 2024; v1 submitted 13 March, 2024;
originally announced April 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
SeasFire as a Multivariate Earth System Datacube for Wildfire Dynamics
Authors:
Ilektra Karasante,
Lazaro Alonso,
Ioannis Prapas,
Akanksha Ahuja,
Nuno Carvalhais,
Ioannis Papoutsis
Abstract:
The global occurrence, scale, and frequency of wildfires pose significant threats to ecosystem services and human livelihoods. To effectively quantify and attribute the antecedent conditions for wildfires, a thorough understanding of Earth system dynamics is imperative. In response, we introduce the SeasFire datacube, a meticulously curated spatiotemporal dataset tailored for global sub-seasonal to seasonal wildfire modeling via Earth observation. The SeasFire datacube comprises 59 variables encompassing climate, vegetation, oceanic indices, and human factors, has an 8-day temporal resolution and a spatial resolution of 0.25$^{\circ}$, and spans from 2001 to 2021. We showcase the versatility of SeasFire for exploring the variability and seasonality of wildfire drivers, modeling causal links between ocean-climate teleconnections and wildfires, and predicting sub-seasonal wildfire patterns across multiple timescales with a Deep Learning model. We publicly release the SeasFire datacube and appeal to Earth system scientists and Machine Learning practitioners to use it for an improved understanding and anticipation of wildfires.
Submitted 22 December, 2023; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Hierarchical reinforcement learning with natural language subgoals
Authors:
Arun Ahuja,
Kavya Kopparapu,
Rob Fergus,
Ishita Dasgupta
Abstract:
Hierarchical reinforcement learning has been a compelling approach for achieving goal directed behavior over long sequences of actions. However, it has been challenging to implement in realistic or open-ended environments. A main challenge has been to find the right space of sub-goals over which to instantiate a hierarchy. We present a novel approach where we use data from humans solving these tasks to softly supervise the goal space for a set of long range tasks in a 3D embodied environment. In particular, we use unconstrained natural language to parameterize this space. This has two advantages: first, it is easy to generate this data from naive human participants; second, it is flexible enough to represent a vast range of sub-goals in human-relevant tasks. Our approach outperforms agents that clone expert behavior on these tasks, as well as HRL from scratch without this supervised sub-goal space. Our work presents a novel approach to combining human expert supervision with the benefits and flexibility of reinforcement learning.
Submitted 20 September, 2023;
originally announced September 2023.
-
Computing a human-like reaction time metric from stable recurrent vision models
Authors:
Lore Goetschalckx,
Lakshmi Narasimhan Govindarajan,
Alekh Karkada Ashok,
Aarit Ahuja,
David L. Sheinberg,
Thomas Serre
Abstract:
The meteoric rise in the adoption of deep neural networks as computational models of vision has inspired efforts to "align" these models with humans. One dimension of interest for alignment includes behavioral choices, but moving beyond characterizing choice patterns to capturing temporal aspects of visual decision-making has been challenging. Here, we sketch a general-purpose methodology to construct computational accounts of reaction times from a stimulus-computable, task-optimized model. Specifically, we introduce a novel metric leveraging insights from subjective logic theory summarizing evidence accumulation in recurrent vision models. We demonstrate that our metric aligns with patterns of human reaction times for stimulus manipulations across four disparate visual decision-making tasks spanning perceptual grouping, mental simulation, and scene categorization. This work paves the way for exploring the temporal alignment of model and human visual strategies in the context of various other cognitive tasks toward generating testable hypotheses for neuroscience. Links to the code and data can be found on the project page: https://serre-lab.github.io/rnn_rts_site.
Submitted 6 November, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Nerfstudio: A Modular Framework for Neural Radiance Field Development
Authors:
Matthew Tancik,
Ethan Weber,
Evonne Ng,
Ruilong Li,
Brent Yi,
Justin Kerr,
Terrance Wang,
Alexander Kristoffersen,
Jake Austin,
Kamyar Salahi,
Abhik Ahuja,
David McAllister,
Angjoo Kanazawa
Abstract:
Neural Radiance Fields (NeRF) are a rapidly growing area of research with wide-ranging applications in computer vision, graphics, robotics, and more. In order to streamline the development and deployment of NeRF research, we propose a modular PyTorch framework, Nerfstudio. Our framework includes plug-and-play components for implementing NeRF-based methods, which make it easy for researchers and practitioners to incorporate NeRF into their projects. Additionally, the modular design enables support for extensive real-time visualization tools, streamlined pipelines for importing captured in-the-wild data, and tools for exporting to video, point cloud and mesh representations. The modularity of Nerfstudio enables the development of Nerfacto, our method that combines components from recent papers to achieve a balance between speed and quality, while also remaining flexible to future modifications. To promote community-driven development, all associated code and data are made publicly available with open-source licensing at https://nerf.studio.
Submitted 16 October, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
Collaborating with language models for embodied reasoning
Authors:
Ishita Dasgupta,
Christine Kaeser-Chen,
Kenneth Marino,
Arun Ahuja,
Sheila Babayan,
Felix Hill,
Rob Fergus
Abstract:
Reasoning in a complex and ambiguous environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning. However, LSLMs do not inherently have the ability to interrogate or intervene on the environment. In this work, we investigate how to combine these complementary abilities in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command. We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases, and demonstrate how components of this system can be trained with reinforcement-learning to improve performance.
Submitted 1 February, 2023;
originally announced February 2023.
-
Distilling Internet-Scale Vision-Language Models into Embodied Agents
Authors:
Theodore Sumers,
Kenneth Marino,
Arun Ahuja,
Rob Fergus,
Ishita Dasgupta
Abstract:
Instruction-following agents must ground language into their observation and action spaces. Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. We combine ideas from model distillation and hindsight experience replay (HER), using a VLM to retroactively generate language describing the agent's behavior. Simple prompting allows us to control the supervision signal, teaching an agent to interact with novel objects based on their names (e.g., planes) or their features (e.g., colors) in a 3D rendered environment. Fewshot prompting lets us teach abstract category membership, including pre-existing categories (food vs toys) and ad-hoc ones (arbitrary preferences over objects). Our work outlines a new and effective way to use internet-scale VLMs, repurposing the generic language grounding acquired by such models to teach task-relevant groundings to embodied agents.
Submitted 14 June, 2023; v1 submitted 29 January, 2023;
originally announced January 2023.
-
Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback
Authors:
Josh Abramson,
Arun Ahuja,
Federico Carnevale,
Petko Georgiev,
Alex Goldin,
Alden Hung,
Jessica Landon,
Jirka Lhotka,
Timothy Lillicrap,
Alistair Muldal,
George Powell,
Adam Santoro,
Guy Scully,
Sanjana Srivastava,
Tamara von Glehn,
Greg Wayne,
Nathaniel Wong,
Chen Yan,
Rui Zhu
Abstract:
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback. Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning. First, we collected data of humans interacting with agents in a simulated 3D world. We then asked annotators to record moments where they believed that agents either progressed toward or regressed from their human-instructed goal. Using this annotation data we leveraged a novel method - which we call "Inter-temporal Bradley-Terry" (IBT) modelling - to build a reward model that captures human judgments. Agents trained to optimise rewards delivered from IBT reward models improved with respect to all of our metrics, including subsequent human judgment during live interactions with agents. Altogether our results demonstrate how one can successfully leverage human judgments to improve agent behaviour, allowing us to use reinforcement learning in complex, embodied domains without programmatic reward functions. Videos of agent behaviour may be found at https://youtu.be/v_Z9F2_eKk4.
Submitted 21 November, 2022;
originally announced November 2022.
-
Safe Real-World Autonomous Driving by Learning to Predict and Plan with a Mixture of Experts
Authors:
Stefano Pini,
Christian S. Perone,
Aayush Ahuja,
Ana Sofia Rufino Ferreira,
Moritz Niendorf,
Sergey Zagoruyko
Abstract:
The goal of autonomous vehicles is to navigate public roads safely and comfortably. To enforce safety, traditional planning approaches rely on handcrafted rules to generate trajectories. Machine learning-based systems, on the other hand, scale with data and are able to learn more complex behaviors. However, they often ignore that agents and self-driving vehicle trajectory distributions can be leveraged to improve safety. In this paper, we propose modeling a distribution over multiple future trajectories for both the self-driving vehicle and other road agents, using a unified neural network architecture for prediction and planning. During inference, we select the planning trajectory that minimizes a cost taking into account safety and the predicted probabilities. Our approach does not depend on any rule-based planners for trajectory generation or optimization, improves with more training data and is simple to implement. We extensively evaluate our method through a realistic simulator and show that the predicted trajectory distribution corresponds to different driving profiles. We also successfully deploy it on a self-driving vehicle on urban public roads, confirming that it drives safely without compromising comfort. The code for training and testing our model on a public prediction dataset and the video of the road test are available at https://woven.mobi/safepathnet
Submitted 3 November, 2022;
originally announced November 2022.
-
Deep Learning for Global Wildfire Forecasting
Authors:
Ioannis Prapas,
Akanksha Ahuja,
Spyros Kondylatos,
Ilektra Karasante,
Eleanna Panagiotou,
Lazaro Alonso,
Charalampos Davalas,
Dimitrios Michail,
Nuno Carvalhais,
Ioannis Papoutsis
Abstract:
Climate change is expected to aggravate wildfire activity through the exacerbation of fire weather. Improving our capabilities to anticipate wildfires on a global scale is of utmost importance for mitigating their negative effects. In this work, we create a global fire dataset and demonstrate a prototype for predicting the presence of global burned areas on a sub-seasonal scale with the use of segmentation deep learning models. Particularly, we present an open-access global analysis-ready datacube, which contains a variety of variables related to the seasonal and sub-seasonal fire drivers (climate, vegetation, oceanic indices, human-related variables), as well as the historical burned areas and wildfire emissions for 2001-2021. We train a deep learning model, which treats global wildfire forecasting as an image segmentation task and skillfully predicts the presence of burned areas 8, 16, 32 and 64 days ahead of time. Our work motivates the use of deep learning for global burned area forecasting and paves the way towards improved anticipation of global wildfire patterns.
Submitted 16 October, 2023; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Learning to Navigate Wikipedia by Taking Random Walks
Authors:
Manzil Zaheer,
Kenneth Marino,
Will Grathwohl,
John Schultz,
Wendy Shang,
Sheila Babayan,
Arun Ahuja,
Ishita Dasgupta,
Christine Kaeser-Chen,
Rob Fergus
Abstract:
A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information. Internet search engines reliably find the correct vicinity but the top results may be a few links away from the desired target. A complementary approach is navigation via hyperlinks, employing a policy that comprehends local content and selects a link that moves it closer to the target. In this paper, we show that behavioral cloning of randomly sampled trajectories is sufficient to learn an effective link selection policy. We demonstrate the approach on a graph version of Wikipedia with 38M nodes and 387M edges. The model is able to efficiently navigate between nodes 5 and 20 steps apart 96% and 92% of the time, respectively. We then use the resulting embeddings and policy in downstream fact verification and question answering tasks where, in combination with basic TF-IDF search and ranking methods, they yield results competitive with state-of-the-art methods.
Submitted 31 October, 2022;
originally announced November 2022.
-
Evaluating Multimodal Interactive Agents
Authors:
Josh Abramson,
Arun Ahuja,
Federico Carnevale,
Petko Georgiev,
Alex Goldin,
Alden Hung,
Jessica Landon,
Timothy Lillicrap,
Alistair Muldal,
Blake Richards,
Adam Santoro,
Tamara von Glehn,
Greg Wayne,
Nathaniel Wong,
Chen Yan
Abstract:
Creating agents that can interact naturally with humans is a common goal in artificial intelligence (AI) research. However, evaluating these interactions is challenging: collecting online human-agent interactions is slow and expensive, yet faster proxy metrics often do not correlate well with interactive evaluation. In this paper, we assess the merits of these existing evaluation metrics and present a novel approach to evaluation called the Standardised Test Suite (STS). The STS uses behavioural scenarios mined from real human interaction data. Agents see replayed scenario context, receive an instruction, and are then given control to complete the interaction offline. These agent continuations are recorded and sent to human annotators to mark as success or failure, and agents are ranked according to the proportion of continuations in which they succeed. The resulting STS is fast, controlled, interpretable, and representative of naturalistic interactions. Altogether, the STS consolidates much of what is desirable across many of our standard evaluation metrics, allowing us to accelerate research progress towards producing agents that can interact naturally with humans. A video may be found at https://youtu.be/YR1TngGORGQ.
Submitted 14 July, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
SightSteeple: Agreeing to Disagree with Functional Blockchain Consensus
Authors:
Aditya Ahuja
Abstract:
Classical and contemporary distributed consensus protocols, be they for binary agreement, state machine replication, or blockchain consensus, require all protocol participants in a peer-to-peer system to agree on exactly the same information as part of the consensus payload. Although this model of consensus is extensively studied, and is useful for most consensus-based decentralized applications, it falls short of defining correct distributed systems which mandate participant-credential-based privileged visibility into the consensus payload, through the consensus protocol itself. We introduce a new paradigm for distributed consensus, called functional blockchain consensus. Functional blockchain consensus allows each blockchain protocol participant to agree on some distinct sub-information of the list of transactions, as a function of the credentials of the participant in the blockchain system, instead of agreeing on the entire list of transactions. We motivate two adversary models, one with a standard crash-fault adversary and another with a novel rational-fault adversary, to compromise functional blockchain consensus. We then present two versions of a blockchain protocol called SightSteeple that achieves functional blockchain consensus in the said fault models. SightSteeple relies on a novel combination of standard blockchain consensus and functional encryption, among other primitives, to achieve its goals of correctness. Finally, we discuss practical uses of functional-blockchain-consensus-based asymmetric distributed ledgers, and motivate off-shoot constructions that can result from this new consensus paradigm.
Submitted 2 May, 2022;
originally announced May 2022.
-
Subspace Modeling for Fast Out-Of-Distribution and Anomaly Detection
Authors:
Ibrahima J. Ndiour,
Nilesh A. Ahuja,
Omesh Tickoo
Abstract:
This paper presents a fast, principled approach for detecting anomalous and out-of-distribution (OOD) samples in deep neural networks (DNN). We propose the application of linear statistical dimensionality reduction techniques on the semantic features produced by a DNN, in order to capture the low-dimensional subspace truly spanned by said features. We show that the "feature reconstruction error" (FRE), which is the $\ell_2$-norm of the difference between the original feature in the high-dimensional space and the pre-image of its low-dimensional reduced embedding, is highly effective for OOD and anomaly detection. To generalize to intermediate features produced at any given layer, we extend the methodology by applying nonlinear kernel-based methods. Experiments using standard image datasets and DNN architectures demonstrate that our method meets or exceeds best-in-class quality performance, but at a fraction of the computational and memory cost required by the state of the art. It can be trained and run very efficiently, even on a traditional CPU.
Submitted 19 March, 2022;
originally announced March 2022.
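The feature reconstruction error described in the abstract above can be illustrated with a minimal sketch. It assumes PCA (via NumPy's SVD) as the linear dimensionality reduction and uses synthetic vectors in place of real DNN features; the function names `fit_pca_subspace` and `feature_reconstruction_error` are hypothetical and are not taken from the paper, and the kernel-based extension is not shown.

```python
import numpy as np

def fit_pca_subspace(features, n_components):
    """Fit a linear subspace (PCA) to in-distribution features of shape (n, d)."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def feature_reconstruction_error(features, mean, components):
    """l2 norm of (feature - pre-image of its low-dimensional embedding)."""
    centered = features - mean
    reduced = centered @ components.T   # project into the low-dimensional subspace
    pre_image = reduced @ components    # map back into the (centered) feature space
    return np.linalg.norm(centered - pre_image, axis=1)

# Synthetic stand-ins for DNN features (an assumption for illustration only).
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 8))
in_dist = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 8))
ood = 3.0 * rng.normal(size=(200, 8))

mean, components = fit_pca_subspace(in_dist, n_components=2)
fre_in = feature_reconstruction_error(in_dist, mean, components)
fre_ood = feature_reconstruction_error(ood, mean, components)
```

In this toy setup the in-distribution samples lie close to the fitted subspace, so their error is small, while samples drawn off the subspace score higher; a threshold on the score would then flag candidates for OOD or anomaly detection.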
-
Predicting Airbnb Rental Prices Using Multiple Feature Modalities
Authors:
Aditya Ahuja,
Aditya Lahiri,
Aniruddha Das
Abstract:
Figuring out the price of a listed Airbnb rental is an important and difficult task for both the host and the customer. For the former, it can enable them to set a reasonable price without compromising on their profits. For the customer, it helps understand the key drivers for price and also provides them with similarly priced places. This price prediction regression task can also have multiple downstream uses, such as in recommendation of similar rentals based on price. We propose to use geolocation, temporal, visual and natural language features to create a reliable and accurate price prediction algorithm.
Submitted 13 December, 2021;
originally announced December 2021.
-
Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning
Authors:
DeepMind Interactive Agents Team,
Josh Abramson,
Arun Ahuja,
Arthur Brussee,
Federico Carnevale,
Mary Cassin,
Felix Fischer,
Petko Georgiev,
Alex Goldin,
Mansi Gupta,
Tim Harley,
Felix Hill,
Peter C Humphreys,
Alden Hung,
Jessica Landon,
Timothy Lillicrap,
Hamza Merzic,
Alistair Muldal,
Adam Santoro,
Guy Scully,
Tamara von Glehn,
Greg Wayne,
Nathaniel Wong,
Chen Yan,
Rui Zhu
Abstract:
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. We show that imitation learning of human-human interactions in a simulated world, in conjunction with self-supervised learning, is sufficient to produce a multimodal interactive agent, which we call MIA, that successfully interacts with non-adversarial humans 75% of the time. We further identify architectural and algorithmic techniques that improve performance, such as hierarchical action selection. Altogether, our results demonstrate that imitation of multi-modal, real-time human behaviour may provide a straightforward and surprisingly effective means of imbuing agents with a rich behavioural prior from which agents might then be fine-tuned for specific purposes, thus laying a foundation for training capable agents for interactive robots or digital assistants. A video of MIA's behaviour may be found at https://youtu.be/ZFgRhviF7mY
Submitted 2 February, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Scalable Primitives for Generalized Sensor Fusion in Autonomous Vehicles
Authors:
Sammy Sidhu,
Linda Wang,
Tayyab Naseer,
Ashish Malhotra,
Jay Chia,
Aayush Ahuja,
Ella Rasmussen,
Qiangui Huang,
Ray Gao
Abstract:
In autonomous driving, there has been an explosion in the use of deep neural networks for perception, prediction and planning tasks. As autonomous vehicles (AVs) move closer to production, multi-modal sensor inputs and heterogeneous vehicle fleets with different sets of sensor platforms are becoming increasingly common in the industry. However, neural network architectures typically target specific sensor platforms and are not robust to changes in input, making the problem of scaling and model deployment particularly difficult. Furthermore, most players still treat the optimization of software and hardware as entirely independent problems. We propose a new end-to-end architecture, Generalized Sensor Fusion (GSF), which is designed in such a way that both sensor inputs and target tasks are modular and modifiable. This enables AV system designers to easily experiment with different sensor configurations and methods and opens up the ability to deploy on heterogeneous fleets using the same models that are shared across a large engineering organization. Using this system, we report experimental results where we demonstrate near-parity of an expensive high-density (HD) LiDAR sensor with a cheap low-density (LD) LiDAR plus camera setup in the 3D object detection task. This paves the way for the industry to jointly design hardware and software architectures as well as large fleets with heterogeneous configurations.
Submitted 30 November, 2021;
originally announced December 2021.
-
PilotEar: Enabling In-ear Inertial Navigation
Authors:
Ashwin Ahuja,
Andrea Ferlini,
Cecilia Mascolo
Abstract:
Navigation systems are used daily. While different types of navigation systems exist, inertial navigation systems (INS) have favorable properties for some wearables which, for battery and form-factor reasons, may not be able to use GPS. Earables (aka ear-worn wearables) are gaining momentum both as leisure devices and as sensing and computing platforms. The inherently high signal-to-noise ratio (SNR) of ear-collected inertial data, due to the vibration damping of the musculoskeletal system, combined with the fact that people typically wear a pair of earables (one per ear), could offer significant accuracy when tracking head movements, leading to potential improvements for inertial navigation. Hence, in this work, we investigate and propose PilotEar, the first end-to-end earable-based inertial navigation system, achieving an average tracking drift of 0.15 m/s for one earable and 0.11 m/s for two earables.
Submitted 29 September, 2021;
originally announced September 2021.
-
A Review of Some Techniques for Inclusion of Domain-Knowledge into Deep Neural Networks
Authors:
Tirtharaj Dash,
Sharad Chitlangia,
Aditya Ahuja,
Ashwin Srinivasan
Abstract:
We present a survey of ways in which existing scientific knowledge is included when constructing models with neural networks. The inclusion of domain-knowledge is of special interest not just to constructing scientific assistants, but also to many other areas that involve understanding data using human-machine collaboration. In many such instances, machine-based model construction may benefit significantly from being provided with human-knowledge of the domain encoded in a sufficiently precise form. This paper examines the inclusion of domain-knowledge by means of changes to: the input, the loss-function, and the architecture of deep networks. The categorisation is for ease of exposition: in practice we expect a combination of such changes will be employed. In each category, we describe techniques that have been shown to yield significant changes in the performance of deep neural networks.
Submitted 21 December, 2021; v1 submitted 21 July, 2021;
originally announced July 2021.
-
Imitation by Predicting Observations
Authors:
Andrew Jaegle,
Yury Sulsky,
Arun Ahuja,
Jake Bruce,
Rob Fergus,
Greg Wayne
Abstract:
Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior. Although it is easy to observe behavior in the real-world, the underlying actions may not be accessible. We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks while also exhibiting robustness in the presence of observations unrelated to the task. Our method, which we call FORM (for "Future Observation Reward Model") is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations, without needing ground truth actions. We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.
Submitted 8 July, 2021;
originally announced July 2021.
-
Detecting Vehicle Type and License Plate Number of different Vehicles on Images
Authors:
Aashna Ahuja,
Arindam Chaudhuri
Abstract:
With an ever-increasing number of vehicles, vehicular tracking is one of the major challenges faced by urban areas. In this paper we try to develop a model that can locate a particular vehicle that the user is looking for depending on two factors: (1) the type of vehicle and (2) the license plate number of the car. The proposed system uses a unique mixture consisting of a Mask R-CNN model for vehicle type detection, and WpodNet and pytesseract for license plate detection and prediction of the letters in it.
Submitted 12 April, 2021;
originally announced April 2021.
-
A Regulatory System for Optimal Legal Transaction Throughput in Cryptocurrency Blockchains
Authors:
Aditya Ahuja,
Vinay J. Ribeiro,
Ranjan Pal
Abstract:
Permissionless blockchain consensus protocols have been designed primarily for defining decentralized economies for the commercial trade of assets, both virtual and physical, using cryptocurrencies. In most instances, the assets being traded are regulated, which mandates that the legal right to their trade and their trade value are determined by the governmental regulator of the jurisdiction in which the trade occurs. Unfortunately, existing blockchains do not formally recognise proposal of legal cryptocurrency transactions, as part of the execution of their respective consensus protocols, resulting in rampant illegal activities in the associated crypto-economies. In this contribution, we motivate the need for regulated blockchain consensus protocols with a case study of the illegal, cryptocurrency based, Silk Road darknet market. We present a novel regulatory framework for blockchain protocols, for ensuring legal transaction confirmation as part of the blockchain distributed consensus. As per our regulatory framework, we derive conditions under which legal transaction throughput supersedes throughput of traditional transactions, which are, in the worst case, an indifferentiable mix of legal and illegal transactions. Finally, we show that with a small change to the standard blockchain consensus execution policy (appropriately introduced through regulation), the legal transaction throughput in the blockchain network can be maximized.
Submitted 30 March, 2021;
originally announced March 2021.
-
Incorporating Domain Knowledge into Deep Neural Networks
Authors:
Tirtharaj Dash,
Sharad Chitlangia,
Aditya Ahuja,
Ashwin Srinivasan
Abstract:
We present a survey of ways in which domain-knowledge has been included when constructing models with neural networks. The inclusion of domain-knowledge is of special interest not just to constructing scientific assistants, but also, many other areas that involve understanding data using human-machine collaboration. In many such instances, machine-based model construction may benefit significantly from being provided with human-knowledge of the domain encoded in a sufficiently precise form. This paper examines two broad approaches to encode such knowledge--as logical and numerical constraints--and describes techniques and results obtained in several sub-categories under each of these approaches.
Submitted 15 March, 2021; v1 submitted 27 February, 2021;
originally announced March 2021.
-
Imitating Interactive Intelligence
Authors:
Josh Abramson,
Arun Ahuja,
Iain Barr,
Arthur Brussee,
Federico Carnevale,
Mary Cassin,
Rachita Chhaparia,
Stephen Clark,
Bogdan Damoc,
Andrew Dudzik,
Petko Georgiev,
Aurelia Guy,
Tim Harley,
Felix Hill,
Alden Hung,
Zachary Kenton,
Jessica Landon,
Timothy Lillicrap,
Kory Mathewson,
Soňa Mokrá,
Alistair Muldal,
Adam Santoro,
Nikolay Savinov,
Vikrant Varma,
Greg Wayne
, et al. (4 additional authors not shown)
Abstract:
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount.
Submitted 20 January, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.
-
Behavior Priors for Efficient Reinforcement Learning
Authors:
Dhruva Tirumala,
Alexandre Galashov,
Hyeonwoo Noh,
Leonard Hasenclever,
Razvan Pascanu,
Jonathan Schwarz,
Guillaume Desjardins,
Wojciech Marian Czarnecki,
Arun Ahuja,
Yee Whye Teh,
Nicolas Heess
Abstract:
As we deploy reinforcement learning agents to solve increasingly challenging problems, methods that allow us to inject prior knowledge about the structure of the world and effective solution strategies become increasingly important. In this work we consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors that capture the common movement and interaction patterns that are shared across a set of related tasks or contexts. For example, the day-to-day behavior of humans comprises distinctive locomotion and manipulation patterns that recur across many different situations and goals. We discuss how such behavior patterns can be captured using probabilistic trajectory models and how these can be integrated effectively into reinforcement learning schemes, e.g. to facilitate multi-task and transfer learning. We then extend these ideas to latent variable models and consider a formulation to learn hierarchical priors that capture different aspects of the behavior in reusable modules. We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives, thereby offering an alternative perspective on existing ideas. We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
Submitted 27 October, 2020;
originally announced October 2020.
-
Probing Emergent Semantics in Predictive Agents via Question Answering
Authors:
Abhishek Das,
Federico Carnevale,
Hamza Merzic,
Laura Rimell,
Rosalia Schneider,
Josh Abramson,
Alden Hung,
Arun Ahuja,
Stephen Clark,
Gregory Wayne,
Felix Hill
Abstract:
Recent work has shown how predictive modeling can endow agents with rich knowledge of their surroundings, improving their ability to act in complex environments. We propose question-answering as a general paradigm to decode and understand the representations that such agents develop, applying our method to two recent approaches to predictive modeling: action-conditional CPC (Guo et al., 2018) and SimCore (Gregor et al., 2019). After training agents with these predictive objectives in a visually-rich, 3D environment with an assortment of objects, colors, shapes, and spatial configurations, we probe their internal state representations with synthetic (English) questions, without backpropagating gradients from the question-answering decoder into the agent. The performance of different agents when probed this way reveals that they learn to encode factual, and seemingly compositional, information about objects, properties and spatial relations from their physical environment. Our approach is intuitive, i.e. humans can easily interpret responses of the model as opposed to inspecting continuous vectors, and model-agnostic, i.e. applicable to any modeling approach. By revealing the implicit knowledge of objects, quantities, properties and relations acquired by agents as they learn, question-conditional agent probing can stimulate the design and development of stronger predictive learning objectives.
Submitted 1 June, 2020;
originally announced June 2020.
-
Catch & Carry: Reusable Neural Controllers for Vision-Guided Whole-Body Tasks
Authors:
Josh Merel,
Saran Tunyasuvunakool,
Arun Ahuja,
Yuval Tassa,
Leonard Hasenclever,
Vu Pham,
Tom Erez,
Greg Wayne,
Nicolas Heess
Abstract:
We address the longstanding challenge of producing flexible, realistic humanoid character controllers that can perform diverse whole-body tasks involving object interactions. This challenge is central to a variety of fields, from graphics and animation to robotics and motor neuroscience. Our physics-based environment uses realistic actuation and first-person perception -- including touch sensors and egocentric vision -- with a view to producing active-sensing behaviors (e.g. gaze direction), transferability to real robots, and comparisons to the biology. We develop an integrated neural-network based approach consisting of a motor primitive module, human demonstrations, and an instructed reinforcement learning regime with curricula and task variations. We demonstrate the utility of our approach for several tasks, including goal-conditioned box carrying and ball catching, and we characterize its behavioral robustness. The resulting controllers can be deployed in real-time on a standard PC. See overview video, https://youtu.be/2rQAW-8gQQk .
Submitted 16 June, 2020; v1 submitted 15 November, 2019;
originally announced November 2019.
-
Autonomous Aerial Cinematography In Unstructured Environments With Learned Artistic Decision-Making
Authors:
Rogerio Bonatti,
Wenshan Wang,
Cherie Ho,
Aayush Ahuja,
Mirko Gschwindt,
Efe Camci,
Erdal Kayacan,
Sanjiban Choudhury,
Sebastian Scherer
Abstract:
Aerial cinematography is revolutionizing industries that require live and dynamic camera viewpoints such as entertainment, sports, and security. However, safely piloting a drone while filming a moving target in the presence of obstacles is immensely taxing, often requiring multiple expert human operators. Hence, there is demand for an autonomous cinematographer that can reason about both geometry and scene context in real-time. Existing approaches do not address all aspects of this problem; they either require high-precision motion-capture systems or GPS tags to localize targets, rely on prior maps of the environment, plan for short time horizons, or only follow artistic guidelines specified before flight.
In this work, we address the problem in its entirety and propose a complete system for real-time aerial cinematography that for the first time combines: (1) vision-based target estimation; (2) 3D signed-distance mapping for occlusion estimation; (3) efficient trajectory optimization for long time-horizon camera motion; and (4) learning-based artistic shot selection. We extensively evaluate our system both in simulation and in field experiments by filming dynamic targets moving through unstructured environments. Our results indicate that our system can operate reliably in the real world without restrictive assumptions. We also provide in-depth analysis and discussions for each module, with the hope that our design tradeoffs can generalize to other related applications. Videos of the complete system can be found at: https://youtu.be/ookhHnqmlaU.
Submitted 15 October, 2019;
originally announced October 2019.
-
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
Authors:
H. Francis Song,
Abbas Abdolmaleki,
Jost Tobias Springenberg,
Aidan Clark,
Hubert Soyer,
Jack W. Rae,
Seb Noury,
Arun Ahuja,
Siqi Liu,
Dhruva Tirumala,
Nicolas Heess,
Dan Belov,
Martin Riedmiller,
Matthew M. Botvinick
Abstract:
Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algorithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value function. We show that V-MPO surpasses previously reported scores for both the Atari-57 and DMLab-30 benchmark suites in the multi-task setting, and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters. On individual DMLab and Atari levels, the proposed algorithm can achieve scores that are substantially higher than has previously been reported. V-MPO is also applicable to problems with high-dimensional, continuous action spaces, which we demonstrate in the context of learning to control simulated humanoids with 22 degrees of freedom from full state observations and 56 degrees of freedom from pixel observations, as well as example OpenAI Gym tasks where V-MPO achieves substantially higher asymptotic scores than previously reported.
Submitted 26 September, 2019;
originally announced September 2019.
-
Probabilistic Modeling of Deep Features for Out-of-Distribution and Adversarial Detection
Authors:
Nilesh A. Ahuja,
Ibrahima Ndiour,
Trushant Kalyanpur,
Omesh Tickoo
Abstract:
We present a principled approach for detecting out-of-distribution (OOD) and adversarial samples in deep neural networks. Our approach consists in modeling the outputs of the various layers (deep features) with parametric probability distributions once training is completed. At inference, the likelihoods of the deep features w.r.t. the previously learnt distributions are calculated and used to derive uncertainty estimates that can discriminate in-distribution samples from OOD samples. We explore the use of two classes of multivariate distributions for modeling the deep features, Gaussian and Gaussian mixture, and study the trade-off between accuracy and computational complexity. We demonstrate benefits of our approach on image features by detecting OOD images and adversarially-generated images, using popular DNN architectures on MNIST and CIFAR10 datasets. We show that more precise modeling of the feature distributions results in significantly improved detection of OOD and adversarial samples: up to 12 percentage points in AUPR and AUROC metrics. We further show that our approach remains extremely effective when applied to video data and associated spatio-temporal features by detecting adversarial samples on activity classification tasks using the UCF101 dataset and the C3D network. To our knowledge, our methodology is the first one reported for reliably detecting white-box adversarial framing, a state-of-the-art adversarial attack for video classifiers.
Submitted 25 September, 2019;
originally announced September 2019.
-
Improved Generalization of Heading Direction Estimation for Aerial Filming Using Semi-supervised Regression
Authors:
Wenshan Wang,
Aayush Ahuja,
Yanfu Zhang,
Rogerio Bonatti,
Sebastian Scherer
Abstract:
In the task of autonomous aerial filming of a moving actor (e.g. a person or a vehicle), it is crucial to have a good heading direction estimation for the actor from the visual input. However, the models obtained in other similar tasks, such as pedestrian collision risk analysis and human-robot interaction, are very difficult to generalize to the aerial filming task, because of the difference in data distributions. Towards improving generalization with less labeled data, this paper presents a semi-supervised algorithm for the heading direction estimation problem. We utilize temporal continuity as the unsupervised signal to regularize the model and achieve better generalization ability. This semi-supervised algorithm is applied to both training and testing phases, which increases the testing performance by a large margin. We show that by leveraging unlabeled sequences, the amount of labeled data required can be significantly reduced. We also discuss several important details on improving the performance by balancing labeled and unlabeled loss, and making good combinations. Experimental results show that our approach robustly outputs the heading direction for different types of actor. The aesthetic value of the video is also improved in the aerial filming task.
Submitted 26 March, 2019;
originally announced March 2019.
-
Exploiting Hierarchy for Learning and Transfer in KL-regularized RL
Authors:
Dhruva Tirumala,
Hyeonwoo Noh,
Alexandre Galashov,
Leonard Hasenclever,
Arun Ahuja,
Greg Wayne,
Razvan Pascanu,
Yee Whye Teh,
Nicolas Heess
Abstract:
As reinforcement learning agents are tasked with solving more challenging and diverse tasks, the ability to incorporate prior knowledge into the learning system and to exploit reusable structure in solution space is likely to become increasingly important. The KL-regularized expected reward objective constitutes one possible tool to this end. It introduces an additional component, a default or prior behavior, which can be learned alongside the policy and as such partially transforms the reinforcement learning problem into one of behavior modelling. In this work we consider the implications of this framework in cases where both the policy and default behavior are augmented with latent variables. We discuss how the resulting hierarchical structures can be used to implement different inductive biases and how their modularity can benefit transfer. Empirically we find that they can lead to faster learning and transfer on a range of continuous control tasks.
Submitted 23 January, 2020; v1 submitted 18 March, 2019;
originally announced March 2019.
-
Neural probabilistic motor primitives for humanoid control
Authors:
Josh Merel,
Leonard Hasenclever,
Alexandre Galashov,
Arun Ahuja,
Vu Pham,
Greg Wayne,
Yee Whye Teh,
Nicolas Heess
Abstract:
We focus on the problem of learning a single motor module that can flexibly express a range of behaviors for the control of high-dimensional physically simulated humanoids. To do this, we propose a motor architecture that has the general structure of an inverse model with a latent-variable bottleneck. We show that it is possible to train this model entirely offline to compress thousands of expert policies and learn a motor primitive embedding space. The trained neural probabilistic motor primitive system can perform one-shot imitation of whole-body humanoid behaviors, robustly mimicking unseen trajectories. Additionally, we demonstrate that it is also straightforward to train controllers to reuse the learned motor primitive space to solve tasks, and the resulting movements are relatively naturalistic. To support the training of our model, we compare two approaches for offline policy cloning, including an experience efficient method which we call linear feedback policy cloning. We encourage readers to view a supplementary video ( https://youtu.be/CaDEf-QcKwA ) summarizing our results.
Submitted 15 January, 2019; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Experience Replay for Continual Learning
Authors:
David Rolnick,
Arun Ahuja,
Jonathan Schwarz,
Timothy P. Lillicrap,
Greg Wayne
Abstract:
Continual learning is the problem of learning new tasks or knowledge while protecting old knowledge and ideally generalizing from old experience to learn new tasks faster. Neural networks trained by stochastic gradient descent often degrade on old tasks when trained successively on new tasks with different data distributions. This phenomenon, referred to as catastrophic forgetting, is considered a major hurdle to learning with non-stationary data or sequences of new tasks, and prevents networks from continually accumulating knowledge and skills. We examine this issue in the context of reinforcement learning, in a setting where an agent is exposed to tasks in a sequence. Unlike most other work, we do not provide an explicit indication to the model of task boundaries, which is the most general circumstance for a learning agent exposed to continuous experience. While various methods to counteract catastrophic forgetting have recently been proposed, we explore a straightforward, general, and seemingly overlooked solution - that of using experience replay buffers for all past events - with a mixture of on- and off-policy learning, leveraging behavioral cloning. We show that this strategy can still learn new tasks quickly yet can substantially reduce catastrophic forgetting in both Atari and DMLab domains, even matching the performance of methods that require task identities. When buffer storage is constrained, we confirm that a simple mechanism for randomly discarding data allows a limited size buffer to perform almost as well as an unbounded one.
Submitted 26 November, 2019; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Hierarchical visuomotor control of humanoids
Authors:
Josh Merel,
Arun Ahuja,
Vu Pham,
Saran Tunyasuvunakool,
Siqi Liu,
Dhruva Tirumala,
Nicolas Heess,
Greg Wayne
Abstract:
We aim to build complex humanoid agents that integrate perception, motor control, and memory. In this work, we partly factor this problem into low-level motor control from proprioception and high-level coordination of the low-level skills informed by vision. We develop an architecture capable of surprisingly flexible, task-directed motor control of a relatively high-DoF humanoid body by combining pre-training of low-level motor controllers with a high-level, task-focused controller that switches among low-level sub-policies. The resulting system is able to control a physically-simulated humanoid body to solve tasks that require coupling visual perception from an unstabilized egocentric RGB camera during locomotion in the environment. For a supplementary video link, see https://youtu.be/7GISvfbykLE .
Submitted 15 January, 2019; v1 submitted 23 November, 2018;
originally announced November 2018.
-
Optimizing Agent Behavior over Long Time Scales by Transporting Value
Authors:
Chia-Chun Hung,
Timothy Lillicrap,
Josh Abramson,
Yan Wu,
Mehdi Mirza,
Federico Carnevale,
Arun Ahuja,
Greg Wayne
Abstract:
Humans spend a remarkable fraction of waking life engaged in acts of "mental time travel". We dwell on our actions in the past and experience satisfaction or regret. More than merely autobiographical storytelling, we use these event recollections to change how we will act in similar scenarios in the future. This process endows us with a computationally important ability to link actions and consequences across long spans of time, which figures prominently in addressing the problem of long-term temporal credit assignment; in artificial intelligence (AI) this is the question of how to evaluate the utility of the actions within a long-duration behavioral sequence leading to success or failure in a task. Existing approaches to shorter-term credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a new paradigm for reinforcement learning where agents use recall of specific memories to credit actions from the past, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire computational models in neuroscience, psychology, and behavioral economics.
Submitted 21 December, 2018; v1 submitted 15 October, 2018;
originally announced October 2018.
-
BoxNet: Deep Learning Based Biomedical Image Segmentation Using Boxes Only Annotation
Authors:
Lin Yang,
Yizhe Zhang,
Zhuo Zhao,
Hao Zheng,
Peixian Liang,
Michael T. C. Ying,
Anil T. Ahuja,
Danny Z. Chen
Abstract:
In recent years, deep learning (DL) methods have become powerful tools for biomedical image segmentation. However, high annotation efforts and costs are commonly needed to acquire sufficient biomedical training data for DL models. To alleviate the burden of manual annotation, in this paper, we propose a new weakly supervised DL approach for biomedical image segmentation using boxes only annotation. First, we develop a method to combine graph search (GS) and DL to generate fine object masks from box annotation, in which DL uses box annotation to compute a rough segmentation for GS and then GS is applied to locate the optimal object boundaries. During the mask generation process, we carefully utilize information from box annotation to filter out potential errors, and then use the generated masks to train an accurate DL segmentation network. Extensive experiments on gland segmentation in histology images, lymph node segmentation in ultrasound images, and fungus segmentation in electron microscopy images show that our approach attains superior performance over the best known state-of-the-art weakly supervised DL method and is able to achieve (1) nearly the same accuracy compared to fully supervised DL methods with far less annotation effort, (2) significantly better results with similar annotation time, and (3) robust performance in various applications.
Submitted 2 June, 2018;
originally announced June 2018.
-
Detecting Homoglyph Attacks with a Siamese Neural Network
Authors:
Jonathan Woodbridge,
Hyrum S. Anderson,
Anjum Ahuja,
Daniel Grant
Abstract:
A homoglyph (name spoofing) attack is a common technique used by adversaries to obfuscate file and domain names. This technique creates process or domain names that are visually similar to legitimate and recognized names. For instance, an attacker may create malware with the name svch0st.exe so that in a visual inspection of running processes or a directory listing, the process or file name might be mistaken as the Windows system process svchost.exe. There has been limited published research on detecting homoglyph attacks. Current approaches rely on string comparison algorithms (such as Levenshtein distance) that result in computationally heavy solutions with a high number of false positives. In addition, there is a deficiency in the number of publicly available datasets for reproducible research, with most datasets focused on phishing attacks, in which homoglyphs are not always used. This paper presents a fundamentally different solution to this problem using a Siamese convolutional neural network (CNN). Rather than leveraging similarity based on character swaps and deletions, this technique uses a learned metric on strings rendered as images: a CNN learns features that are optimized to detect visual similarity of the rendered strings. The trained model is used to convert thousands of potentially targeted process or domain names to feature vectors. These feature vectors are indexed using randomized KD-Trees to make similarity searches extremely fast with minimal computational processing. This technique shows a considerable 13% to 45% improvement over baseline techniques in terms of area under the receiver operating characteristic curve (ROC AUC). In addition, we provide both code and data to further future research.
Submitted 24 May, 2018;
originally announced May 2018.
-
Probing Physics Knowledge Using Tools from Developmental Psychology
Authors:
Luis Piloto,
Ari Weinstein,
Dhruva TB,
Arun Ahuja,
Mehdi Mirza,
Greg Wayne,
David Amos,
Chia-chun Hung,
Matt Botvinick
Abstract:
In order to build agents with a rich understanding of their environment, one key objective is to endow them with a grasp of intuitive physics: an ability to reason about three-dimensional objects, their dynamic interactions, and responses to forces. While some work on this problem has taken the approach of building in components such as ready-made physics engines, other research aims to extract general physical concepts directly from sensory data. In the latter case, one challenge that arises is evaluating the learning system. Research on intuitive physics knowledge in children has long employed a violation of expectations (VOE) method to assess children's mastery of specific physical concepts. We take the novel step of applying this method to artificial learning systems. In addition to introducing the VOE technique, we describe a set of probe datasets inspired by classic test stimuli from developmental psychology. We test a baseline deep learning system on this battery, as well as on a physics learning dataset ("IntPhys") recently posed by another research group. Our results show how the VOE technique may provide a useful tool for tracking physics knowledge in future research.
Submitted 3 April, 2018;
originally announced April 2018.
-
Unsupervised Predictive Memory in a Goal-Directed Agent
Authors:
Greg Wayne,
Chia-Chun Hung,
David Amos,
Mehdi Mirza,
Arun Ahuja,
Agnieszka Grabska-Barwinska,
Jack Rae,
Piotr Mirowski,
Joel Z. Leibo,
Adam Santoro,
Mevlana Gemici,
Malcolm Reynolds,
Tim Harley,
Josh Abramson,
Shakir Mohamed,
Danilo Rezende,
David Saxton,
Adam Cain,
Chloe Hillier,
David Silver,
Koray Kavukcuoglu,
Matt Botvinick,
Demis Hassabis,
Timothy Lillicrap
Abstract:
Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN facilitates the solution of tasks in 3D virtual reality environments for which partial observability is severe and memories must be maintained over long durations. Our model demonstrates a single learning agent architecture that can solve canonical behavioural tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences.
Submitted 28 March, 2018;
originally announced March 2018.
-
Intention Games: Towards Strategic Coexistence between Partially Honest and Blind Players
Authors:
Aditya Ahuja
Abstract:
Strategic interactions between competitive entities are generally considered from the perspective of complete revelation of benefits achieved from those interactions, in the form of public payoff functions and/or beliefs, in the announced games. However, there exist strategic interplays between competitors where the players have a choice to strategise under the availability of private payoffs, in similar competitive settings. In this contribution, we propose a formal framework for a competitive ecosystem where each player is permitted to defect from publicly optimal strategies under certain private payoffs greater than announced payoffs, given that these defections have certain acceptable bounds in the long run as agreed by all players. We call this game theoretic construction an Intention Game. We formally define an Intention Game, and notions of participational equilibria that exist in such interactions that permit public defections. We compare Intention Games with conventional strategic form games, and demonstrate a type-theoretic construction of Intention Games. In a partially honest setting, we give Intention Game instances of a Cournot competition, secure interactions between mobile applications, an Internet services' data sourcing competition between Internet service providers through content delivery networks, and a Bitcoin mining competition. We give a use of Intention Games to determine player participation in a cryptographic protocol. Finally, we demonstrate the possibility of a dual model of the Intention Games framework.
Submitted 11 February, 2020; v1 submitted 26 December, 2017;
originally announced December 2017.
-
A Quantum-Classical Scheme towards Quantum Functional Encryption
Authors:
Aditya Ahuja
Abstract:
Quantum encryption is a well studied problem for both classical and quantum information. However, little is known about quantum encryption schemes which enable the user, under different keys, to learn different functions of the plaintext, given the ciphertext. In this paper, we give a novel one-bit secret-key quantum encryption scheme, a classical extension of which allows different key holders to learn different length subsequences of the plaintext from the ciphertext. We prove our quantum-classical scheme secure under the notions of quantum semantic security, quantum entropic indistinguishability, and recent security definitions from the field of functional encryption.
Submitted 1 March, 2017;
originally announced March 2017.
-
Predicting Domain Generation Algorithms with Long Short-Term Memory Networks
Authors:
Jonathan Woodbridge,
Hyrum S. Anderson,
Anjum Ahuja,
Daniel Grant
Abstract:
Various families of malware use domain generation algorithms (DGAs) to generate a large number of pseudo-random domain names to connect to a command and control (C&C) server. In order to block DGA C&C traffic, security organizations must first discover the algorithm by reverse engineering malware samples, then generate a list of domains for a given seed. The domains are then either preregistered or published in a DNS blacklist. This process is not only tedious, but can be readily circumvented by malware authors using a large number of seeds in algorithms with multivariate recurrence properties (e.g., banjori) or by using a dynamic list of seeds (e.g., bedep). Another technique to stop malware from using DGAs is to intercept DNS queries on a network and predict whether domains are DGA generated. Such a technique will alert network administrators to the presence of malware on their networks. In addition, if the predictor can also accurately predict the family of DGAs, then network administrators can also be alerted to the type of malware that is on their networks. This paper presents a DGA classifier that leverages long short-term memory (LSTM) networks to predict DGAs and their respective families without the need for a priori feature extraction. Results are significantly better than state-of-the-art techniques, providing 0.9993 area under the receiver operating characteristic curve for binary classification and a micro-averaged F1 score of 0.9906. In other terms, the LSTM technique can provide a 90% detection rate with a 1:10000 false positive (FP) rate, a twentyfold FP improvement over comparable methods. Experiments in this paper are run on open datasets and code snippets are provided to reproduce the results.
Submitted 2 November, 2016;
originally announced November 2016.
-
Stochastic Characteristics and Simulation of the Random Waypoint Mobility Model
Authors:
A. Ahuja,
K. Venkateswarlu,
P. Venkata Krishna
Abstract:
Simulation results for Mobile Ad-Hoc Networks (MANETs) are fundamentally governed by the underlying Mobility Model. Thus it is imperative to find whether events functionally dependent on the mobility model 'converge' to well-defined functions or constants. This shall ensure long-run consistency among simulations performed by disparate parties. This paper reviews a work on the discrete Random Waypoint Mobility Model (RWMM), addressing its long-run stochastic stability. It is proved that each model in the targeted discrete class of the RWMM satisfies Birkhoff's pointwise ergodic theorem [13], and hence time-averaged functions on the mobility model surely converge. We also simulate the most common and general version of the RWMM to give insight into its working.
Submitted 18 March, 2012;
originally announced March 2012.