
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

Authors: Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

Abstract: Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task, which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process. Instead of the standard fixed diffusion timestep, we propose applying variable diffusion timesteps across the temporal dimension and across modalities of the inputs. This formulation offers flexibility to introduce variable noise levels for various portions of the input, hence the term mixture of noise levels. We propose a transformer-based audiovisual latent diffusion model and show that it can be trained in a task-agnostic fashion using our approach to enable a variety of audiovisual generation tasks at inference time. Experiments demonstrate the versatility of our method in tackling cross-modal and multimodal interpolation tasks in the audiovisual space. Notably, our proposed approach surpasses baselines in generating temporally and perceptually consistent samples conditioned on the input. Project page: avdit2024.github.io
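
The idea of applying different diffusion timesteps to different portions of the input can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the cosine-shaped schedule `alpha_bar`, the helper `add_noise`, the toy latent sizes, and the way timesteps are drawn are all assumptions made for the example, since the abstract does not say how timesteps are sampled.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Signal level for timestep t: 1.0 at t=0 (clean), near 0 at t=num_steps.

    Assumption: a cosine-shaped schedule chosen for illustration only.
    """
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def add_noise(latents, timesteps, num_steps, rng):
    """Forward-diffuse each segment with its own timestep.

    latents:   list of per-segment latent vectors (lists of floats)
    timesteps: one integer timestep per segment
    """
    noisy = []
    for vec, t in zip(latents, timesteps):
        a = alpha_bar(t, num_steps)
        noisy.append([
            math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in vec
        ])
    return noisy


rng = random.Random(0)
num_steps = 1000

# Toy latents: 4 temporal segments per modality, 3 dimensions each (hypothetical sizes).
video = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
audio = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]

# Fixed timestep: one shared noise level for every segment.
t_shared = rng.randint(1, num_steps)
video_fixed = add_noise(video, [t_shared] * 4, num_steps, rng)

# Variable timesteps across time and across modalities (one possible mixture).
video_t = [rng.randint(1, num_steps) for _ in range(4)]
audio_t = [rng.randint(1, num_steps) for _ in range(4)]
video_mixed = add_noise(video, video_t, num_steps, rng)
audio_mixed = add_noise(audio, audio_t, num_steps, rng)

# Timestep 0 leaves a segment clean, e.g. clean audio paired with noised video.
audio_clean = add_noise(audio, [0] * 4, num_steps, rng)
```

In this sketch, giving a segment timestep 0 keeps it untouched, so a conditioning setup such as audio-to-video generation is just one particular assignment of per-segment noise levels.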

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2404.14701

Deep neural networks for choice analysis: Enhancing behavioral regularity with gradient regularization

Authors: Siqi Feng, Rui Yao, Stephane Hess, Ricardo A. Daziano, Timothy Brathwaite, Joan Walker, Shenhao Wang

Abstract: Deep neural networks (DNNs) frequently present behaviorally irregular patterns, significantly limiting their practical potential and theoretical validity in travel behavior modeling. This study proposes strong and weak behavioral regularities as novel metrics to evaluate the monotonicity of individual demand functions (a.k.a. law of demand), and further designs a constrained optimization framework with six gradient regularizers to enhance DNNs' behavioral regularity. The proposed framework is applied to travel survey data from Chicago and London to examine the trade-off between predictive power and behavioral regularity for large vs. small sample scenarios and in-domain vs. out-of-domain generalizations. The results demonstrate that, unlike models with strong behavioral foundations such as the multinomial logit, the benchmark DNNs cannot guarantee behavioral regularity. However, gradient regularization (GR) increases DNNs' behavioral regularity by around 6 percentage points (pp) while retaining their relatively high predictive power. In the small sample scenario, GR is more effective than in the large sample scenario, simultaneously improving behavioral regularity by about 20 pp and log-likelihood by around 1.7%. Compared with the in-domain generalization of DNNs, GR works more effectively in out-of-domain generalization: it drastically improves the behavioral regularity of poorly performing benchmark DNNs by around 65 pp, indicating the criticality of behavioral regularization for enhancing model transferability and application in forecasting. Moreover, the proposed framework is applicable to other NN-based choice models such as TasteNets. Future studies could use behavioral regularity as a metric along with log-likelihood in evaluating travel demand models, and investigate other methods to further enhance behavioral regularity when adopting complex machine learning models.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2403.05530

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.17139

Video as the New Language for Real-World Decision Making

Authors: Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

Abstract: Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that is difficult to express in language. To address this gap, we discuss an under-appreciated opportunity to extend video generation to solve tasks in the real world. We observe how, akin to language, video can serve as a unified interface that can absorb internet knowledge and represent diverse tasks. Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning. We identify major impact opportunities in domains such as robotics, self-driving, and science, supported by recent work that demonstrates how such advanced capabilities in video generation are plausibly within reach. Lastly, we identify key challenges in video generation that hinder progress. Addressing these challenges will enable video generation models to demonstrate unique value alongside language models in a wider array of AI applications.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2312.09947

Prompting Datasets: Data Discovery with Conversational Agents

Authors: Johanna Walker, Elisavet Koutsiana, Joe Massey, Gefion Thuermer, Elena Simperl

Abstract: Can large language models assist in data discovery? Data discovery predominantly happens via search on a data portal or the web, followed by assessment of the dataset to ensure it is fit for the intended purpose. The ability of conversational generative AI (CGAI) to support recommendations with reasoning implies it can suggest datasets to users, explain why it has done so, and provide information akin to documentation regarding the dataset in order to support a use decision. We hold 3 workshops with data users and find that, despite limitations around web capabilities, CGAIs are able to suggest relevant datasets and provide many of the required sensemaking activities, as well as support dataset analysis and manipulation. However, CGAIs may also suggest fictional datasets, and perform inaccurate analysis. We identify emerging practices in data discovery and present a model of these to inform future research directions and data prompt design.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 27 pages, 9 figures

arXiv:2307.15494

ETHER: Aligning Emergent Communication for Hindsight Experience Replay

Authors: Kevin Denamganaï, Daniel Hernandez, Ozan Vardal, Sondess Missaoui, James Alfred Walker

Abstract: Natural language instruction following is paramount to enable collaboration between artificial agents and human beings. Natural language-conditioned reinforcement learning (RL) agents have shown how natural languages' properties, such as compositionality, can provide a strong inductive bias to learn complex policies. Previous architectures like HIGhER combine the benefit of language-conditioning with Hindsight Experience Replay (HER) to deal with sparse rewards environments. Yet, like HER, HIGhER relies on an oracle predicate function to provide a feedback signal highlighting which linguistic description is valid for which state. This reliance on an oracle limits its application. Additionally, HIGhER only leverages the linguistic information contained in successful RL trajectories, thus hurting its final performance and data-efficiency. Without early successful trajectories, HIGhER is no better than DQN upon which it is built. In this paper, we propose the Emergent Textual Hindsight Experience Replay (ETHER) agent, which builds on HIGhER and addresses both of its limitations by means of (i) a discriminative visual referential game, commonly studied in the subfield of Emergent Communication (EC), used here as an unsupervised auxiliary task and (ii) a semantic grounding scheme to align the emergent language with the natural language of the instruction-following benchmark. We show that the referential game's agents make an artificial language emerge that is aligned with the natural-like language used to describe goals in the BabyAI benchmark and that it is expressive enough so as to also describe unsuccessful RL trajectories and thus provide feedback to the RL agent to leverage the linguistic, structured information contained in all trajectories. Our work shows that EC is a viable unsupervised auxiliary task for RL and provides missing pieces to make HER more widely applicable.

Submitted 17 December, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: work in progress

arXiv:2307.09342

doi 10.1007/s10601-023-09364-1

Learning to Select SAT Encodings for Pseudo-Boolean and Linear Integer Constraints

Authors: Felix Ulrich-Oltean, Peter Nightingale, James Alfred Walker

Abstract: Many constraint satisfaction and optimisation problems can be solved effectively by encoding them as instances of the Boolean Satisfiability problem (SAT). However, even the simplest types of constraints have many encodings in the literature with widely varying performance, and the problem of selecting suitable encodings for a given problem instance is not trivial. We explore the problem of selecting encodings for pseudo-Boolean and linear constraints using a supervised machine learning approach. We show that it is possible to select encodings effectively using a standard set of features for constraint problems; however we obtain better performance with a new set of features specifically designed for the pseudo-Boolean and linear constraints. In fact, we achieve good results when selecting encodings for unseen problem classes. Our results compare favourably to AutoFolio when using the same feature set. We discuss the relative importance of instance features to the task of selecting the best encodings, and compare several variations of the machine learning method.

Submitted 8 November, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: 24 pages, 10 figures, accepted by Constraints Journal (Springer, 2023)

arXiv:2306.06194

Public Transit Demand Prediction During Highly Dynamic Conditions: A Meta-Analysis of State-of-the-Art Models and Open-Source Benchmarking Infrastructure

Authors: Juan D. Caicedo, Marta C. González, Joan L. Walker

Abstract: Real-time demand prediction is a critical input for dynamic bus routing. While many researchers have developed numerous complex methods to predict short-term transit demand, the applications have been limited to short, stable time frames and a few stations. How these methods perform in highly dynamic environments has not been studied, nor has their performance been systematically compared. We built an open-source infrastructure with five common methodologies, including econometric and deep learning approaches, and assessed their performance under stable and highly dynamic conditions. We used a time series from smartcard data to predict demand for the following day for the BRT system in Bogota, Colombia. The dynamic conditions in the time series include a month-long protest and the COVID-19 pandemic. Both conditions triggered drastic shifts in demand. The results reveal that most tested models perform similarly in stable conditions, with MAAPE varying from 0.08 to 0.12. The benchmark demonstrated that all models performed significantly worse in both dynamic conditions compared to the stable conditions. In the month-long protest, the increased MAAPE ranged from 0.14 to 0.24. Similarly, during the COVID-19 pandemic, the increased MAAPE ranged from 0.12 to 0.82. Notably, in the COVID-19 pandemic condition, an LSTM model with adaptive training and a multi-output design outperformed other models, adapting faster to disruptions. The prediction error stabilized within approximately 1.5 months, whereas other models continued to exhibit higher error rates even a year after the start of the pandemic. The aim of this open-source codebase infrastructure is to lower the barrier for other researchers to replicate and reproduce models, facilitate a collective effort within the research community to improve the benchmarking process and accelerate the advancement of short-term ridership prediction models.
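
The abstract reports errors as MAAPE without expanding the acronym. Assuming it denotes the mean arctangent absolute percentage error, that is, the mean of arctan(|(actual - forecast) / actual|), the metric can be computed as sketched below. The function name `maape`, the sample ridership numbers, and the handling of the 0/0 case are illustrative assumptions and are not taken from the paper's codebase.

```python
import math


def maape(actual, forecast):
    """Mean arctangent absolute percentage error, in radians (range 0 to pi/2)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            # arctan of an unbounded ratio tends to pi/2.
            # Assumption: a perfect forecast of zero demand counts as zero error.
            total += 0.0 if f == 0 else math.pi / 2
        else:
            total += math.atan(abs((a - f) / a))
    return total / len(actual)


# Hypothetical daily boardings at one station and a next-day forecast.
observed = [1200.0, 950.0, 0.0, 1100.0]
predicted = [1100.0, 1000.0, 50.0, 1150.0]
error = maape(observed, predicted)
```

Because arctan is bounded, a day with zero observed demand contributes at most pi/2 instead of an infinite percentage error, which is convenient when demand collapses during disruptions.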

Submitted 9 June, 2023; originally announced June 2023.

Comments: 17 pages, 8 figures

arXiv:2306.05562

AircraftVerse: A Large-Scale Multimodal Dataset of Aerial Vehicle Designs

Authors: Adam D. Cobb, Anirban Roy, Daniel Elenius, F. Michael Heim, Brian Swenson, Sydney Whittington, James D. Walker, Theodore Bapty, Joseph Hite, Karthik Ramani, Christopher McComb, Susmit Jha

Abstract: We present AircraftVerse, a publicly available aerial vehicle design dataset. Aircraft design encompasses different physics domains and, hence, multiple modalities of representation. The evaluation of these cyber-physical system (CPS) designs requires the use of scientific analytical and simulation models ranging from computer-aided design tools for structural and manufacturing analysis, computational fluid dynamics tools for drag and lift computation, battery models for energy estimation, and simulation models for flight control and dynamics. AircraftVerse contains 27,714 diverse air vehicle designs - the largest corpus of engineering designs with this level of complexity. Each design comprises the following artifacts: a symbolic design tree describing topology, propulsion subsystem, battery subsystem, and other design details; a STandard for the Exchange of Product (STEP) model data; a 3D CAD design using a stereolithography (STL) file format; a 3D point cloud for the shape of the design; and evaluation results from high fidelity state-of-the-art physics models that characterize performance metrics such as maximum flight distance and hover-time. We also present baseline surrogate models that use different modalities of design representation to predict design performance metrics, which we provide as part of our dataset release. Finally, we discuss the potential impact of this dataset on the use of learning in aircraft design and, more generally, in CPS. AircraftVerse is accompanied by a data card, and it is released under Creative Commons Attribution-ShareAlike (CC BY-SA) license. The dataset is hosted at https://zenodo.org/record/6525446, baseline models and code at https://github.com/SRI-CSL/AircraftVerse, and the dataset description at https://aircraftverse.onrender.com/.

Submitted 8 June, 2023; originally announced June 2023.

Comments: The dataset is hosted at https://zenodo.org/record/6525446, baseline models and code at https://github.com/SRI-CSL/AircraftVerse, and the dataset description at https://aircraftverse.onrender.com/

arXiv:2305.18477

doi 10.1609/aiide.v19i1.27507

Beyond the Meta: Leveraging Game Design Parameters for Patch-Agnostic Esport Analytics

Authors: Alan Pedrassoli Chitayat, Florian Block, James Walker, Anders Drachen

Abstract: Esport games comprise a sizeable fraction of the global games market, and are the fastest growing segment in games. This has given rise to the domain of esports analytics, which uses telemetry data from games to inform players, coaches, broadcasters and other stakeholders. Compared to traditional sports, esport titles change rapidly, in terms of mechanics as well as rules. Due to these frequent changes to the parameters of the game, esport analytics models can have a short lifespan, a problem which is largely ignored within the literature. This paper extracts information from game design (i.e. patch notes) and utilises clustering techniques to propose a new form of character representation. As a case study, a neural network model is trained to predict the number of kills in a Dota 2 match utilising this novel character representation technique. The performance of this model is then evaluated against two distinct baselines, including conventional techniques. Not only did the model significantly outperform the baselines in terms of accuracy (85% AUC), but the model also maintains the accuracy in two newer iterations of the game that introduced one new character and a brand new character type. These changes introduced to the design of the game would typically break conventional techniques that are commonly used within the literature. Therefore, the proposed methodology for representing characters can increase the lifespan of machine learning models as well as contribute to a higher performance when compared to traditional techniques typically employed within the literature.

Submitted 16 August, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2304.14511

Visual Referential Games Further the Emergence of Disentangled Representations

Authors: Kevin Denamganaï, Sondess Missaoui, James Alfred Walker

Abstract: Natural languages are powerful tools wielded by human beings to communicate information. Among their desirable properties, compositionality has been the main focus in the context of referential games and variants, as it promises to enable greater systematicity to the agents which would wield it. The concept of disentanglement has been shown to be of paramount importance to learned representations that generalise well in deep learning, and is thought to be a necessary condition to enable systematicity. Thus, this paper investigates how compositionality at the level of the emerging languages, disentanglement at the level of the learned representations, and systematicity relate to each other in the context of visual referential games. Firstly, we find that visual referential games that are based on the Obverter architecture outperform state-of-the-art unsupervised learning approaches in terms of many major disentanglement metrics. Secondly, we expand the previously proposed Positional Disentanglement (PosDis) metric for compositionality to (re-)incorporate some concerns pertaining to informativeness and completeness features found in the Mutual Information Gap (MIG) disentanglement metric it stems from. This extension allows for further discrimination between the different kinds of compositional languages that emerge in the context of Obverter-based referential games, in a way that neither the referential game accuracy nor previous metrics were able to capture. Finally, we investigate whether the resulting (emergent) systematicity, as measured by zero-shot compositional learning tests, correlates with any of the disentanglement and compositionality metrics proposed so far. Throughout the training process, statistically significant correlation coefficients can be found, both positive and negative, depending on the moment of the measure.

Submitted 27 April, 2023; originally announced April 2023.

Comments: Rejected from NeurIPS 2021 (scores of 6,6,4, and 4) / still work in progress

arXiv:2303.15663

Predicting Thermoelectric Power Factor of Bismuth Telluride During Laser Powder Bed Fusion Additive Manufacturing

Authors: Ankita Agarwal, Tanvi Banerjee, Joy Gockel, Saniya LeBlanc, Joe Walker, John Middendorf

Abstract: An additive manufacturing (AM) process, like laser powder bed fusion, allows for the fabrication of objects by spreading and melting powder in layers until a freeform part shape is created. In order to improve the properties of the material involved in the AM process, it is important to predict the material characterization property as a function of the processing conditions. In thermoelectric materials, the power factor is a measure of how efficiently the material can convert heat to electricity. While earlier works have predicted the material characterization properties of different thermoelectric materials using various techniques, implementation of machine learning models to predict the power factor of bismuth telluride (Bi2Te3) during the AM process has not been explored. This is important as Bi2Te3 is a standard material for low temperature applications. Thus, we used data about manufacturing processing parameters involved and in-situ sensor monitoring data collected during AM of Bi2Te3, to train different machine learning models in order to predict its thermoelectric power factor. We implemented supervised machine learning techniques using 80% training and 20% test data and further used the permutation feature importance method to identify important processing parameters and in-situ sensor features which were best at predicting power factor of the material. Ensemble-based methods like random forest, AdaBoost classifier, and bagging classifier performed the best in predicting power factor with the highest accuracy of 90% achieved by the bagging classifier model. Additionally, we found the top 15 processing parameters and in-situ sensor features to characterize the material manufacturing property like power factor. These features could further be optimized to maximize power factor of the thermoelectric material and improve the quality of the products built using this material.

Submitted 27 March, 2023; originally announced March 2023.

Comments: 8 pages, 2 figures, 2 tables, accepted at the Data Science for Smart Manufacturing and Healthcare workshop (DS2-MH) at the SIAM International Conference on Data Mining (SDM23)

arXiv:2303.04204

doi 10.1016/j.trb.2023.102869

Deep hybrid model with satellite imagery: how to combine demand modeling and computer vision for behavior analysis?

Authors: Qingyi Wang, Shenhao Wang, Yunhan Zheng, Hongzhou Lin, Xiaohu Zhang, Jinhua Zhao, Joan Walker

Abstract: Classical demand modeling analyzes travel behavior using only low-dimensional numeric data (i.e. sociodemographics and travel attributes) but not high-dimensional urban imagery. However, travel behavior depends on the factors represented by both numeric data and urban imagery, thus necessitating a synergetic framework to combine them. This study creates a theoretical framework of deep hybrid models with a crossing structure consisting of a mixing operator and a behavioral predictor, thus integrating the numeric and imagery data into a latent space. Empirically, this framework is applied to analyze travel mode choice using the MyDailyTravel Survey from Chicago as the numeric inputs and the satellite images as the imagery inputs. We found that deep hybrid models outperform both the traditional demand models and the recent deep learning in predicting the aggregate and disaggregate travel behavior with our supervision-as-mixing design. The latent space in deep hybrid models can be interpreted, because it reveals meaningful spatial and social patterns. The deep hybrid models can also generate new urban images that do not exist in reality and interpret them with economic theory, such as computing substitution patterns and social welfare changes. Overall, the deep hybrid models demonstrate the complementarity between the low-dimensional numeric and high-dimensional imagery data and between the traditional demand modeling and recent deep learning. It generalizes the latent classes and variables in classical hybrid demand models to a latent space, and leverages the computational power of deep learning for imagery while retaining the economic interpretability on the microeconomics foundation.

Submitted 22 February, 2024; v1 submitted 7 March, 2023; originally announced March 2023.

Journal ref: Transportation Research Part B: Methodological, Volume 179, 2024, 102869

arXiv:2302.12458

Design and Mechanics of Cable-Driven Rolling Diaphragm Transmission for High-Transparency Robotic Motion

Authors: Hoi Man Lam, W. Jared Walker, Lucas Jonasch, Dimitri Schreiber, Michael C. Yip

Abstract: Applications of rolling diaphragm transmissions for medical and teleoperated robotics are of great interest, due to the low friction of rolling diaphragms combined with the power density and stiffness of hydraulic transmissions. However, the stiffness-enabling pressure preloads can form a tradeoff against bearing loading in some rolling diaphragm layouts, and transmission setup can be difficult. The use of cable drives complements the rolling diaphragm transmission's advantages, but maintaining cable tension is crucial for optimal and consistent performance. In this paper, a coaxial opposed rolling diaphragm layout with cable drive and an electronic transmission control system are investigated, with a focus on system reliability and scalability. Mechanical features are proposed which enable force balancing, decoupling of transmission pressure from bearing loads, and maintenance of cable tension. Key considerations and procedures for automation of transmission setup, phasing, and operation are also presented. We also present an analysis of system stiffness to identify key compliance contributors, and conduct experiments to validate prototype design performance.

Submitted 24 February, 2023; originally announced February 2023.

Comments: 7 pages, 13 figures

arXiv:2302.04009 [pdf, other]

Investigating the role of model-based learning in exploration and transfer

Authors: Jacob Walker, Eszter Vértes, Yazhe Li, Gabriel Dulac-Arnold, Ankesh Anand, Théophane Weber, Jessica B. Hamrick

Abstract: State of the art reinforcement learning has enabled training agents on tasks of ever increasing complexity. However, the current paradigm tends to favor training agents from scratch on every new task or on collections of tasks with a view towards generalizing to novel task configurations. The former suffers from poor data efficiency while the latter is difficult when test tasks are out-of-distribution. Agents that can effectively transfer their knowledge about the world pose a potential solution to these issues. In this paper, we investigate transfer learning in the context of model-based agents. Specifically, we aim to understand when exactly environment models have an advantage and why. We find that a model-based approach outperforms controlled model-free baselines for transfer learning. Through ablations, we show that both the policy and dynamics model learnt through exploration matter for successful transfer. We demonstrate our results across three domains which vary in their requirements for transfer: in-distribution procedural (Crafter), in-distribution identical (RoboDesk), and out-of-distribution (Meta-World). Our results show that intrinsic exploration combined with environment models presents a viable direction towards agents that are self-supervised and able to generalize to novel reward functions.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2301.05158 [pdf, other]

SemPPL: Predicting pseudo-labels for better contrastive representations

Authors: Matko Bošnjak, Pierre H. Richemond, Nenad Tomasev, Florian Strub, Jacob C. Walker, Felix Hill, Lars Holger Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

Abstract: Learning from large amounts of unsupervised data and a small amount of supervision is an important open problem in computer vision. We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SemPPL), that combines labelled and unlabelled data to learn informative representations. Our method extends self-supervised contrastive learning -- where representations are shaped by distinguishing whether two samples represent the same underlying datum (positives) or not (negatives) -- with a novel approach to selecting positives. To enrich the set of positives, we leverage the few existing ground-truth labels to predict the missing ones through a $k$-nearest neighbours classifier by using the learned embeddings of the labelled data. We thus extend the set of positives with datapoints having the same pseudo-label and call these semantic positives. We jointly learn the representation and predict bootstrapped pseudo-labels. This creates a reinforcing cycle. Strong initial representations enable better pseudo-label predictions which then improve the selection of semantic positives and lead to even better representations. SemPPL outperforms competing semi-supervised methods setting new state-of-the-art performance of $68.5\%$ and $76\%$ top-$1$ accuracy when using a ResNet-$50$ and training on $1\%$ and $10\%$ of labels on ImageNet, respectively. Furthermore, when using selective kernels, SemPPL significantly outperforms previous state-of-the-art achieving $72.3\%$ and $78.3\%$ top-$1$ accuracy on ImageNet with $1\%$ and $10\%$ labels, respectively, which improves absolute $+7.8\%$ and $+6.2\%$ over previous work. SemPPL also exhibits state-of-the-art performance over larger ResNet models as well as strong robustness, out-of-distribution and transfer performance. We release the checkpoints and the evaluation code at https://github.com/deepmind/semppl .
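The pseudo-labelling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain Euclidean distance, a majority vote, and toy 2-D embeddings, and the names `knn_pseudo_label` and `semantic_positives` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter


def knn_pseudo_label(query, labelled_embeddings, labels, k=3):
    """Predict a pseudo-label by majority vote among the k nearest labelled embeddings.

    Assumption: Euclidean distance; the abstract does not state the metric used.
    """
    neighbours = sorted(
        (math.dist(query, emb), lab)
        for emb, lab in zip(labelled_embeddings, labels)
    )
    votes = Counter(lab for _, lab in neighbours[:k])
    return votes.most_common(1)[0][0]


def semantic_positives(anchor_idx, pseudo_labels):
    """Indices of other datapoints that share the anchor's pseudo-label."""
    anchor_label = pseudo_labels[anchor_idx]
    return [
        i for i, lab in enumerate(pseudo_labels)
        if lab == anchor_label and i != anchor_idx
    ]


# Toy labelled embeddings (hypothetical values).
labelled = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["cat", "cat", "dog", "dog"]

# Unlabelled embeddings receive pseudo-labels from their nearest neighbours.
unlabelled = [(0.2, 0.1), (5.0, 5.5), (0.1, 0.8)]
pseudo = [knn_pseudo_label(q, labelled, labels) for q in unlabelled]
```

In this toy setup, datapoints sharing a pseudo-label would be treated as additional positives for each other in the contrastive objective.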

Submitted 10 January, 2024; v1 submitted 12 January, 2023; originally announced January 2023.

Comments: Published as a conference paper at ICLR 2023. For checkpoints and source code see https://github.com/google-deepmind/semppl

arXiv:2212.01334 [pdf, other]

A Mixed-Method Approach to Determining Contact Matrices in the Cox's Bazar Refugee Settlement

Authors: Joseph Walker, Joseph Aylett-Bullock, Difu Shi, Allen Gidraf Kahindo Maina, Egmond Samir Evers, Sandra Harlass, Frank Krauss

Abstract: Contact matrices are an important ingredient in age-structured epidemic models to inform the simulated spread of the disease between sub-groups of the population. These matrices are generally derived using resource-intensive diary-based surveys and few exist in the Global South or tailored to vulnerable populations. In particular, no contact matrices exist for refugee settlements - locations under-served by epidemic models in general. In this paper we present a novel, mixed-method approach, for deriving contact matrices in populations which combines a lightweight, rapidly deployable, survey with an agent-based model of the population informed by census and behavioural data. We use this method to derive the first set of contact matrices for the Cox's Bazar refugee settlement in Bangladesh. The matrices from the refugee settlement show strong banding effects due to different age cut-offs in attendance at certain venues, such as distribution centres and religious sites, as well as the important contribution of the demographic profile of the settlement which was encoded in the model. These can have significant implications to the modelled disease dynamics. To validate our approach, we also apply our method to the population of the UK and compare our derived matrices against well-known contact matrices previously collected using traditional approaches. Overall, our findings demonstrate that our mixed-method approach can address some of the challenges of both the traditional and previously proposed agent-based approaches to deriving contact matrices, and has the potential to be rolled-out in other resource-constrained environments. This work therefore contributes to a broader aim of developing new methods and mechanisms of data collection for modelling disease spread in refugee and IDP settlements and better serving these vulnerable communities.

Submitted 22 November, 2022; originally announced December 2022.

Comments: 31 pages with appendices, 18 figures

arXiv:2209.11805 [pdf]

Tracking the State and Behavior of People in Response to COVID-19 Through the Fusion of Multiple Longitudinal Data Streams

Authors: Mohamed Amine Bouzaghrane, Hassan Obeid, Drake Hayes, Minnie Chen, Meiqing Li, Madeleine Parker, Daniel A. Rodríguez, Daniel G. Chatman, Karen Trapenberg Frick, Raja Sengupta, Joan Walker

Abstract: The changing nature of the COVID-19 pandemic has highlighted the importance of comprehensively considering its impacts and considering changes over time. Most COVID-19 related research addresses narrowly focused research questions and is therefore limited in addressing the complexities created by the interrelated impacts of the pandemic. Such research generally makes use of only one of either 1) actively collected data such as surveys, or 2) passively collected data. While a few studies make use of both actively and passively collected data, only one other study collects it longitudinally. Here we describe a rich panel dataset of active and passive data from U.S. residents collected between August 2020 and July 2021. Active data includes a repeated survey measuring travel behavior, compliance with COVID-19 mandates, physical health, economic well-being, vaccination status, and other factors. Passively collected data consists of all locations visited by study participants, taken from smartphone GPS data. We also closely tracked COVID-19 policies across counties of residence throughout the study period. Such a dataset allows important research questions to be answered; for example, to determine the factors underlying the heterogeneous behavioral responses to COVID-19 restrictions imposed by local governments. Better information about such responses is critical to our ability to understand the societal and economic impacts of this and future pandemics. The development of this data infrastructure can also help researchers explore new frontiers in behavioral science. The article explains how this approach fills gaps in COVID-19 related data collection; describes the study design and data collection procedures; presents key demographic characteristics of study participants; and shows how fusing different data streams helps uncover behavioral insights.

Submitted 1 October, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

arXiv:2207.14394 [pdf, other]

Logic and Accuracy Testing: A Fifty-State Review

Authors: Josiah Walker, Nakul Bajaj, Braden L. Crimmins, J. Alex Halderman

Abstract: Pre-election logic and accuracy (L&A) testing is a process in which election officials validate the behavior of voting equipment by casting a known set of test ballots and confirming the expected results. Ideally, such testing can serve to detect certain forms of human error or fraud and help bolster voter confidence. We present the first detailed analysis of L&A testing practices across the United States. We find that while all states require L&A testing before every election, their implementations vary dramatically in scope, transparency, and rigorousness. We summarize each state's requirements and score them according to uniform criteria. We also highlight best practices and flag opportunities for improvement, in hopes of encouraging broader adoption of more effective L&A processes.

Submitted 1 August, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: 27 pages, 4 figures, to be published in E-Vote-ID: Seventh International Joint Conference on Electronic Voting

arXiv:2207.08012 [pdf, other]

Meta-Referential Games to Learn Compositional Learning Behaviours

Authors: Kevin Denamganaï, Sondess Missaoui, James Alfred Walker

Abstract: Human beings use compositionality to generalise from past experiences to novel experiences. We assume a separation of our experiences into fundamental atomic components that can be recombined in novel ways to support our ability to engage with novel experiences. We frame this as the ability to learn to generalise compositionally, and we will refer to behaviours making use of this ability as compositional learning behaviours (CLBs). A central problem to learning CLBs is the resolution of a binding problem (BP). While it is another feat of intelligence that human beings perform with ease, it is not the case for state-of-the-art artificial agents. Thus, in order to build artificial agents able to collaborate with human beings, we propose to develop a novel benchmark to investigate agents' abilities to exhibit CLBs by solving a domain-agnostic version of the BP. We take inspiration from the language emergence and grounding framework of referential games and propose a meta-learning extension of referential games, entitled Meta-Referential Games, and use this framework to build our benchmark, the Symbolic Behaviour Benchmark (S2B). We provide baseline results and error analysis showing that our benchmark is a compelling challenge that we hope will spur the research community towards developing more capable artificial agents.

Submitted 19 December, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

Comments: work in progress

arXiv:2205.03481 [pdf, other]

A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy

Authors: Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park, James Walker, Alexander Gruenstein

Abstract: Acoustic Echo Cancellation (AEC) is essential for accurate recognition of queries spoken to a smart speaker that is playing out audio. Previous work has shown that a neural AEC model operating on log-mel spectral features (denoted "logmel" hereafter) can greatly improve Automatic Speech Recognition (ASR) accuracy when optimized with an auxiliary loss utilizing a pre-trained ASR model encoder. In this paper, we develop a conformer-based waveform-domain neural AEC model inspired by the "TasNet" architecture. The model is trained by jointly optimizing Negative Scale-Invariant SNR (SISNR) and ASR losses on a large speech dataset. On a realistic rerecorded test set, we find that cascading a linear adaptive AEC and a waveform-domain neural AEC is very effective, giving 56-59% word error rate (WER) reduction over the linear AEC alone. On this test set, the 1.6M parameter waveform-domain neural AEC also improves over a larger 6.5M parameter logmel-domain neural AEC model by 20-29% in easy to moderate conditions. By operating on smaller frames, the waveform neural model is able to perform better at smaller sizes and is better suited for applications where memory is limited.
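The negative SI-SNR loss named in this abstract can be sketched with the commonly used definition of scale-invariant SNR. This is a minimal illustration, not the authors' training code: the zero-mean normalisation, the `eps` stabiliser, and the function names `si_snr` and `negative_si_snr_loss` are assumptions, and the abstract does not specify the exact variant used.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length waveforms (lists of floats)."""
    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]  # zero-mean estimate
    t = [x - mean_t for x in target]    # zero-mean target

    # Project the estimate onto the target to get the scaled target component.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t) + eps
    scale = dot / target_energy
    s_target = [scale * b for b in t]

    # Whatever is left after the projection is treated as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]

    signal_power = sum(s * s for s in s_target)
    noise_power = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(signal_power / noise_power + eps)


def negative_si_snr_loss(estimate, target):
    """Loss to minimise: higher SI-SNR means lower loss."""
    return -si_snr(estimate, target)


# Toy waveforms (hypothetical values).
target = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 0.8, -1.2]
rescaled = [3.0 * x for x in estimate]
```

Because the estimate is projected onto the target before the ratio is taken, multiplying the estimate by a constant leaves the value essentially unchanged, which is the property the "scale-invariant" name refers to.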

Submitted 6 May, 2022; originally announced May 2022.

Comments: Submitted to Interspeech 2022

arXiv:2204.12092 [pdf, other]

Mask scalar prediction for improving robust automatic speech recognition

Authors: Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi

Abstract: Using neural network based acoustic frontends for improving robustness of streaming automatic speech recognition (ASR) systems is challenging because of the causality constraints and the resulting distortion that the frontend processing introduces in speech. Time-frequency masking based approaches have been shown to work well, but they need additional hyper-parameters to scale the mask to limit speech distortion. Such mask scalars are typically hand-tuned and chosen conservatively. In this work, we present a technique to predict mask scalars using an ASR-based loss in an end-to-end fashion, with minimal increase in the overall model size and complexity. We evaluate the approach on two robust ASR tasks: multichannel enhancement in the presence of speech and non-speech noise, and acoustic echo cancellation (AEC). Results show that the presented algorithm consistently improves word error rate (WER) without the need for any additional tuning over strong baselines that use hand-tuned hyper-parameters: up to 16% for multichannel enhancement in noisy conditions, and up to 7% for AEC.

Submitted 26 April, 2022; originally announced April 2022.

Comments: Submitted to Interspeech 2022

arXiv:2203.09494 [pdf, other]

Transframer: Arbitrary Frame Prediction with Generative Models

Authors: Charlie Nash, João Carreira, Jacob Walker, Iain Barr, Andrew Jaegle, Mateusz Malinowski, Peter Battaglia

Abstract: We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation, to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames, and outputs sequences of sparse, compressed image features. Transframer is the state-of-the-art on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30 second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data.

Submitted 9 May, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

arXiv:2111.09935 [pdf, other]

A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

Authors: Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard

Abstract: We present a frontend for improving robustness of automatic speech recognition (ASR), that jointly implements three modules within a single model: acoustic echo cancellation, speech enhancement, and speech separation. This is achieved by using a contextual enhancement neural network that can optionally make use of different types of side inputs: (1) a reference signal of the playback audio, which is necessary for echo cancellation; (2) a noise context, which is useful for speech enhancement; and (3) an embedding vector representing the voice characteristic of the target speaker of interest, which is not only critical in speech separation, but also helpful for echo cancellation and speech enhancement. We present detailed evaluations to show that the joint model performs almost as well as the task-specific models, and significantly reduces word error rate in noisy conditions even when using a large-scale state-of-the-art ASR model. Compared to the noisy baseline, the joint model reduces the word error rate in low signal-to-noise ratio conditions by at least 71% on our echo cancellation dataset, 10% on our noisy dataset, and 26% on our multi-speaker dataset. Compared to task-specific models, the joint model performs within 10% on our echo cancellation dataset, 2% on the noisy dataset, and 3% on the multi-speaker dataset.

Submitted 18 November, 2021; originally announced November 2021.

Comments: Will appear in IEEE-ASRU 2021

arXiv:2111.01587 [pdf, other]

Procedural Generalization by Planning with Self-Supervised World Models

Authors: Ankesh Anand, Jacob Walker, Yazhe Li, Eszter Vértes, Julian Schrittwieser, Sherjil Ozair, Théophane Weber, Jessica B. Hamrick

Abstract: One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks. However, the generalization ability of model-based agents is not well understood because existing work has focused on model-free agents when benchmarking generalization. Here, we explicitly measure the generalization ability of model-based agents in comparison to their model-free counterparts. We focus our analysis on MuZero (Schrittwieser et al., 2020), a powerful model-based agent, and evaluate its performance on both procedural and task generalization. We identify three factors of procedural generalization -- planning, self-supervised representation learning, and procedural data diversity -- and show that by combining these techniques, we achieve state-of-the-art generalization performance and data efficiency on Procgen (Cobbe et al., 2019). However, we find that these factors do not always provide the same benefits for the task generalization benchmarks in Meta-World (Yu et al., 2019), indicating that transfer remains a challenge and may require different approaches than procedural generalization. Overall, we suggest that building generalizable agents requires moving beyond the single-task, model-free paradigm and towards self-supervised model-based agents that are trained in rich, procedural, multi-task environments.

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2108.01554 [pdf]

doi 10.1371/journal.pone.0255630

Sexing Caucasian 2D footprints using convolutional neural networks

Authors: Marcin Budka, Matthew R. Bennet, Sally Reynolds, Shelby Barefoot, Sarah Reel, Selina Reidy, Jeremy Walker

Abstract: Footprints are left, or obtained, in a variety of scenarios from crime scenes to anthropological investigations. Determining the sex of a footprint can be useful in screening such impressions and attempts have been made to do so using single or multi landmark distances, shape analyses and via the density of friction ridges. Here we explore the relative importance of different components in sexing two-dimensional foot impressions namely, size, shape and texture. We use a machine learning approach and compare this to more traditional methods of discrimination. Two datasets are used, a pilot data set collected from students at Bournemouth University (N=196) and a larger data set collected by podiatrists at Sheffield NHS Teaching Hospital (N=2677). Our convolutional neural network can sex a footprint with accuracy of around 90% on a test set of N=267 footprint images using all image components, which is better than an expert can achieve. However, the quality of the impressions impacts on this success rate, but the results are promising and in time it may be possible to create an automated screening algorithm in which practitioners of whatever sort (medical or forensic) can obtain a first order sexing of a two-dimensional footprint.

Submitted 23 July, 2021; originally announced August 2021.

arXiv:2107.13076 [pdf, other]

Interactive Storytelling for Children: A Case-study of Design and Development Considerations for Ethical Conversational AI

Authors: Jennifer Chubb, Sondess Missaoui, Shauna Concannon, Liam Maloney, James Alfred Walker

Abstract: Conversational Artificial Intelligence (CAI) systems and Intelligent Personal Assistants (IPA), such as Alexa, Cortana, Google Home and Siri are becoming ubiquitous in our lives, including those of children, the implications of which are receiving increased attention, specifically with respect to the effects of these systems on children's cognitive, social and linguistic development. Recent advances address the implications of CAI with respect to privacy, safety, security, and access. However, there is a need to connect and embed the ethical and technical aspects in the design. Using a case-study of a research and development project focused on the use of CAI in storytelling for children, this paper reflects on the social context within a specific case of technology development, as substantiated and supported by argumentation from within the literature. It describes the decision making process behind the recommendations made on this case for their adoption in the creative industries. Further research that engages with developers and stakeholders in the ethics of storytelling through CAI is highlighted as a matter of urgency.

Submitted 20 July, 2021; originally announced July 2021.

arXiv:2107.05431 [pdf, other]

CoBERL: Contrastive BERT for Reinforcement Learning

Authors: Andrea Banino, Adrià Puidomenech Badia, Jacob Walker, Tim Scholtes, Jovana Mitrovic, Charles Blundell

Abstract: Many reinforcement learning (RL) agents require a large amount of experience to solve tasks. We propose Contrastive BERT for RL (CoBERL), an agent that combines a new contrastive loss and a hybrid LSTM-transformer architecture to tackle the challenge of improving data efficiency. CoBERL enables efficient, robust learning from pixels across a wide range of domains. We use bidirectional masked prediction in combination with a generalization of recent contrastive methods to learn better representations for transformers in RL, without the need of hand engineered data augmentations. We find that CoBERL consistently improves performance across the full Atari suite, a set of control tasks and a challenging 3D environment.

Submitted 22 February, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

Comments: 9 pages, 2 figures, 6 tables

arXiv:2103.10790 [pdf, other]

Quality Evolvability ES: Evolving Individuals With a Distribution of Well Performing and Diverse Offspring

Authors: Adam Katona, Daniel W. Franks, James Alfred Walker

Abstract: One of the most important lessons from the success of deep learning is that learned representations tend to perform much better at any task compared to representations we design by hand. Yet evolution of evolvability algorithms, which aim to automatically learn good genetic representations, have received relatively little attention, perhaps because of the large amount of computational power they require. The recent method Evolvability ES allows direct selection for evolvability with little computation. However, it can only be used to solve problems where evolvability and task performance are aligned. We propose Quality Evolvability ES, a method that simultaneously optimizes for task performance and evolvability and without this restriction. Our proposed approach Quality Evolvability has similar motivation to Quality Diversity algorithms, but with some important differences. While Quality Diversity aims to find an archive of diverse and well-performing, but potentially genetically distant individuals, Quality Evolvability aims to find a single individual with a diverse and well-performing distribution of offspring. By doing so Quality Evolvability is forced to discover more evolvable representations. We demonstrate on robotic locomotion control tasks that Quality Evolvability ES, similarly to Quality Diversity methods, can learn faster than objective-based methods and can handle deceptive problems.

Submitted 20 July, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: 2021 Conference on Artificial Life

arXiv:2103.01950 [pdf, other]

Predicting Video with VQVAE

Authors: Jacob Walker, Ali Razavi, Aäron van den Oord

Abstract: In recent years, the task of video prediction -- forecasting future video given past video frames -- has attracted attention in the research community. In this paper we propose a novel approach to this problem with Vector Quantized Variational AutoEncoders (VQ-VAE). With VQ-VAE we compress high-resolution videos into a hierarchical set of multi-scale discrete latent variables. Compared to pixels, this compressed latent space has dramatically reduced dimensionality, allowing us to apply scalable autoregressive generative models to predict video. In contrast to previous work that has largely emphasized highly constrained datasets, we focus on very diverse, large-scale datasets such as Kinetics-600. We predict video at a higher resolution on unconstrained videos, 256x256, than any other previous method to our knowledge. We further validate our approach against prior work via a crowdsourced human evaluation.

Submitted 2 March, 2021; originally announced March 2021.

Comments: 13 Pages

ACM Class: I.2.6; I.2.10

arXiv:2101.11948 [pdf]

doi 10.1016/j.jocm.2021.100340

Choice modelling in the age of machine learning -- discussion paper

Authors: S. Van Cranenburgh, S. Wang, A. Vij, F. Pereira, J. Walker

Abstract: Since its inception, the choice modelling field has been dominated by theory-driven modelling approaches. Machine learning offers an alternative data-driven approach for modelling choice behaviour and is increasingly drawing interest in our field. Cross-pollination of machine learning models, techniques and practices could help overcome problems and limitations encountered in the current theory-driven modelling paradigm, such as subjective labour-intensive search processes for model selection, and the inability to work with text and image data. However, despite the potential benefits of using the advances of machine learning to improve choice modelling practices, the choice modelling field has been hesitant to embrace machine learning. This discussion paper aims to consolidate knowledge on the use of machine learning models, techniques and practices for choice modelling, and discuss their potential. Thereby, we hope not only to make the case that further integration of machine learning in choice modelling is beneficial, but also to further facilitate it. To this end, we clarify the similarities and differences between the two modelling paradigms; we review the use of machine learning for choice modelling; and we explore areas of opportunities for embracing machine learning models and techniques to improve our practices. To conclude this discussion paper, we put forward a set of research questions which must be addressed to better understand if and how machine learning can benefit choice modelling.

Submitted 24 November, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

Comments: 40 pages, 2 tables, 0 figures

Journal ref: Journal of Choice Modelling 42 (2022): 100340

arXiv:2012.10776 [pdf, other]

On (Emergent) Systematic Generalisation and Compositionality in Visual Referential Games with Straight-Through Gumbel-Softmax Estimator

Authors: Kevin Denamganaï, James Alfred Walker

Abstract: The drivers of compositionality in artificial languages that emerge when two (or more) agents play a non-visual referential game have been previously investigated using approaches based on the REINFORCE algorithm and the (Neural) Iterated Learning Model. Following the more recent introduction of the Straight-Through Gumbel-Softmax (ST-GS) approach, this paper investigates to what extent the drivers of compositionality identified so far in the field apply in the ST-GS context and to what extent they translate into (emergent) systematic generalisation abilities, when playing a visual referential game. Compositionality and the generalisation abilities of the emergent languages are assessed using topographic similarity and zero-shot compositional tests. Firstly, we provide evidence that the test-train split strategy significantly impacts the zero-shot compositional tests when dealing with visual stimuli, whilst it does not when dealing with symbolic ones. Secondly, empirical evidence shows that using the ST-GS approach with small batch sizes and an overcomplete communication channel improves compositionality in the emerging languages. Nevertheless, while shown robust with symbolic stimuli, the effect of the batch size is not so clear-cut when dealing with visual stimuli. Our results also show that not all overcomplete communication channels are created equal. Indeed, while increasing the maximum sentence length is found to be beneficial to further both compositionality and generalisation abilities, increasing the vocabulary size is found detrimental. Finally, a lack of correlation between the language compositionality at training-time and the agents' generalisation abilities is observed in the context of discriminative referential games with visual stimuli. This is similar to previous observations in the field using the generative variant with symbolic stimuli.

Submitted 19 December, 2020; originally announced December 2020.

Comments: Accepted at 4th NeurIPS Workshop on Emergent Communication (EmeCom @ NeurIPS 2020)

arXiv:2012.09486 [pdf, other]

ReferentialGym: A Nomenclature and Framework for Language Emergence & Grounding in (Visual) Referential Games

Authors: Kevin Denamganaï, James Alfred Walker

Abstract: Natural languages are powerful tools wielded by human beings to communicate information and co-operate towards common goals. Their values lie in some main properties like compositionality, hierarchy and recurrent syntax, which computational linguists have been researching the emergence of in artificial languages induced by language games. Only relatively recently, the AI community has started to investigate language emergence and grounding working towards better human-machine interfaces. For instance, interactive/conversational AI assistants that are able to relate their vision to the ongoing conversation. This paper provides two contributions to this research field. Firstly, a nomenclature is proposed to understand the main initiatives in studying language emergence and grounding, accounting for the variations in assumptions and constraints. Secondly, a PyTorch based deep learning framework is introduced, entitled ReferentialGym, which is dedicated to furthering the exploration of language emergence and grounding. By providing baseline implementations of major algorithms and metrics, in addition to many different features and approaches, ReferentialGym attempts to ease the entry barrier to the field and provide the community with common implementations.

Submitted 17 December, 2020; originally announced December 2020.

Comments: Accepted at 4th NeurIPS Workshop on Emergent Communication (EmeCom @ NeurIPS 2020)

arXiv:2010.07922 [pdf, other]

Representation Learning via Invariant Causal Mechanisms

Authors: Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, Charles Blundell

Abstract: Self-supervised learning has emerged as a strategy to reduce the reliance on costly supervised signal by pretraining representations only using unlabeled data. These methods combine heuristic proxy classification tasks with data augmentations and have achieved significant success, but our theoretical understanding of this success remains limited. In this paper we analyze self-supervised representation learning using a causal framework. We show how data augmentations can be more effectively utilized through explicit invariance constraints on the proxy classifiers employed during pretraining. Based on this, we propose a novel self-supervised objective, Representation Learning via Invariant Causal Mechanisms (ReLIC), that enforces invariant prediction of proxy targets across augmentations through an invariance regularizer which yields improved generalization guarantees. Further, using causality we generalize contrastive learning, a particular kind of self-supervised method, and provide an alternative theoretical explanation for the success of these methods. Empirically, ReLIC significantly outperforms competing methods in terms of robustness and out-of-distribution generalization on ImageNet, while also significantly outperforming these methods on Atari achieving above human-level performance on 51 out of 57 games.

Submitted 15 October, 2020; originally announced October 2020.

arXiv:2009.03304 [pdf]

Conquery: an open source application to analyze high content healthcare data

Authors: Fabian Kovacs, Max Thonagel, Marion Ludwig, Alexander Albrecht, Manuel Hegner, Dirk Enders, Lennart Hickstein, Maximilian von Knobloch, Anne Rothhardt, Jochen Walker

Abstract: Introduction: Big data in healthcare must be exploited to achieve a substantial increase in efficiency and competitiveness. Especially the analysis of patient-related data possesses huge potential to improve decision-making processes. However, most analytical approaches used today are highly time- and resource-consuming. Objectives: The presented software solution Conquery is an open-source software tool providing advanced, but intuitive data analysis without the need for specialized statistical training. Conquery aims to simplify big data analysis for novice database users in the medical sector. Methods: Conquery is a document-oriented distributed timeseries database and analysis platform. Its main application is the analysis of per-person medical records by non-technical medical professionals. Complex analyses are realized in the Conquery frontend by dragging tree nodes into the query editor. Queries are evaluated by a bespoke distributed query-engine for medical records in a column-oriented fashion. We present a custom compression scheme to facilitate low response times that uses online calculated as well as precomputed metadata and data statistics. Results: Conquery allows for easy navigation through the hierarchy and enables complex study cohort construction whilst reducing the demand on time and resources. The UI of Conquery and a query output is exemplified by the construction of a relevant clinical cohort. Conclusions: Conquery is an efficient and intuitive open-source software for performant and secure data analysis and aims at supporting decision-making processes in the healthcare sector.

Submitted 7 February, 2022; v1 submitted 7 September, 2020; originally announced September 2020.

Comments: 7 pages, 5 figures, 3 supplementary codes

MSC Class: H.2.4

arXiv:2008.11749 [pdf, other]

A Geometric Analysis Of The Harmonic Structure of "In My Life"

Authors: James S. Walker, Gary W. Don

Abstract: After our book---Mathematics and Music: Composition, Perception, and Performance, 2nd edition, CRC Press, 2020---was published, we found a striking example of the importance of the Tonnetz for analyzing the harmonic structure of The Beatles' song, "In My Life." Our Tonnetz analysis will illustrate the highly structured geometric logic underlying the numerous chord progressions in the song. Spectrograms provide a way for us to visualize chordal harmonics and their connection with voice leading. We shall also describe the interesting harmonic rhythms of the song's chord progressions. A lot of this harmonic rhythm lends itself well to a geometric description.

Submitted 26 August, 2020; originally announced August 2020.

Comments: This preprint is a supplement to our book, Mathematics and Music: Composition, Perception, and Performance, 2nd edition, CRC Press, 2020

arXiv:2006.04471 [pdf, ps, other]

A Comparison of Self-Play Algorithms Under a Generalized Framework

Authors: Daniel Hernandez, Kevin Denamganai, Sam Devlin, Spyridon Samothrakis, James Alfred Walker

Abstract: Throughout scientific history, overarching theoretical frameworks have allowed researchers to grow beyond personal intuitions and culturally biased theories. They allow researchers to verify and replicate existing findings, and to link disconnected results. The notion of self-play, albeit often cited in multiagent Reinforcement Learning, has never been grounded in a formal model. We present a formalized framework, with clearly defined assumptions, which encapsulates the meaning of self-play as abstracted from various existing self-play algorithms. This framework is framed as an approximation to a theoretical solution concept for multiagent training. On a simple environment, we qualitatively measure how well a subset of the captured self-play methods approximate this solution when paired with the famous PPO algorithm. We also provide insights on interpreting quantitative metrics of performance for self-play training. Our results indicate that, throughout training, various self-play definitions exhibit cyclic policy evolutions.

Submitted 8 June, 2020; originally announced June 2020.

arXiv:2006.04419 [pdf, other]

Metagame Autobalancing for Competitive Multiplayer Games

Authors: Daniel Hernandez, Charles Takashi Toyin Gbadamosi, James Goodman, James Alfred Walker

Abstract: Automated game balancing has often focused on single-agent scenarios. In this paper we present a tool for balancing multi-player games during game design. Our approach requires a designer to construct an intuitive graphical representation of their meta-game target, representing the relative scores that high-level strategies (or decks, or character types) should experience. This permits more sophisticated balance targets to be defined beyond a simple requirement of equal win chances. We then find a parameterization of the game that meets this target using simulation-based optimization to minimize the distance to the target graph. We show the capabilities of this tool on examples inheriting from Rock-Paper-Scissors, and on a more complex asymmetric fighting game.

Submitted 8 June, 2020; originally announced June 2020.

arXiv:2003.03466 [pdf, other]

Deep learning for prediction of population health costs

Authors: Philipp Drewe-Boss, Dirk Enders, Jochen Walker, Uwe Ohler

Abstract: Accurate prediction of healthcare costs is important for optimally managing health costs. However, methods leveraging the medical richness from data such as health insurance claims or electronic health records are missing. Here, we developed a deep neural network to predict future cost from health insurance claims records. We applied the deep network and a ridge regression model to a sample of 1.4 million German insurants to predict total one-year health care costs. Both methods were compared to Morbi-RSA models with various performance measures and were also used to predict patients with a change in costs and to identify relevant codes for this prediction. We showed that the neural network outperformed the ridge regression as well as all Morbi-RSA models for cost prediction. Further, the neural network was superior to ridge regression in predicting patients with cost change and identified more specific codes. In summary, we showed that our deep neural network can leverage the full complexity of the patient records and outperforms standard approaches. We suggest that the better performance is due to the ability to incorporate complex interactions in the model and that the model might also be used for predicting other health phenotypes.

Submitted 6 March, 2020; originally announced March 2020.

arXiv:2002.12228 [pdf, other]

Exploiting Colorimetry for Fidelity in Data Visualization

Authors: M. J. Waters, J. M. Walker, C. T. Nelson, D. Joester, J. M. Rondinelli

Abstract: Advances in multimodal characterization methods fuel a generation of increasingly immense hyper-dimensional datasets. Color mapping is employed for conveying higher dimensional data in two-dimensional (2D) representations for human consumption without relying on multiple projections. How one constructs these color maps, however, critically affects how accurately one perceives data. For simple scalar fields, perceptually uniform color maps and color selection have been shown to improve data readability and interpretation across research fields. Here we review core concepts underlying the design of perceptually uniform color maps and extend the concepts from scalar fields to two-dimensional vector fields and three-component composition fields frequently found in materials-chemistry research to enable high-fidelity visualization. We develop the software tools PAPUC and CMPUC to enable researchers to utilize these colorimetry principles and employ perceptually uniform color spaces for rigorously meaningful color mapping of higher dimensional data representations. Last, we demonstrate how these approaches deliver immediate improvements in data readability and interpretation in microscopies and spectroscopies routinely used in discerning materials structure, chemistry, and properties.

Submitted 27 February, 2020; originally announced February 2020.

arXiv:1910.13000 [pdf, other]

Human-centered Control of a Growing Soft Robot for Object Manipulation

Authors: Fabio Stroppa, Ming Luo, Giada Gerboni, Margaret M. Coad, Julie M. Walker, Allison M. Okamura

Abstract: We present a user-friendly interface to teleoperate a soft robot manipulator in a complex environment. Key components of the system include a manipulator with a grasping end-effector that grows via tip eversion, gesture-based control, and haptic display to the operator for feedback and guidance. In the initial work, the operator uses the soft robot to build a tower of blocks, and future works will extend this to shared autonomy scenarios in which the human operator and robot intelligence are both necessary for task completion.

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1906.03939 [pdf, other]

Time to Die: Death Prediction in Dota 2 using Deep Learning

Authors: Adam Katona, Ryan Spick, Victoria Hodge, Simon Demediuk, Florian Block, Anders Drachen, James Alfred Walker

Abstract: Esports have become major international sports with hundreds of millions of spectators. Esports games generate massive amounts of telemetry data. Using these to predict the outcome of esports matches has received considerable attention, but micro-predictions, which seek to predict events inside a match, are as yet unknown territory. Micro-predictions are however of perennial interest across esports commentators and audience, because they provide the ability to observe events that might otherwise be missed: esports games are highly complex with fast-moving action where the balance of a game can change in the span of seconds, and where events can happen in multiple areas of the playing field at the same time. Such events can happen rapidly, and it is easy for commentators and viewers alike to miss an event and only observe the following impact of events. In Dota 2, a player hero being killed by the opposing team is a key event of interest to commentators and audience. We present a deep learning network with shared weights which provides accurate death predictions within a five-second window. The network is trained on a vast selection of Dota 2 gameplay features and professional/semi-professional level match dataset. Even though death events are rare within a game (1% of the data), the model achieves 0.377 precision with 0.725 recall on test data when prompted to predict which of any of the 10 players of either team will die within 5 seconds. An example of the system applied to a Dota 2 match is presented. This model enables real-time micro-predictions of kills in Dota 2, one of the most played esports titles in the world, giving commentators and viewers time to move their attention to these key events.

Submitted 21 May, 2019; originally announced June 2019.

arXiv:1905.13694 [pdf, other]

Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams

Authors: Charles Ringer, James Alfred Walker, Mihalis A. Nicolaou

Abstract: Video game streaming provides the viewer with a rich set of audio-visual data, conveying information both with regards to the game itself, through game footage and audio, as well as the streamer's emotional state and behaviour via webcam footage and audio. Analysing player behaviour and discovering correlations with game context is crucial for modelling and understanding important aspects of livestreams, but comes with a significant set of challenges - such as fusing multimodal data captured by different sensors in uncontrolled ('in-the-wild') conditions. Firstly, we present, to our knowledge, the first data set of League of Legends livestreams, annotated for both streamer affect and game context. Secondly, we propose a method that exploits tensor decompositions for high-order fusion of multimodal representations. The proposed method is evaluated on the problem of jointly predicting game context and player affect, compared with a set of baseline fusion approaches such as late and early fusion.

Submitted 31 May, 2019; originally announced May 2019.

Comments: 8 Pages, IEEE Conference on Games 2019

arXiv:1904.00510 [pdf]

doi 10.31256/HSMR2019.9

How to enhance learning of robotic surgery gestures? A tactile cue saliency investigation for 3D hand guidance

Authors: Gustavo D. Gil, Julie M. Walker, Nabil Zemiti, Allison M. Okamura, Philippe Poignet

Abstract: The current generation of surgeons requires extensive training in teleoperation to develop specific dexterous skills, which are independent of medical knowledge. Training curricula progress from manipulation tasks to simulated surgical tasks but are limited in time. To tackle this, we propose to integrate surgical robotic training together with Haptic Feedback (HF) to improve skill acquisition. This paper presents the initial but promising results of our haptic device designed to support the training of surgical gestures. Our ongoing work concerns integrating HF into the RAVEN II platform.

Submitted 19 July, 2019; v1 submitted 31 March, 2019; originally announced April 2019.

Comments: HSMR: 12th Hamlyn Symposium on Medical Robotics (London, 24th-26th June 2019)

arXiv:1903.06889 [pdf, other]

MultiK: A Framework for Orchestrating Multiple Specialized Kernels

Authors: Hsuan-Chi Kuo, Akshith Gunasekaran, Yeongjin Jang, Sibin Mohan, Rakesh B. Bobba, David Lie, Jesse Walker

Abstract: We present MultiK, a Linux-based framework that reduces the attack surface for operating system kernels by reducing code bloat. MultiK "orchestrates" multiple kernels that are specialized for individual applications in a transparent manner. This framework is flexible enough to accommodate different kernel code reduction techniques and, most importantly, runs the specialized kernels with near-zero additional runtime overheads. MultiK avoids the overheads of virtualization and runs natively on the system. For instance, an Apache instance is shown to run on a kernel that has (a) 93.68% of its code reduced, (b) 19 of 23 known kernel vulnerabilities eliminated and (c) negligible performance overheads (0.19%). MultiK is a framework that can integrate with existing code reduction and OS security techniques. We demonstrate this by using D-KUT and S-KUT -- two methods to profile and eliminate unwanted kernel code. The whole process is transparent to the user applications because MultiK does not require a recompilation of the application.

Submitted 16 March, 2019; originally announced March 2019.

arXiv:1903.03150 [pdf, other]

Holdable Haptic Device for 4-DOF Motion Guidance

Authors: Julie M. Walker, Nabil Zemiti, Philippe Poignet, Allison M. Okamura

Abstract: Hand-held haptic devices can allow for greater freedom of motion and larger workspaces than traditional grounded haptic devices. They can also provide more compelling haptic sensations to the users' fingertips than many wearable haptic devices because reaction forces can be distributed over a larger area of skin far away from the stimulation site. This paper presents a hand-held kinesthetic gripper that provides guidance cues in four degrees of freedom (DOF). 2-DOF tangential forces on the thumb and index finger combine to create cues to translate or rotate the hand. We demonstrate the device's capabilities in a three-part user study. First, users moved their hands in response to haptic cues before receiving instruction or training. Then, they trained on cues in eight directions in a forced-choice task. Finally, they repeated the first part, now knowing what each cue intended to convey. Users were able to discriminate each cue over 90% of the time. Users moved correctly in response to the guidance cues both before and after the training and indicated that the cues were easy to follow. The results show promise for holdable kinesthetic devices in haptic feedback and guidance for applications such as virtual reality, medical training, and teleoperation.

Submitted 7 March, 2019; originally announced March 2019.

Comments: Submitted to IEEE World Haptics Conference 2019

arXiv:1901.11129 [pdf, other]

Generic Connectivity-Based CGRA Mapping via Integer Linear Programming

Authors: Matthew J. P. Walker, Jason H. Anderson

Abstract: Coarse-grained reconfigurable architectures (CGRAs) are programmable logic devices with large coarse-grained ALU-like logic blocks, and multi-bit datapath-style routing. CGRAs often have relatively restricted data routing networks, so they attract CAD mapping tools that use exact methods, such as Integer Linear Programming (ILP). However, tools that target general architectures must use large constraint systems to fully describe an architecture's flexibility, resulting in lengthy run-times. In this paper, we propose to derive connectivity information from an otherwise generic device model, and use this to create simpler ILPs, which we combine in an iterative schedule and retain most of the exactness of a fully-generic ILP approach. This new approach has a speed-up geometric mean of 5.88x when considering benchmarks that do not hit a time-limit of 7.5 hours on the fully-generic ILP, and 37.6x otherwise. This was measured using the set of benchmarks used to originally evaluate the fully-generic approach and several more benchmarks representing computation tasks, over three different CGRA architectures. All run-times of the new approach are less than 20 minutes, with 90th percentile time of 410 seconds. The proposed mapping techniques are integrated into, and evaluated using the open-source CGRA-ME architecture modelling and exploration framework.

Submitted 30 April, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

Comments: 8 pages of content; 8 figures; 3 tables; to appear in FCCM 2019; Uses the CGRA-ME framework at http://cgra-me.ece.utoronto.ca/

arXiv:1708.01658 [pdf, ps, other]

Exploring Features for Predicting Policy Citations

Authors: Christian Bailey, Bharat Kale, Jamieson Walker, Harish Varma Siravuri, Hamed Alhoori, Micheal E. Papka

Abstract: In this study we performed an initial investigation and evaluation of altmetrics and their relationship with public policy citation of research papers. We examined methods for using altmetrics and other data to predict whether a research paper is cited in public policy and applied receiver operating characteristic curve on various feature groups in order to evaluate their potential usefulness. From the methods we tested, classifying based on tweet count provided the best results, achieving an area under the ROC curve of 0.91.

Submitted 15 June, 2017; originally announced August 2017.

Comments: 2 pages, accepted to JCDL '17

arXiv:1705.00053 [pdf, other]

The Pose Knows: Video Forecasting by Generating Pose Futures

Authors: Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert

Abstract: Current approaches in video forecasting attempt to generate videos directly in pixel space using Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs). However, since these approaches try to model all the structure and scene dynamics at once, in unconstrained settings they often generate uninterpretable results. Our insight is to model the forecasting problem at a higher level of abstraction. Specifically, we exploit human pose detectors as a free source of supervision and break the video forecasting problem into two discrete steps. First we explicitly model the high level structure of active objects in the scene---humans---and use a VAE to model the possible future movements of humans in the pose space. We then use the future poses generated as conditional information to a GAN to predict the future frames of the video in pixel space. By using the structured space of pose as an intermediate representation, we sidestep the problems that GANs have in generating video pixels directly. We show through quantitative and qualitative evaluation that our method outperforms state-of-the-art methods for video prediction.

Submitted 28 April, 2017; originally announced May 2017.

Comments: Project Website: http://www.cs.cmu.edu/~jcwalker/POS/POS.html

arXiv:1701.05944 [pdf, other]

doi 10.1007/978-3-319-63931-4_1

Representations of the Multicast Network Problem

Authors: Sarah E. Anderson, Wael Halbawi, Nathan Kaplan, Hiram H. López, Felice Manganiello, Emina Soljanin, Judy Walker

Abstract: We approach the problem of linear network coding for multicast networks from different perspectives. We introduce the notion of the coding points of a network, which are edges of the network where messages combine and coding occurs. We give an integer linear program that leads to choices of paths through the network that minimize the number of coding points. We introduce the code graph of a network, a simplified directed graph that maintains the information essential to understanding the coding properties of the network. One of the main problems in network coding is to understand when the capacity of a multicast network is achieved with linear network coding over a finite field of size q. We explain how this problem can be interpreted in terms of rational points on certain algebraic varieties.

Submitted 20 January, 2017; originally announced January 2017.

Comments: 24 pages, 19 figures

Journal ref: 2016 Conference on Algebraic Geometry for Coding Theory and Cryptography, Association for Women in Mathematics Series, 9, 1-23 (2017). Springer, Cham

Showing 1–50 of 61 results for author: Walker, J