-
Double-Anonymous Review for Robotics
Authors:
Justin K. Yim,
Paul Nadan,
James Zhu,
Alexandra Stutt,
J. Joe Payne,
Catherine Pavlov,
Aaron M. Johnson
Abstract:
Prior research has investigated the benefits and costs of double-anonymous review (DAR, also known as double-blind review) in comparison to single-anonymous review (SAR) and open review (OR). Several review papers have attempted to compile experimental results in peer review research both broadly and in engineering and computer science. This document summarizes prior research in peer review that may inform decisions about the format of peer review in the field of robotics and makes some recommendations for potential next steps for robotics publication.
Submitted 14 June, 2024;
originally announced June 2024.
-
fastMRI Breast: A publicly available radial k-space dataset of breast dynamic contrast-enhanced MRI
Authors:
Eddy Solomon,
Patricia M. Johnson,
Zhengguo Tan,
Radhika Tibrewala,
Yvonne W. Lui,
Florian Knoll,
Linda Moy,
Sungheon Gene Kim,
Laura Heacock
Abstract:
This data curation work introduces the first large-scale dataset of radial k-space and DICOM data for breast DCE-MRI acquired in diagnostic breast MRI exams. Our dataset includes case-level labels indicating patient age, menopause status, lesion status (negative, benign, and malignant), and lesion type for each case. The public availability of this dataset and accompanying reconstruction code will support research and development of fast and quantitative breast image reconstruction and machine learning methods.
Submitted 7 June, 2024;
originally announced June 2024.
-
The fast committor machine: Interpretable prediction with kernels
Authors:
D. Aristoff,
M. Johnson,
G. Simpson,
R. J. Webber
Abstract:
In the study of stochastic systems, the committor function describes the probability that a system starting from an initial configuration $x$ will reach a set $B$ before a set $A$. This paper introduces an efficient and interpretable algorithm for approximating the committor, called the "fast committor machine" (FCM). The FCM uses simulated trajectory data to build a kernel-based model of the committor. The kernel function is constructed to emphasize low-dimensional subspaces which optimally describe the $A$ to $B$ transitions. The coefficients in the kernel model are determined using randomized linear algebra, leading to a runtime that scales linearly in the number of data points. In numerical experiments involving a triple-well potential and alanine dipeptide, the FCM yields higher accuracy and trains more quickly than a neural network with the same number of parameters. The FCM is also more interpretable than the neural net.
Submitted 10 June, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
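A minimal sketch of the committor definition above (not the FCM itself), assuming a toy chain: a symmetric nearest-neighbour random walk on the sites 0, ..., N-1 with A = {0} and B = {N-1}. The committor solves q = Pq on the interior states with q = 0 on A and q = 1 on B; the hypothetical helper `committor_random_walk` solves that boundary-value problem by Gauss-Seidel sweeps. For this chain the exact answer is q(x) = x/(N-1), which makes it a convenient sanity check for any data-driven approximation such as a kernel model.

```python
def committor_random_walk(n_sites, sweeps=20000, tol=1e-12):
    """Committor q(x) = P(reach B = {n_sites-1} before A = {0} | start at x)
    for a symmetric nearest-neighbour random walk on {0, ..., n_sites-1}.
    Toy illustration only; not the fast committor machine."""
    q = [0.0] * n_sites
    q[-1] = 1.0  # boundary condition on B; q stays 0 on A
    for _ in range(sweeps):
        max_change = 0.0
        for x in range(1, n_sites - 1):
            # Interior equation q(x) = sum_y P(x, y) q(y) with P = 1/2 to each neighbour
            new = 0.5 * (q[x - 1] + q[x + 1])
            max_change = max(max_change, abs(new - q[x]))
            q[x] = new
        if max_change < tol:
            break
    return q


q = committor_random_walk(11)  # sites 0..10, so the exact committor is x / 10
print([round(v, 6) for v in q])
```

Swapping in a different transition rule for the interior update gives the committor of any small discrete chain, which can be compared against a learned model.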
-
Capabilities of Gemini Models in Medicine
Authors:
Khaled Saab,
Tao Tu,
Wei-Hung Weng,
Ryutaro Tanno,
David Stutz,
Ellery Wulczyn,
Fan Zhang,
Tim Strother,
Chunjong Park,
Elahe Vedadi,
Juanma Zambrano Chaves,
Szu-Yeu Hu,
Mike Schaekermann,
Aishwarya Kamath,
Yong Cheng,
David G. T. Barrett,
Cathy Cheung,
Basil Mustafa,
Anil Palepu,
Daniel McDuff,
Le Hou,
Tomer Golany,
Luyang Liu,
Jean-baptiste Alayrac,
Neil Houlsby,
et al. (42 additional authors not shown)
Abstract:
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love,
et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Computing Balanced Solutions for Large International Kidney Exchange Schemes When Cycle Length Is Unbounded
Authors:
Márton Benedek,
Péter Biró,
Gergely Csáji,
Matthew Johnson,
Daniël Paulusma,
Xin Ye
Abstract:
In kidney exchange programmes (KEP) patients may swap their incompatible donors leading to cycles of kidney transplants. Nowadays, countries try to merge their national patient-donor pools leading to international KEPs (IKEPs). As shown in the literature, long-term stability of an IKEP can be achieved through a credit-based system. In each round, every country is prescribed a "fair" initial allocation of kidney transplants. The initial allocation, which we obtain by using solution concepts from cooperative game theory, is adjusted by incorporating credits from the previous round, yielding the target allocation. The goal is to find, in each round, an optimal solution that closely approximates this target allocation. There is a known polynomial-time algorithm for finding an optimal solution that lexicographically minimizes the country deviations from the target allocation if only $2$-cycles (matchings) are permitted. In practice, kidney swaps along longer cycles may be performed. However, the problem of computing optimal solutions for maximum cycle length $\ell$ is NP-hard for every $\ell\geq 3$. This situation changes back to polynomial time once we allow unbounded cycle length. However, in contrast to the case where $\ell=2$, we show that for $\ell=\infty$, lexicographical minimization is only polynomial-time solvable under additional conditions (assuming P $\neq$ NP). Nevertheless, the fact that the optimal solutions themselves can be computed in polynomial time if $\ell=\infty$ still enables us to perform a large scale experimental study for showing how stability and total social welfare are affected when we set $\ell=\infty$ instead of $\ell=2$.
Submitted 27 December, 2023;
originally announced December 2023.
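A brute-force sketch, assuming a toy compatibility digraph in which an arc (u, v) means the donor of pair u can give to the patient of pair v. With unbounded cycle length (ℓ = ∞), a solution is any permutation of the pairs whose non-trivial cycles use only compatible arcs, which is why this case can be cast as an assignment problem; capping cycles at ℓ = 2 reduces it to a matching. The hypothetical `max_transplants` enumerates permutations, so it only suits tiny instances, and it ignores the paper's country-level target allocations and lexicographic balancing entirely.

```python
from itertools import permutations


def cycles_of(perm):
    """Decompose a permutation (tuple mapping i -> perm[i]) into its cycles."""
    seen, cycles = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = perm[v]
        cycles.append(cycle)
    return cycles


def max_transplants(n, arcs, max_len=None):
    """Maximum number of transplants using exchange cycles of length <= max_len
    (None means unbounded). Fixed points of the permutation are unmatched pairs."""
    arcs = set(arcs)
    best = 0
    for perm in permutations(range(n)):
        total, feasible = 0, True
        for cycle in cycles_of(perm):
            if len(cycle) == 1:
                continue  # pair not involved in any exchange
            if max_len is not None and len(cycle) > max_len:
                feasible = False
                break
            if any((v, perm[v]) not in arcs for v in cycle):
                feasible = False  # some donor is incompatible with the next patient
                break
            total += len(cycle)
        if feasible:
            best = max(best, total)
    return best


# Toy pool: a 3-cycle 0 -> 1 -> 2 -> 0 and a 2-cycle 3 <-> 4
arcs = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 3)]
print(max_transplants(5, arcs))             # 5
print(max_transplants(5, arcs, max_len=2))  # 2
```

The gap between the two calls illustrates why allowing longer cycles can increase total social welfare in a pool.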
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee,
et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Fine-Grained Analysis of Team Collaborative Dialogue
Authors:
Ian Perera,
Matthew Johnson,
Carson Wilber
Abstract:
Natural language analysis of human collaborative chat dialogues is an understudied domain with many unique challenges: a large number of dialogue act labels, underspecified and dynamic tasks, interleaved topics, and long-range contextual dependence. While prior work has studied broad metrics of team dialogue and associated performance using methods such as LSA, there has been little effort in generating fine-grained descriptions of team dynamics and individual performance from dialogue. We describe initial work towards developing an explainable analytics tool in the software development domain using Slack chats mined from our organization, including generation of a novel, hierarchical labeling scheme; design of descriptive metrics based on the frequency of occurrence of dialogue acts; and initial results using a transformer + CRF architecture to incorporate long-range context.
Submitted 9 December, 2023;
originally announced December 2023.
-
Modelling wildland fire burn severity in California using a spatial Super Learner approach
Authors:
Nicholas Simafranca,
Bryant Willoughby,
Erin O'Neil,
Sophie Farr,
Brian J Reich,
Naomi Giertych,
Margaret Johnson,
Madeleine Pascolini-Campbell
Abstract:
Given the increasing prevalence of wildland fires in the Western US, there is a critical need to develop tools to understand and accurately predict burn severity. We develop a machine learning model to predict post-fire burn severity using pre-fire remotely sensed data. Hydrological, ecological, and topographical variables collected from four regions of California - the sites of the Kincade fire (2019), the CZU Lightning Complex fire (2020), the Windy fire (2021), and the KNP Fire (2021) - are used as predictors of the difference normalized burn ratio. We hypothesize that a Super Learner (SL) algorithm that accounts for spatial autocorrelation using Vecchia's Gaussian approximation will accurately model burn severity. In all combinations of test and training sets explored, the results of our model showed the SL algorithm outperformed standard Linear Regression methods. After fitting and verifying the performance of the SL model, we use interpretable machine learning tools to determine the main drivers of severe burn damage, including greenness, elevation and fire weather variables. These findings provide actionable insights that enable communities to strategize interventions, such as early fire detection systems, pre-fire season vegetation clearing activities, and resource allocation during emergency responses. When implemented, this model has the potential to minimize the loss of human life, property, resources, and ecosystems in California.
Submitted 25 November, 2023;
originally announced November 2023.
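A minimal sketch of the Super Learner's stacking step, assuming only two base learners whose out-of-fold (cross-validated) predictions are already computed: the meta-learner picks the convex weight w that minimises cross-validated MSE of the blend w·A + (1 − w)·B, here by a simple grid search. It omits the spatial component of the paper's model (the Vecchia Gaussian approximation) and any particular base learners; `super_learner_weight` is a hypothetical name.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def super_learner_weight(pred_a, pred_b, target, grid_steps=100):
    """Choose w in [0, 1] minimising the MSE of w*A + (1-w)*B, where pred_a and
    pred_b are assumed to be out-of-fold predictions of two base learners."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(grid_steps + 1):
        w = i / grid_steps
        blend = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        loss = mse(blend, target)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


# Learner A overshoots by 1 and learner B undershoots by 1, so an even blend is exact
target = [1.0, 2.0, 3.0, 4.0]
w, loss = super_learner_weight([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.0, 3.0], target)
print(w, loss)
```

With more than two learners, the grid search would typically be replaced by a constrained least-squares fit over the simplex of weights.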
-
Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms
Authors:
Shafagh Keyvanian,
Michelle J. Johnson,
Nadia Figueroa
Abstract:
A realistic human kinematic model that satisfies anatomical constraints is essential for human-robot interaction, biomechanics and robot-assisted rehabilitation. Modeling realistic joint constraints, however, is challenging as human arm motion is constrained by joint limits, inter- and intra-joint dependencies, self-collisions, individual capabilities and muscular or neurological constraints which are difficult to represent. Hence, physicians and researchers have relied on simple box-constraints, ignoring important anatomical factors. In this paper, we propose a data-driven method to learn realistic anatomically constrained upper-limb range of motion (RoM) boundaries from motion capture data. This is achieved by fitting a one-class support vector machine to a dataset of upper-limb joint space exploration motions with an efficient hyper-parameter tuning scheme. Our approach outperforms similar works focused on valid RoM learning. Further, we propose an impairment index (II) metric that offers a quantitative assessment of capability/impairment when comparing healthy and impaired arms. We validate the metric on healthy subjects physically constrained to emulate hemiplegia and different disability levels as stroke patients.
Submitted 17 November, 2023;
originally announced November 2023.
-
PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent
Authors:
Guangliang Liu,
Zhiyu Xue,
Xitong Zhang,
Kristen Marie Johnson,
Rongrong Wang
Abstract:
Fine-tuning pretrained language models (PLMs) for downstream tasks is a large-scale optimization problem, in which the choice of the training algorithm critically determines how well the trained model can generalize to unseen test data, especially in the context of few-shot learning. To achieve good generalization performance and avoid overfitting, techniques such as data augmentation and pruning are often applied. However, adding these regularizations necessitates heavy tuning of the hyperparameters of optimization algorithms, such as the popular Adam optimizer. In this paper, we propose a two-stage fine-tuning method, PAC-tuning, to address this optimization challenge. First, based on PAC-Bayes training, PAC-tuning directly minimizes the PAC-Bayes generalization bound to learn a proper parameter distribution. Second, PAC-tuning modifies the gradient by injecting noise with the variance learned in the first stage into the model parameters during training, resulting in a variant of perturbed gradient descent (PGD). In the past, the few-shot scenario posed difficulties for PAC-Bayes training because the PAC-Bayes bound, when applied to large models with limited training data, might not be stringent. Our experimental results across 5 GLUE benchmark tasks demonstrate that PAC-tuning successfully handles the challenges of fine-tuning tasks and outperforms strong baseline methods by a visible margin, further confirming the potential to apply PAC training for any other settings where the Adam optimizer is currently used for training.
Submitted 26 October, 2023;
originally announced October 2023.
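A minimal sketch of the perturbed-gradient-descent step described in the second stage, assuming a scalar parameter, a toy quadratic loss and a fixed noise standard deviation: the gradient is evaluated at a noise-perturbed copy of the parameter, and the update is applied to the unperturbed parameter. In PAC-tuning the noise variance is learned in the first stage by minimising a PAC-Bayes bound, which this sketch does not attempt; `perturbed_gradient_descent` is a hypothetical helper.

```python
import random


def perturbed_gradient_descent(grad, w0, lr=0.05, noise_std=0.1, steps=2000, seed=0):
    """Gradient descent where each gradient is taken at w + Gaussian noise.
    noise_std is fixed here (an assumption); PAC-tuning would learn it."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        noisy_w = w + rng.gauss(0.0, noise_std)  # perturb the parameter
        w = w - lr * grad(noisy_w)               # update the unperturbed parameter
    return w


# Toy loss L(w) = (w - 3)^2 with gradient 2 (w - 3); the minimiser is w = 3
w_final = perturbed_gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

For this quadratic the iterates settle into a small random fluctuation around the minimiser whose spread grows with both the learning rate and the injected noise.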
-
Pay Attention to How You Drive: Safe and Adaptive Model-Based Reinforcement Learning for Off-Road Driving
Authors:
Sean J. Wang,
Honghao Zhu,
Aaron M. Johnson
Abstract:
Autonomous off-road driving is challenging as risky actions taken by the robot may lead to catastrophic damage. As such, developing controllers in simulation is often desirable as it provides a safer and more economical alternative. However, accurately modeling robot dynamics is difficult due to the complex robot dynamics and terrain interactions in unstructured environments. Domain randomization addresses this problem by randomizing simulation dynamics parameters, however this approach sacrifices performance for robustness leading to policies that are sub-optimal for any target dynamics. We introduce a novel model-based reinforcement learning approach that aims to balance robustness with adaptability. Our approach trains a System Identification Transformer (SIT) and an Adaptive Dynamics Model (ADM) under a variety of simulated dynamics. The SIT uses attention mechanisms to distill state-transition observations from the target system into a context vector, which provides an abstraction for its target dynamics. Conditioned on this, the ADM probabilistically models the system's dynamics. Online, we use a Risk-Aware Model Predictive Path Integral controller (MPPI) to safely control the robot under its current understanding of the dynamics. We demonstrate in simulation as well as in multiple real-world environments that this approach enables safer behaviors upon initialization and becomes less conservative (i.e. faster) as its understanding of the target system dynamics improves with more observations. In particular, our approach results in an approximately 41% improvement in lap-time over the non-adaptive baseline while remaining safe across different environments.
Submitted 12 October, 2023;
originally announced October 2023.
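A minimal sketch of a plain Model Predictive Path Integral (MPPI) update, assuming known, deterministic 1-D dynamics x_{t+1} = x_t + u_t and a quadratic state-plus-control cost. It samples Gaussian perturbations of a nominal control sequence, weights each rollout by exp(-(S - min S)/λ), and moves the nominal sequence toward the weighted average. The paper's controller is risk-aware and rolls out a learned, probabilistic dynamics model conditioned on the SIT context vector; neither is represented here, and all names are hypothetical.

```python
import math
import random


def rollout_cost(x0, controls, control_weight=0.01):
    """Cost of applying a control sequence to the toy system x_{t+1} = x_t + u_t."""
    x, cost = x0, 0.0
    for u in controls:
        x = x + u
        cost += x * x + control_weight * u * u
    return cost


def mppi_step(x0, nominal, num_samples=200, sigma=0.5, lam=1.0, rng=random):
    """One MPPI iteration: sample perturbed sequences, weight them by
    exp(-(cost - min_cost) / lam), and return the updated nominal sequence."""
    horizon = len(nominal)
    perturbations, costs = [], []
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        perturbations.append(eps)
        costs.append(rollout_cost(x0, [u + e for u, e in zip(nominal, eps)]))
    beta = min(costs)  # subtract the best cost for numerical stability
    weights = [math.exp(-(c - beta) / lam) for c in costs]
    total = sum(weights)
    return [u + sum(w * eps[t] for w, eps in zip(weights, perturbations)) / total
            for t, u in enumerate(nominal)]


rng = random.Random(0)
plan = [0.0] * 10  # start from doing nothing for a 10-step horizon
for _ in range(20):
    plan = mppi_step(1.0, plan, rng=rng)
print(rollout_cost(1.0, plan), rollout_cost(1.0, [0.0] * 10))
```

A smaller λ concentrates the weights on the cheapest rollouts; a risk-aware variant would additionally penalise the spread of costs under an uncertain dynamics model.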
-
Feasability of Learning Weighted Automata on a Semiring
Authors:
Laure Daviaud,
Marianne Johnson
Abstract:
Since the seminal work by Angluin, active learning of automata, by membership and equivalence queries, has been extensively studied and several generalisations have been developed to learn various extensions of automata. For weighted automata, restricted cases have been tackled in the literature and in this paper we chart the boundaries of the Angluin approach (using a class of hypothesis automata constructed from membership and equivalence queries) applied to learning weighted automata over a general semiring. We show precisely the theoretical limitations of this approach and classify functions with respect to how guessable they are (corresponding to the existence and abundance of solutions of certain systems of equations). We provide a syntactic description of the boundary condition for a correct hypothesis of the prescribed form to exist. Of course, from an algorithmic standpoint, knowing that (many) solutions exist need not translate into an effective algorithm to find one; we conclude with a discussion of some known conditions (and variants thereof) that suffice to ensure this, illustrating the ideas over several familiar semirings (including the natural numbers) and pose some open questions for future research.
Submitted 15 May, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
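To make "weighted automata over a general semiring" concrete, a minimal sketch (not the Angluin-style learning procedure) of how such an automaton assigns a value to a word: f(w) = αᵀ M(w₁) ⋯ M(wₙ) β, with sums and products taken in the semiring. The same evaluator is instantiated over the natural numbers and over the tropical (min, +) semiring; the `Semiring` class and both example automata are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


NATURALS = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
INF = float("inf")
TROPICAL = Semiring(min, lambda a, b: a + b, INF, 0)  # (min, +) semiring


def semiring_sum(sr, values):
    """Fold the semiring addition over an iterable, starting from its zero."""
    total = sr.zero
    for v in values:
        total = sr.plus(total, v)
    return total


def evaluate(sr, alpha, transitions, beta, word):
    """f(word) = alpha^T M(w_1) ... M(w_n) beta computed in the semiring sr."""
    vec = list(alpha)
    for letter in word:
        m = transitions[letter]
        vec = [semiring_sum(sr, (sr.times(vec[i], m[i][j]) for i in range(len(vec))))
               for j in range(len(m[0]))]
    return semiring_sum(sr, (sr.times(v, b) for v, b in zip(vec, beta)))


# Over the naturals: counts the occurrences of 'a'
count_a = ([1, 0], {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}, [0, 1])
# Over (min, +): min(#a, #b), one state tracking each letter count
min_count = ([0, 0], {"a": [[1, INF], [INF, 0]], "b": [[0, INF], [INF, 1]]}, [0, 0])

print(evaluate(NATURALS, count_a[0], count_a[1], count_a[2], "aba"))        # 2
print(evaluate(TROPICAL, min_count[0], min_count[1], min_count[2], "aab"))  # 1
```

Which functions such matrices can represent, and whether a hypothesis of this form can be recovered from queries, depends on the semiring, which is the boundary the paper studies.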
-
Robotic Defect Inspection with Visual and Tactile Perception for Large-scale Components
Authors:
Arpit Agarwal,
Abhiroop Ajith,
Chengtao Wen,
Veniamin Stryzheus,
Brian Miller,
Matthew Chen,
Micah K. Johnson,
Jose Luis Susa Rincon,
Justinian Rosca,
Wenzhen Yuan
Abstract:
In manufacturing processes, surface inspection is a key requirement for quality assessment and damage localization. Due to this, automated surface anomaly detection has become a promising area of research in various industrial inspection systems. A particular challenge in industries with large-scale components, like aircraft and heavy machinery, is inspecting large parts with very small defect dimensions. Moreover, these parts can be of curved shapes. To address this challenge, we present a 2-stage multi-modal inspection pipeline with visual and tactile sensing. Our approach combines the best of both visual and tactile sensing by identifying and localizing defects using a global view (vision) and using the localized area for tactile scanning for identifying remaining defects. To benchmark our approach, we propose a novel real-world dataset with multiple metallic defect types per image, collected in the production environments on real aerospace manufacturing parts, as well as online robot experiments in two environments. Our approach is able to identify 85% of defects using Stage I and 100% of defects after Stage II. The dataset is publicly available at https://zenodo.org/record/8327713
Submitted 8 September, 2023;
originally announced September 2023.
-
Accurate synthesis of Dysarthric Speech for ASR data augmentation
Authors:
Mohammad Soleymanpour,
Michael T. Johnson,
Rahim Soleymanpour,
Jeffrey Berry
Abstract:
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems can help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation. Differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels are important components for dysarthric speech modeling, synthesis, and augmentation. For dysarthric speech synthesis, a modified neural multi-talker TTS is implemented by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech for varying severity levels. To evaluate the effectiveness for synthesis of training data for ASR, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has significant impact on the dysarthric ASR systems. In addition, we have conducted a subjective evaluation to evaluate the dysarthric-ness and similarity of synthesized speech. Our subjective evaluation shows that the perceived dysarthric-ness of synthesized speech is similar to that of true dysarthric speech, especially for higher levels of dysarthria.
Submitted 16 August, 2023;
originally announced August 2023.
-
The Simplest Walking Robot: A bipedal robot with one actuator and two rigid bodies
Authors:
James Kyle,
Justin K. Yim,
Kendall Hart,
Sarah Bergbreiter,
Aaron M. Johnson
Abstract:
We present the design and experimental results of the first 1-DOF, hip-actuated bipedal robot. While passive dynamic walking is simple by nature, many existing bipeds inspired by this form of walking are complex in control, mechanical design, or both. Our design using only two rigid bodies connected by a single motor aims to enable exploration of walking at smaller sizes where more complex designs cannot be constructed. The walker, "Mugatu", is self-contained and autonomous, open-loop stable over a range of input parameters, able to stop and start from standing, and able to control its heading left and right. We analyze the mechanical design and distill down a set of design rules that enable these behaviors. Experimental evaluations measure speed, energy consumption, and steering.
Submitted 30 October, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Collision Detection for Multi-Robot Motion Planning with Efficient Quad-Tree Update and Skipping
Authors:
Abdel Zaro,
Ardalan Tajbakhsh,
Aaron M. Johnson
Abstract:
This paper presents a novel and efficient collision checking approach called Updating and Collision Check Skipping Quad-tree (USQ) for multi-robot motion planning. USQ extends the standard quad-tree data structure through a time-efficient update mechanism, which significantly reduces the total number of collision checks and the collision checking time. In addition, it handles transitions at the quad-tree quadrant boundaries based on worst-case trajectories of agents. These extensions make quad-trees suitable for efficient collision checking in multi-robot motion planning of large robot teams. We evaluate the efficiency of USQ in comparison with Regenerating Quad-tree (RQ) from scratch at each timestep and naive pairwise collision checking across a variety of randomized environments. The results indicate that USQ significantly reduces the number of collision checks and the collision checking time compared to other baselines for different numbers of robots and map sizes. In a 50-robot experiment, USQ accurately detected all collisions, outperforming RQ, which has longer run-times and/or misses up to 25% of collisions.
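The abstract does not give enough detail to reproduce USQ's update and skipping mechanisms. As a hedged point of reference only, the sketch below shows a plain point quad-tree broad phase that is rebuilt from scratch on every call (in the spirit of the RQ baseline) next to naive pairwise checking, assuming equal-radius disc robots in a square world centred at the origin. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class QuadTree:
    """Point quad-tree over the half-open square [cx - half, cx + half) x [cy - half, cy + half)."""
    cx: float
    cy: float
    half: float
    capacity: int = 4
    points: list = field(default_factory=list)    # (robot_id, x, y) entries held by a leaf
    children: list = field(default_factory=list)  # empty for a leaf, else four sub-quadrants

    def contains(self, x, y):
        return (self.cx - self.half <= x < self.cx + self.half
                and self.cy - self.half <= y < self.cy + self.half)

    def insert(self, rid, x, y):
        if not self.contains(x, y):
            return False
        if not self.children:
            # Stop splitting once a leaf is tiny, so coincident points cannot recurse forever.
            if len(self.points) < self.capacity or self.half < 1e-6:
                self.points.append((rid, x, y))
                return True
            self._subdivide()
        return any(child.insert(rid, x, y) for child in self.children)

    def _subdivide(self):
        h = self.half / 2
        self.children = [QuadTree(self.cx + dx * h, self.cy + dy * h, h, self.capacity)
                         for dx in (-1, 1) for dy in (-1, 1)]
        old, self.points = self.points, []
        for rid, x, y in old:
            # Half-open bounds place each point in exactly one child.
            any(child.insert(rid, x, y) for child in self.children)

    def query(self, x0, y0, x1, y1, out):
        """Append every stored point inside the closed box [x0, x1] x [y0, y1] to `out`."""
        if (x1 < self.cx - self.half or x0 >= self.cx + self.half
                or y1 < self.cy - self.half or y0 >= self.cy + self.half):
            return out  # this quadrant cannot contain any point of the box
        out.extend(p for p in self.points if x0 <= p[1] <= x1 and y0 <= p[2] <= y1)
        for child in self.children:
            child.query(x0, y0, x1, y1, out)
        return out


def colliding_pairs(positions, radius, world_half):
    """Pairs (i, j), i < j, of equal-radius disc robots whose centres are closer than 2 * radius.

    Assumes every position lies inside the world square centred at the origin.
    """
    tree = QuadTree(0.0, 0.0, world_half)  # rebuilt from scratch on every call
    for rid, (x, y) in enumerate(positions):
        tree.insert(rid, x, y)
    d = 2 * radius
    pairs = set()
    for i, (x, y) in enumerate(positions):
        for j, xj, yj in tree.query(x - d, y - d, x + d, y + d, []):
            if i < j and hypot(x - xj, y - yj) < d:
                pairs.add((i, j))
    return sorted(pairs)


def colliding_pairs_naive(positions, radius):
    """Baseline: check all n * (n - 1) / 2 pairs directly."""
    d = 2 * radius
    n = len(positions)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if hypot(positions[i][0] - positions[j][0], positions[i][1] - positions[j][1]) < d]
```

Both functions return the same pairs; the quad-tree only reduces how many candidate pairs reach the exact distance test. Per the abstract, USQ's contribution is avoiding the full rebuild and handling robots that cross quadrant boundaries, which this sketch does not attempt.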
Submitted 14 July, 2023;
originally announced July 2023.
-
Saltation Matrices: The Essential Tool for Linearizing Hybrid Dynamical Systems
Authors:
Nathan J. Kong,
J. Joe Payne,
James Zhu,
Aaron M. Johnson
Abstract:
Hybrid dynamical systems, i.e. systems that have both continuous and discrete states, are ubiquitous in engineering, but are difficult to work with due to their discontinuous transitions. For example, a robot leg is able to exert very little control effort while it is in the air compared to when it is on the ground. When the leg hits the ground, the penetrating velocity instantaneously collapses to zero. These instantaneous changes in dynamics and discontinuities (or jumps) in state make standard smooth tools for planning, estimation, control, and learning difficult for hybrid systems. One of the key tools for accounting for these jumps is called the saltation matrix. The saltation matrix is the sensitivity update when a hybrid jump occurs and has been used in a variety of fields including robotics, power circuits, and computational neuroscience. This paper presents an intuitive derivation of the saltation matrix and discusses what it captures, where it has been used in the past, how it is used for linear and quadratic forms, how it is computed for rigid body systems with unilateral constraints, and some of the structural properties of the saltation matrix in these cases.
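For a single guard $g(t, x) = 0$ crossed transversally with a differentiable reset map $R$, the saltation matrix is commonly written as $\Xi = D_xR + \frac{(F^+ - D_xR\,F^- - D_tR)\,D_xg}{D_tg + D_xg\,F^-}$, where $F^-$ and $F^+$ are the vector fields just before and just after the jump. The sketch below evaluates this expression with NumPy and applies it to a bouncing ball with coefficient of restitution $e$; the function names and the example system are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def saltation_matrix(DxR, DtR, Dxg, Dtg, F_minus, F_plus):
    """Saltation matrix for one transversal crossing of a scalar guard g(t, x) = 0.

    Xi = DxR + (F_plus - DxR @ F_minus - DtR) Dxg^T / (Dtg + Dxg . F_minus)

    DxR: (n_plus, n) Jacobian of the reset map R(t, x) with respect to x
    DtR: (n_plus,)   partial derivative of R with respect to t
    Dxg: (n,)        gradient of the guard with respect to x
    Dtg: float       partial derivative of the guard with respect to t
    F_minus, F_plus: vector fields evaluated just before and just after the reset
    """
    DxR, DtR, Dxg = (np.asarray(a, dtype=float) for a in (DxR, DtR, Dxg))
    F_minus, F_plus = np.asarray(F_minus, dtype=float), np.asarray(F_plus, dtype=float)
    denom = Dtg + Dxg @ F_minus  # rate at which the trajectory crosses the guard
    if abs(denom) < 1e-12:
        raise ValueError("grazing contact: the saltation matrix is undefined")
    return DxR + np.outer(F_plus - DxR @ F_minus - DtR, Dxg) / denom


def bouncing_ball_saltation(v_minus, e=0.8, grav=9.81):
    """Ball with state (height z, velocity v), guard g = z, reset (z, v) -> (z, -e v)."""
    DxR = np.array([[1.0, 0.0], [0.0, -e]])
    F_minus = np.array([v_minus, -grav])      # falling flow at touchdown (v_minus < 0)
    F_plus = np.array([-e * v_minus, -grav])  # same flow evaluated at the post-impact state
    return saltation_matrix(DxR, np.zeros(2), np.array([1.0, 0.0]), 0.0, F_minus, F_plus)
```

For the ball this reproduces the closed form [[-e, 0], [-(1 + e) grav / v_minus, -e]]; with v_minus = -3.0 and e = 0.8 that is approximately [[-0.8, 0], [5.886, -0.8]]. The nonzero lower-left entry is the point of the construction: a height perturbation at impact shifts the impact time and therefore the post-impact velocity, which the reset Jacobian diag(1, -e) alone would miss.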
Submitted 20 June, 2024; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Sources of Hallucination by Large Language Models on Inference Tasks
Authors:
Nick McKenna,
Tianyi Li,
Liang Cheng,
Mohammad Javad Hosseini,
Mark Johnson,
Mark Steedman
Abstract:
Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization. We present a series of behavioral studies on several LLM families (LLaMA, GPT-3.5, and PaLM) which probe their behavior using controlled experiments. We establish two biases originating from pretraining which predict much of their behavior, and show that these are major sources of hallucination in generative LLMs. First, memorization at the level of sentences: we show that, regardless of the premise, models falsely label NLI test samples as entailing when the hypothesis is attested in training data, and that entities are used as "indices" to access the memorized data. Second, statistical patterns of usage learned at the level of corpora: we further show a similar effect when the premise predicate is less frequent than that of the hypothesis in the training data, a bias following from previous studies. We demonstrate that LLMs perform significantly worse on NLI test samples which do not conform to these biases than those which do, and we offer these as valuable controls for future LLM evaluation.
Submitted 22 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
Authors:
Sebastian Ruder,
Jonathan H. Clark,
Alexander Gutkin,
Mihir Kale,
Min Ma,
Massimo Nicosia,
Shruti Rijhwani,
Parker Riley,
Jean-Michel A. Sarr,
Xinyi Wang,
John Wieting,
Nitish Gupta,
Anna Katanova,
Christo Kirov,
Dana L. Dickinson,
Brian Roark,
Bidisha Samanta,
Connie Tao,
David I. Adelani,
Vera Axelrod,
Isaac Caswell,
Colin Cherry,
Dan Garrette,
Reeve Ingle,
Melvin Johnson
, et al. (2 additional authors not shown)
Abstract:
Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides methodology for evaluating many modeling scenarios including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.
Submitted 24 May, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
PaLM 2 Technical Report
Authors:
Rohan Anil,
Andrew M. Dai,
Orhan Firat,
Melvin Johnson,
Dmitry Lepikhin,
Alexandre Passos,
Siamak Shakeri,
Emanuel Taropa,
Paige Bailey,
Zhifeng Chen,
Eric Chu,
Jonathan H. Clark,
Laurent El Shafey,
Yanping Huang,
Kathy Meier-Hellstern,
Gaurav Mishra,
Erica Moreira,
Mark Omernick,
Kevin Robinson,
Sebastian Ruder,
Yi Tay,
Kefan Xiao,
Yuanzhong Xu,
Yujing Zhang,
Gustavo Hernandez Abrego
, et al. (103 additional authors not shown)
Abstract:
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
Submitted 13 September, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Complexity Framework for Forbidden Subgraphs IV: The Steiner Forest Problem
Authors:
Hans L. Bodlaender,
Matthew Johnson,
Barnaby Martin,
Jelle J. Oostveen,
Sukanya Pandey,
Daniel Paulusma,
Siani Smith,
Erik Jan van Leeuwen
Abstract:
We study Steiner Forest on $H$-subgraph-free graphs, that is, graphs that do not contain some fixed graph $H$ as a (not necessarily induced) subgraph. We are motivated by a recent framework that completely characterizes the complexity of many problems on $H$-subgraph-free graphs. However, in contrast to, e.g., the related Steiner Tree problem, Steiner Forest falls outside this framework. Hence, the complexity of Steiner Forest on $H$-subgraph-free graphs remained tantalizingly open. In this paper, we make significant progress towards determining the complexity of Steiner Forest on $H$-subgraph-free graphs. Our main results are four novel polynomial-time algorithms for different excluded graphs $H$ that are central to further understanding its complexity. Along the way, we study the complexity of Steiner Forest for graphs with a small $c$-deletion set, that is, a small set $S$ of vertices such that each component of $G-S$ has size at most $c$. Using this parameter, we give two noteworthy algorithms that we later employ as subroutines. First, we prove Steiner Forest is FPT parameterized by $|S|$ when $c=1$ (i.e. the vertex cover number). Second, we prove Steiner Forest is polynomial-time solvable for graphs with a 2-deletion set of size at most 2. The latter result is tight, as the problem is NP-complete for graphs with a 3-deletion set of size 2.
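The $c$-deletion set defined above is straightforward to check directly. A minimal sketch, assuming an undirected graph given as an adjacency dictionary (the function name is hypothetical): it runs a breadth-first search over $G-S$ and rejects as soon as a component exceeds $c$ vertices. For $c=1$ it reduces to testing whether $S$ is a vertex cover. This only verifies the property; it is not one of the paper's algorithms.

```python
from collections import deque


def is_c_deletion_set(adj, S, c):
    """True if every connected component of G - S has at most c vertices.

    adj: dict mapping each vertex to an iterable of its neighbours (undirected graph)
    S:   collection of deleted vertices
    """
    seen = set(S)  # deleted vertices are never explored
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over the component of `start` in G - S.
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            if size > c:
                return False  # this component is already too large
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return True
```

On the path a-b-c-d-e, deleting {c} leaves two components of size 2, so {c} is a 2-deletion set but not a 1-deletion set, while {b, d} is a 1-deletion set (a vertex cover).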
Submitted 15 October, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Complexity Framework for Forbidden Subgraphs III: When Problems are Tractable on Subcubic Graphs
Authors:
Matthew Johnson,
Barnaby Martin,
Sukanya Pandey,
Daniël Paulusma,
Siani Smith,
Erik Jan van Leeuwen
Abstract:
For any finite set $\mathcal{H} = \{H_1,\ldots,H_p\}$ of graphs, a graph is $\mathcal{H}$-subgraph-free if it does not contain any of $H_1,\ldots,H_p$ as a subgraph. In recent work, meta-classifications have been studied: these show that if graph problems satisfy certain prescribed conditions, their complexity is determined on classes of $\mathcal{H}$-subgraph-free graphs. We continue this work and focus on problems that have polynomial-time solutions on classes that have bounded treewidth or maximum degree at most $3$ and examine their complexity on $H$-subgraph-free graph classes where $H$ is a connected graph. With this approach, we obtain comprehensive classifications for (Independent) Feedback Vertex Set, Connected Vertex Cover, Colouring and Matching Cut. This resolves a number of open problems.
We highlight that, to establish that Independent Feedback Vertex Set belongs to this collection of problems, we first show that it can be solved in polynomial time on graphs of maximum degree $3$. We demonstrate that, with the exception of the complete graph on four vertices, each graph in this class has a minimum size feedback vertex set that is also an independent set.
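The property in the last sentence, a feedback vertex set that is also an independent set, can be verified with a union-find pass over the edges of $G-S$. A minimal sketch with a hypothetical function name, assuming the graph is given as a vertex list and an edge list; it checks a candidate set and does not search for one. The complete graph on four vertices shows why it is the exception: any independent set in it has at most one vertex, and deleting one vertex leaves a triangle.

```python
def is_independent_fvs(vertices, edges, S):
    """True if S is an independent set of G and G - S is a forest (contains no cycle)."""
    S = set(S)
    # Independence: no edge may have both endpoints in S.
    if any(u in S and v in S for u, v in edges):
        return False
    # Acyclicity of G - S via union-find: an edge joining two vertices that are
    # already in the same tree would close a cycle.
    parent = {v: v for v in vertices if v not in S}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if u in S or v in S:
            continue  # edge disappears when S is deleted
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True
```

For instance, on the triangular prism (two triangles 0-1-2 and 3-4-5 joined by the edges 0-3, 1-4, 2-5, a subcubic graph), the set {0, 4} is independent and its deletion leaves a tree, so it passes the check.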
Submitted 1 May, 2023;
originally announced May 2023.
-
FastMRI Prostate: A Publicly Available, Biparametric MRI Dataset to Advance Machine Learning for Prostate Cancer Imaging
Authors:
Radhika Tibrewala,
Tarun Dutt,
Angela Tong,
Luke Ginocchio,
Mahesh B Keerthivasan,
Steven H Baete,
Sumit Chopra,
Yvonne W Lui,
Daniel K Sodickson,
Hersh Chandarana,
Patricia M Johnson
Abstract:
The fastMRI brain and knee dataset has enabled significant advances in exploring reconstruction methods for improving speed and image quality for Magnetic Resonance Imaging (MRI) via novel, clinically relevant reconstruction approaches. In this study, we describe the April 2023 expansion of the fastMRI dataset to include biparametric prostate MRI data acquired on a clinical population. The dataset consists of raw k-space and reconstructed images for T2-weighted and diffusion-weighted sequences along with slice-level labels that indicate the presence and grade of prostate cancer. As has been the case with fastMRI, increasing accessibility to raw prostate MRI data will further facilitate research in MR image reconstruction and evaluation with the larger goal of improving the utility of MRI for prostate cancer detection and evaluation. The dataset is available at https://fastmri.med.nyu.edu.
Submitted 18 April, 2023;
originally announced April 2023.
-
Staged Contact Optimization: Combining Contact-Implicit and Multi-Phase Hybrid Trajectory Optimization
Authors:
Michael R. Turski,
Joseph Norby,
Aaron M. Johnson
Abstract:
Trajectory optimization problems for legged robots are commonly formulated with fixed contact schedules. These multi-phase Hybrid Trajectory Optimization (HTO) methods result in locally optimal trajectories, but the result depends heavily upon the predefined contact mode sequence. Contact-Implicit Optimization (CIO) offers a potential solution to this issue by allowing the contact mode to be determined throughout the trajectory by the optimization solver. However, CIO suffers from long solve times and convergence issues. This work combines the benefits of these two methods into one algorithm: Staged Contact Optimization (SCO). SCO tightens constraints on contact in stages, eventually fixing them to allow robust and fast convergence to a feasible solution. Results on a planar biped and spatial quadruped demonstrate speed and optimality improvements over CIO and HTO. These properties make SCO well suited for offline trajectory generation or as an effective tool for exploring the dynamic capabilities of a robot.
Submitted 17 September, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Proprioception and reaction for walking among entanglements
Authors:
Justin K. Yim,
Jiming Ren,
David Ologan,
Selvin Garcia Gonzalez,
Aaron M. Johnson
Abstract:
Entanglements like vines and branches in natural settings or cords and pipes in human spaces prevent mobile robots from accessing many environments. Legged robots should be effective in these settings, and more so than wheeled or tracked platforms, but naive controllers quickly become entangled and stuck. In this paper we present a method for proprioception aimed specifically at the task of sensing entanglements of a robot's legs as well as a reaction strategy to disentangle legs during their swing phase as they advance to their next foothold. We demonstrate that our proprioception and reaction strategy enables traversal of entanglements of many stiffnesses and geometries, succeeding in 14 out of 16 trials in laboratory tests, as well as in a natural outdoor environment.
Submitted 9 September, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Grounding Robot Navigation in Self-Defense Law
Authors:
James Zhu,
Anoushka Shrivastava,
Aaron M. Johnson
Abstract:
Robots operating in close proximity to humans rely heavily on human trust to successfully complete their tasks. But what are the real outcomes when this trust is violated? Self-defense law provides a framework for analyzing tangible failure scenarios that can inform the design of robots and their algorithms. Studying self-defense is particularly important for ground robots since they operate within public environments, where they can pose a legitimate threat to the safety of nearby humans. Moreover, even if ground robots can guarantee human safety, the perception of a physical threat is sufficient to justify human self-defense against robots. In this paper, we synthesize works in law, engineering, and social science to present four actionable recommendations for how the robotics community can craft robots to mitigate the likelihood of self-defense situations arising. We establish how current U.S. self-defense law can justify a human protecting themselves against a robot, discuss the current literature on human attitudes toward robots, and analyze methods that have been produced to allow robots to operate close to humans. Finally, we present hypothetical scenarios that underscore how current robot navigation methods can fail to sufficiently consider self-defense concerns and the need for the recommendations to guide improvements in the field.
Submitted 26 June, 2023; v1 submitted 1 April, 2023;
originally announced April 2023.
-
Convergent iLQR for Safe Trajectory Planning and Control of Legged Robots
Authors:
James Zhu,
J. Joe Payne,
Aaron M. Johnson
Abstract:
In order to perform highly dynamic and agile maneuvers, legged robots typically spend time in underactuated domains (e.g. with feet off the ground) where the system has limited command of its acceleration and a constrained amount of time before transitioning to a new domain (e.g. foot touchdown). Meanwhile, these transitions can instantaneously change the system's state, possibly causing perturbations to be mapped arbitrarily far away from the target trajectory. These properties make it difficult for local feedback controllers to effectively recover from disturbances as the system evolves through underactuated domains and hybrid impact events. To address this, we utilize the fundamental solution matrix that characterizes the evolution of perturbations through a hybrid trajectory and its 2-norm, which represents the worst-case growth of perturbations. In this paper, the worst-case perturbation analysis is used to explicitly reason about the tracking performance of a hybrid trajectory and is incorporated in an iLQR framework to optimize a trajectory while taking into account the closed-loop convergence of the trajectory under an LQR tracking controller. The generated convergent trajectories recover more effectively from perturbations, are more robust to large disturbances, and use less feedback control effort than trajectories generated with traditional methods.
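The worst-case growth quantity described above can be computed directly once the perturbation maps along a trajectory are known. A minimal sketch, assuming the smooth-domain maps and saltation matrices are already available as NumPy arrays (the function name and toy matrices are hypothetical): it composes them into the fundamental solution matrix and returns its 2-norm, i.e. its largest singular value. It does not implement the paper's iLQR formulation.

```python
import numpy as np


def worst_case_growth(segments):
    """Return (sigma_max, Phi) for the composed perturbation map of a hybrid trajectory.

    `segments` lists linear maps in time order, e.g. [Phi_1, Xi_1, Phi_2], where each
    Phi_k propagates perturbations through a smooth domain and each Xi_k is the
    saltation matrix applied at a hybrid transition.
    """
    Phi = np.eye(segments[0].shape[1])
    for M in segments:
        Phi = M @ Phi  # later maps multiply on the left
    # The induced 2-norm of Phi is its largest singular value: the maximum factor by
    # which any unit-norm initial perturbation is stretched by the end of the trajectory.
    return np.linalg.norm(Phi, 2), Phi


Phi1 = np.diag([2.0, 0.5])               # first domain: stretch x, contract y
Xi = np.array([[0.0, 1.0], [1.0, 0.0]])  # toy jump that swaps the two directions
Phi2 = np.diag([3.0, 1.0])               # second domain: stretch x only
growth, Phi = worst_case_growth([Phi1, Xi, Phi2])
```

In this toy example the individual factors have norms 2, 1 and 3, so multiplying norms would suggest growth of up to 6, while the composed map only stretches perturbations by a factor of 2 because the jump moves the stretched direction onto the one the second domain leaves unchanged. Composing the matrices before taking the norm gives the exact worst case rather than a loose bound.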
Submitted 4 March, 2024; v1 submitted 1 April, 2023;
originally announced April 2023.
-
Proprioception and Tail Control Enable Extreme Terrain Traversal by Quadruped Robots
Authors:
Yanhao Yang,
Joseph Norby,
Justin K. Yim,
Aaron M. Johnson
Abstract:
Legged robots leverage ground contacts and the reaction forces they provide to achieve agile locomotion. However, uncertainty coupled with contact discontinuities can lead to failure, especially in real-world environments with unexpected height variations such as rocky hills or curbs. To enable dynamic traversal of extreme terrain, this work introduces 1) a proprioception-based gait planner for estimating unknown hybrid events due to elevation changes and responding by modifying contact schedules and planned footholds online, and 2) a two-degree-of-freedom tail for improving contact-independent control and a corresponding decoupled control scheme for better versatility and efficiency. Simulation results show that the gait planner significantly improves stability under unforeseen terrain height changes compared to methods that assume fixed contact schedules and footholds. Further, tests have shown that the tail is particularly effective at maintaining stability when encountering a terrain change with an initial angular disturbance. The results show that these approaches work synergistically to stabilize locomotion with elevation changes up to 1.5 times the leg length and tilted initial states.
Submitted 8 September, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Conflict-Based Model Predictive Control for Scalable Multi-Robot Motion Planning
Authors:
Ardalan Tajbakhsh,
Lorenz T. Biegler,
Aaron M. Johnson
Abstract:
This paper presents a scalable multi-robot motion planning algorithm called Conflict-Based Model Predictive Control (CB-MPC). Inspired by Conflict-Based Search (CBS), the planner leverages a similar high-level conflict tree to efficiently resolve robot-robot conflicts in the continuous space, while reasoning about each agent's kinematic and dynamic constraints and actuation limits using MPC as the low-level planner. We show that tracking high-level multi-robot plans with a vanilla MPC controller is insufficient, and results in unexpected collisions in tight navigation scenarios. Compared to other variations of multi-robot MPC like joint, prioritized, and distributed, we demonstrate that CB-MPC improves the executability and success rate, allows for closer robot-robot interactions, and reduces the computational cost significantly without compromising the solution quality across a variety of environments. Furthermore, we show that CB-MPC combined with a high-level path planner can effectively substitute computationally expensive full-horizon multi-robot kinodynamic planners.
Submitted 1 April, 2024; v1 submitted 2 March, 2023;
originally announced March 2023.
-
Parma: Confidential Containers via Attested Execution Policies
Authors:
Matthew A. Johnson,
Stavros Volos,
Ken Gordon,
Sean T. Allen,
Christoph M. Wintersteiger,
Sylvan Clebsch,
John Starks,
Manuel Costa
Abstract:
Container-based technologies empower cloud tenants to develop highly portable software and deploy services in the cloud at a rapid pace. Cloud privacy, meanwhile, is important as a large number of container deployments operate on privacy-sensitive data, but challenging due to the increasing frequency and sophistication of attacks. State-of-the-art confidential container-based designs leverage process-based trusted execution environments (TEEs), but face security and compatibility issues that limit their practical deployment. We propose Parma, an architecture that provides lift-and-shift deployment of unmodified containers while providing strong security protection against a powerful attacker who controls the untrusted host and hypervisor. Parma leverages VM-level isolation to execute a container group within a unique VM-based TEE. Besides container integrity and user data confidentiality and integrity, Parma also offers container attestation and execution integrity based on an attested execution policy. Parma execution policies provide an inductive proof over all future states of the container group. This proof, which is established during initialization, forms a root of trust that can be used for secure operations within the container group without requiring any modifications of the containerized workflow itself (aside from the inclusion of the execution policy). We evaluate Parma on AMD SEV-SNP processors by running a diverse set of workloads demonstrating that workflows exhibit 0-26% additional overhead in performance over running outside the enclave, with a mean 13% overhead on SPEC2017, while requiring no modifications to their program code. Adding execution policies introduces less than 1% additional overhead. Furthermore, we have deployed Parma as the underlying technology driving Confidential Containers on Azure Container Instances.
Submitted 7 March, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
The unreasonable effectiveness of few-shot learning for machine translation
Authors:
Xavier Garcia,
Yamini Bansal,
Colin Cherry,
George Foster,
Maxim Krikun,
Fangxiaoyu Feng,
Melvin Johnson,
Orhan Firat
Abstract:
We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best performing system on the WMT'21 English - Chinese news translation task by only using five examples of English - Chinese parallel data at inference. Moreover, our approach in building these models does not necessitate joint multilingual training or back-translation, is conceptually simple and shows the potential to extend to the multilingual setting. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors which impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation -- we show that we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.
Submitted 2 February, 2023;
originally announced February 2023.
-
A deep learning approach to using wearable seismocardiography (SCG) for diagnosing aortic valve stenosis and predicting aortic hemodynamics obtained by 4D flow MRI
Authors:
Mahmoud E. Khani,
Ethan M. I. Johnson,
Aparna Sodhi,
Joshua Robinson,
Cynthia K. Rigsby,
Bradly D. Allen,
Michael Markl
Abstract:
In this paper, we explored the use of deep learning to predict aortic flow metrics, as obtained by 4D flow MRI, from wearable seismocardiography (SCG) devices. 4D flow MRI provides a comprehensive assessment of cardiovascular hemodynamics, but it is costly and time-consuming. We hypothesized that deep learning could be used to identify pathological changes in blood flow from SCG signals, such as elevated peak systolic velocity (Vmax) in patients with heart valve diseases. We also investigated the ability of this deep learning technique to differentiate between patients diagnosed with aortic valve stenosis (AS), non-AS patients with a bicuspid aortic valve (BAV), non-AS patients with a mechanical aortic valve (MAV), and healthy subjects with a normal tricuspid aortic valve (TAV). In a study of 77 subjects who underwent same-day 4D flow MRI and SCG, we found that the Vmax values obtained using deep learning and SCG were in good agreement with those obtained by 4D flow MRI. Additionally, subjects with TAV, BAV, MAV, and AS could be classified with ROC-AUC values of 92%, 95%, 81%, and 83%, respectively. This suggests that SCG obtained using low-cost wearable electronics may be used as a supplement to 4D flow MRI exams or as a screening tool for aortic valve disease.
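The per-class ROC-AUC values above can be read as one-vs-rest scores: for each valve class, how often a subject of that class receives a higher classifier score than a subject of any other class. The abstract does not state how the AUCs were computed, so the one-vs-rest framing and the helper names here are assumptions; the sketch uses the rank-based (Mann-Whitney) definition of AUC, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """AUC = probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(class_scores, true_classes, cls):
    """AUC for one class (e.g. "AS") against all other classes pooled together."""
    labels = [1 if c == cls else 0 for c in true_classes]
    scores = [s[cls] for s in class_scores]
    return roc_auc(scores, labels)


# Toy, made-up classifier outputs for three subjects.
outputs = [{"AS": 0.7, "TAV": 0.3}, {"AS": 0.2, "TAV": 0.8}, {"AS": 0.6, "TAV": 0.4}]
truth = ["AS", "TAV", "TAV"]
print(one_vs_rest_auc(outputs, truth, "AS"))
```

The pairwise loop is quadratic in the number of subjects, which is fine for a cohort of this size; a sort-based rank computation gives the same value more efficiently.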
Submitted 5 January, 2023;
originally announced January 2023.
-
Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models
Authors:
Yong Cheng,
Yu Zhang,
Melvin Johnson,
Wolfgang Macherey,
Ankur Bapna
Abstract:
We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling (MLM) objective on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^{2}$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On Voxpopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker sequence-to-sequence architecture. On text understanding tasks, our model improves by more than 6% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TydiQA, paving the way towards a single model for all speech and text understanding tasks.
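The "masked denoising objective similar to T5" replaces spans of the input with sentinel tokens and asks the decoder to reproduce the dropped spans. A minimal sketch of that span-corruption step follows, assuming caller-chosen non-overlapping spans and T5-style `<extra_id_i>` sentinel names; since Mu$^{2}$SLAM targets quantized speech, the same routine could in principle be applied to discrete speech-token ids, but the span sampling and token inventory actually used are not given in the abstract.

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption.

    Each (start, end) span (end exclusive, non-overlapping) is replaced by one sentinel
    in the encoder input; the decoder target lists each sentinel followed by the tokens
    it hides, and ends with one extra closing sentinel.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the unmasked prefix
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])  # the hidden span becomes the target
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")
    return inputs, targets


words = "Thank you for inviting me to your party last week".split()
enc_in, dec_out = span_corrupt(words, [(2, 4), (8, 9)])
print(" ".join(enc_in))   # Thank you <extra_id_0> me to your party <extra_id_1> week
print(" ".join(dec_out))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```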
Submitted 26 June, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Reacting to Contact: Transparency and Collision Reflex in Actuation
Authors:
Ankit Bhatia,
Matthew T. Mason,
Aaron M. Johnson
Abstract:
In unstructured environments, robots run the risk of unexpected collisions. How well they react to these events is determined by how transparent they are to collisions. Transparency is affected by structural properties as well as sensing and control architectures. In this paper, we propose the collision reflex metric as a way to formally quantify transparency. It is defined as the total impulse transferred in collision, which determines the collision mitigation capabilities of a closed-loop robotic system taking into account structure, sensing, and control. We analyze the effect of motor scaling, stiffness, and configuration on the collision reflex of a system using an analytical model. Physical experiments using the move-until-touch behavior are conducted to compare the collision reflex of direct-drive and quasi-direct-drive actuators and robotic hands (Schunk WSG-50 and Dexterous DDHand). For transparent systems, we see a counter-intuitive trend: the impulse may be lower at higher pre-impact velocities.
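Since the collision reflex is defined as the total impulse transferred in the collision, $J = \int F(t)\,dt$, it can be estimated from a sampled contact-force trace by numerical integration. The sketch below assumes force samples from some contact sensor (for example, a load cell at the impact point) and uses the trapezoidal rule; the sensing setup and function name are illustrative assumptions, not the paper's measurement procedure.

```python
def collision_impulse(times, forces):
    """Total impulse J = integral of contact force over the collision (trapezoidal rule).

    times  -- sample timestamps in seconds, increasing
    forces -- contact force in newtons at each timestamp
    Returns the impulse in newton-seconds.
    """
    if len(times) != len(forces) or len(times) < 2:
        raise ValueError("need matching time and force samples (at least two)")
    impulse = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        impulse += 0.5 * (forces[k] + forces[k - 1]) * dt  # area of one trapezoid
    return impulse


# A made-up 3 ms trapezoidal force pulse peaking at 10 N.
print(collision_impulse([0.0, 0.001, 0.002, 0.003], [0.0, 10.0, 10.0, 0.0]))
```

A lower value of this integral for the same pre-impact conditions indicates a more transparent system, which is the comparison the experiments above make across actuators and hands.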
Submitted 7 December, 2022;
originally announced December 2022.
-
Complexity Framework For Forbidden Subgraphs I: The Framework
Authors:
Matthew Johnson,
Barnaby Martin,
Jelle J. Oostveen,
Sukanya Pandey,
Daniël Paulusma,
Siani Smith,
Erik Jan van Leeuwen
Abstract:
For any particular class of graphs, algorithms for computational problems restricted to the class often rely on structural properties that depend on the specific problem at hand. This raises the question of whether a large set of such results can be explained by some common problem conditions. We propose such conditions for $\mathcal{H}$-subgraph-free graphs. For a set of graphs $\mathcal{H}$, a graph $G$ is $\mathcal{H}$-subgraph-free if $G$ does not contain any graph from $\mathcal{H}$ as a subgraph. Our conditions are easy to state. A graph problem must be efficiently solvable on graphs of bounded treewidth, computationally hard on subcubic graphs, and computational hardness must be preserved under edge subdivision of subcubic graphs. Our meta-classification says that if a graph problem satisfies all three conditions, then for every finite set $\mathcal{H}$, it is "efficiently solvable" on $\mathcal{H}$-subgraph-free graphs if $\mathcal{H}$ contains a disjoint union of one or more paths and subdivided claws, and is "computationally hard" otherwise. We illustrate the broad applicability of our meta-classification by obtaining a dichotomy between polynomial-time solvability and NP-completeness for many well-known partitioning, covering and packing problems, network design problems and width parameter problems. For other problems, we obtain a dichotomy between almost-linear-time solvability and having no subquadratic-time algorithm (conditioned on some hardness hypotheses). The proposed framework thus gives a simple pathway to determine the complexity of graph problems on $\mathcal{H}$-subgraph-free graphs. This is further supported by the fact that, along the way, we uncover and resolve several open questions from the literature.
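The deciding test in the meta-classification is purely structural: does $\mathcal{H}$ contain a graph whose every component is a path or a subdivided claw? A connected graph is such a component exactly when it is a tree with maximum degree at most 3 and at most one vertex of degree 3 (degree at most 2 everywhere gives a path; one degree-3 vertex with three path branches gives a subdivided claw). The sketch below checks this on adjacency-set graphs; the helper names are illustrative, and the verdict only applies to problems already known to satisfy the three conditions above.

```python
def graph(edges, vertices=()):
    """Undirected graph as a dict mapping each vertex to its set of neighbours."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(adj):
    """Vertex lists of the connected components (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def is_path_or_subdivided_claw_forest(adj):
    """True iff every component is a path or a subdivided claw."""
    for comp in components(adj):
        n = len(comp)
        m = sum(len(adj[v]) for v in comp) // 2
        if m != n - 1:  # a connected component is a tree iff it has n - 1 edges
            return False
        degrees = [len(adj[v]) for v in comp]
        if max(degrees) > 3 or degrees.count(3) > 1:
            return False
    return True


def meta_classification(family):
    """Verdict for a problem meeting the three conditions, given the finite set H."""
    if any(is_path_or_subdivided_claw_forest(h) for h in family):
        return "efficiently solvable"
    return "computationally hard"


claw = graph([(0, 1), (0, 2), (0, 3)])
triangle = graph([(0, 1), (1, 2), (2, 0)])
print(meta_classification([triangle]), "|", meta_classification([triangle, claw]))
```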
Submitted 20 July, 2023; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Edge Multiway Cut and Node Multiway Cut are NP-complete on subcubic graphs
Authors:
Matthew Johnson,
Barnaby Martin,
Siani Smith,
Sukanya Pandey,
Daniel Paulusma,
Erik Jan van Leeuwen
Abstract:
We show that Edge Multiway Cut (also called Multiterminal Cut) and Node Multiway Cut are NP-complete on graphs of maximum degree $3$ (also known as subcubic graphs). This improves on a previous degree bound of $11$. Our NP-completeness result holds even for subcubic graphs that are planar.
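Edge Multiway Cut asks for a minimum set of edges whose removal leaves no path between any two terminals. Checking a candidate solution is easy, which is why the decision problem lies in NP; the hardness result concerns finding a small cut. The sketch below, with illustrative function names, checks the subcubic condition and verifies a candidate edge cut by labelling the connected components that remain.

```python
def is_subcubic(vertices, edges):
    """True iff every vertex has degree at most 3."""
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree.values(), default=0) <= 3


def is_edge_multiway_cut(vertices, edges, terminals, cut):
    """True iff deleting the edges in `cut` separates every pair of distinct terminals."""
    removed = {frozenset(e) for e in cut}
    adj = {v: [] for v in vertices}
    for u, v in edges:
        if frozenset((u, v)) not in removed:
            adj[u].append(v)
            adj[v].append(u)
    label = {}  # vertex -> representative vertex of its remaining component
    for start in vertices:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in label:
                    label[y] = start
                    stack.append(y)
    # distinct terminals must end up in distinct components
    return len({label[t] for t in terminals}) == len(set(terminals))


# A claw: centre "x" joined to three terminal leaves.
V = ["x", "a", "b", "c"]
E = [("x", "a"), ("x", "b"), ("x", "c")]
print(is_subcubic(V, E), is_edge_multiway_cut(V, E, ["a", "b", "c"], [("x", "a"), ("x", "b")]))
```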
Submitted 9 February, 2024; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Photo-realistic 360 Head Avatars in the Wild
Authors:
Stanislaw Szymanowicz,
Virginia Estellers,
Tadas Baltrusaitis,
Matthew Johnson
Abstract:
Delivering immersive 3D experiences for human communication requires a method to obtain 360-degree photo-realistic avatars of humans. To make these experiences accessible to all, only commodity hardware, like mobile phone cameras, should be necessary to capture the data needed for avatar creation. For avatars to be rendered realistically from any viewpoint, we require training images and camera poses from all angles. However, we cannot rely on there being trackable features in the foreground or background of all images for use in estimating poses, especially from the side or back of the head. To overcome this, we propose a novel landmark detector trained on synthetic data to estimate camera poses from 360-degree mobile phone videos of a human head for use in a multi-stage optimization process which creates a photo-realistic avatar. We perform validation experiments with synthetic data and showcase our method on 360-degree avatars trained from mobile phone videos.
Submitted 20 October, 2022;
originally announced October 2022.
-
MRI-MECH: Mechanics-informed MRI to estimate esophageal health
Authors:
Sourav Halder,
Ethan M. Johnson,
Jun Yamasaki,
Peter J. Kahrilas,
Michael Markl,
John E. Pandolfino,
Neelesh A. Patankar
Abstract:
Dynamic magnetic resonance imaging (MRI) is a popular medical imaging technique for generating image sequences of the flow of a contrast material inside tissues and organs. However, its application to imaging bolus movement through the esophagus has only been demonstrated in a few feasibility studies and remains relatively unexplored. In this work, we present a computational framework called mechanics-informed MRI (MRI-MECH) that enhances that capability, thereby increasing the applicability of dynamic MRI for diagnosing esophageal disorders. Pineapple juice was used as the swallowed contrast material for the dynamic MRI, and the MRI image sequence was used as input to MRI-MECH. MRI-MECH modeled the esophagus as a flexible one-dimensional tube whose elastic walls followed a linear tube law. Flow through the esophagus was then governed by one-dimensional mass and momentum conservation equations. These equations were solved using a physics-informed neural network (PINN). The PINN minimized the difference between the MRI measurements and the model predictions while ensuring that the physics of the fluid flow problem was always satisfied. MRI-MECH calculated the fluid velocity and pressure during esophageal transit and estimated the mechanical health of the esophagus by calculating wall stiffness and active relaxation. Additionally, MRI-MECH predicted missing information about the lower esophageal sphincter during the emptying process, demonstrating its applicability to scenarios with missing data or poor image resolution. In addition to potentially improving clinical decisions based on quantitative estimates of the mechanical health of the esophagus, MRI-MECH can also be extended to other medical imaging modalities to enhance their functionality.
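As a worked illustration of the model class described above, a common inviscid form of the one-dimensional tube-flow equations is shown below, with $A(x,t)$ the cross-sectional area, $u$ the cross-sectionally averaged velocity, $p$ the pressure, $\rho$ the fluid density, $A_0$ a reference area and $K$ a wall-stiffness parameter. These symbols, the omission of wall friction, and this particular form of the linear tube law are assumptions for illustration; the abstract does not state MRI-MECH's equations, which additionally account for active relaxation of the wall. A PINN of the kind described would combine a data term (network area $A_\theta$ versus MRI-derived area) with the squared residuals $r_{\text{mass}}$ and $r_{\text{mom}}$ of the first two equations at collocation points, weighted by a hypothetical factor $\lambda$.

```latex
\begin{aligned}
\frac{\partial A}{\partial t} + \frac{\partial (A u)}{\partial x} &= 0, \\
\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} &= -\frac{1}{\rho}\,\frac{\partial p}{\partial x}, \\
p &= K\left(\frac{A}{A_0} - 1\right), \\
\mathcal{L}(\theta) &= \frac{1}{N}\sum_{i=1}^{N}\bigl(A_\theta(x_i,t_i) - A_i^{\mathrm{MRI}}\bigr)^2
  + \frac{\lambda}{M}\sum_{j=1}^{M}\bigl(r_{\text{mass}}^2 + r_{\text{mom}}^2\bigr)(x_j,t_j).
\end{aligned}
```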
Submitted 15 September, 2022;
originally announced September 2022.
-
Adaptive Complexity Model Predictive Control
Authors:
Joseph Norby,
Ardalan Tajbakhsh,
Yanhao Yang,
Aaron M. Johnson
Abstract:
This work introduces a formulation of model predictive control (MPC) which adaptively reasons about the complexity of the model based on the task while maintaining feasibility and stability guarantees. Existing MPC implementations often handle computational complexity by shortening prediction horizons or simplifying models, both of which can result in instability. Inspired by related approaches in behavioral economics, motion planning, and biomechanics, our method solves MPC problems with a simple model for dynamics and constraints over regions of the horizon where such a model is feasible and a complex model where it is not. The approach leverages an interleaving of planning and execution to iteratively identify these regions, which can be safely simplified if they satisfy an exact template/anchor relationship. We show that this method does not compromise the stability and feasibility properties of the system, and measure performance in simulation experiments on a quadrupedal robot executing agile behaviors over terrains of interest. We find that this adaptive method enables more agile motion and expands the range of executable tasks compared to fixed-complexity implementations.
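One bookkeeping step in the approach above is splitting the prediction horizon into regions that use the simple model and regions that need the complex one. The sketch below groups per-step feasibility flags into contiguous regions; the boolean `simple_ok` is a hypothetical stand-in for the paper's actual criterion (an exact template/anchor relationship identified by interleaving planning and execution), which is not reproduced here.

```python
def assign_models(simple_ok):
    """Group horizon steps into contiguous (start, end, model) regions, end exclusive.

    simple_ok[k] is a placeholder flag saying whether the simple model is
    assumed valid at horizon step k.
    """
    regions = []
    for k, ok in enumerate(simple_ok):
        model = "simple" if ok else "complex"
        if regions and regions[-1][2] == model:
            start, _, _ = regions[-1]
            regions[-1] = (start, k + 1, model)  # extend the current region
        else:
            regions.append((k, k + 1, model))  # open a new region
    return regions


# Hypothetical flags: e.g. a rough-terrain patch in the middle of the horizon.
print(assign_models([True, True, False, True]))
```

In a receding-horizon loop these flags would be re-evaluated after each executed step, so the regions shift as the robot moves.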
Submitted 6 September, 2022;
originally announced September 2022.
-
VolTeMorph: Realtime, Controllable and Generalisable Animation of Volumetric Representations
Authors:
Stephan J. Garbin,
Marek Kowalski,
Virginia Estellers,
Stanislaw Szymanowicz,
Shideh Rezaeifar,
**g**g Shen,
Matthew Johnson,
Julien Valentin
Abstract:
The recent increase in popularity of volumetric representations for scene reconstruction and novel view synthesis has put renewed focus on animating volumetric content at high visual quality and in real-time. While implicit deformation methods based on learned functions can produce impressive results, they are 'black boxes' to artists and content creators, they require large amounts of training data to generalise meaningfully, and they do not produce realistic extrapolations outside the training data. In this work we solve these issues by introducing a volume deformation method which is real-time, easy to edit with off-the-shelf software and can extrapolate convincingly. To demonstrate the versatility of our method, we apply it in two scenarios: physics-based object deformation and telepresence where avatars are controlled using blendshapes. We also perform thorough experiments showing that our method compares favourably to both volumetric approaches combined with implicit deformation and methods based on mesh deformation.
Submitted 1 August, 2022;
originally announced August 2022.
-
Smoothing Entailment Graphs with Language Models
Authors:
Nick McKenna,
Tianyi Li,
Mark Johnson,
Mark Steedman
Abstract:
The diversity and Zipfian frequency distribution of natural language predicates in corpora lead to sparsity in Entailment Graphs (EGs) built by Open Relation Extraction (ORE). EGs are computationally efficient and explainable models of natural language inference, but as symbolic models, they fail if a novel premise or hypothesis vertex is missing at test time. We present theory and methodology for overcoming such sparsity in symbolic models. First, we introduce a theory of optimal smoothing of EGs by constructing transitive chains. We then demonstrate an efficient, open-domain, and unsupervised smoothing method using an off-the-shelf Language Model to find approximations of missing premise predicates. This improves recall by 25.1 and 16.3 percentage points on two difficult directional entailment datasets, while raising average precision and maintaining model explainability. Further, in a QA task we show that EG smoothing is most useful for answering questions with less supporting text, where missing premise predicates are more costly. Finally, controlled experiments with WordNet confirm our theory and show that hypothesis smoothing is difficult, but possible in principle.
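The premise-smoothing idea above can be pictured as a back-off: when a premise predicate is absent from the graph, look up the entailment edges of the graph predicates that lie closest to it in a language-model embedding space. The sketch below uses cosine similarity over toy vectors and takes the best score among the `k` nearest neighbours; the toy embeddings, the max aggregation, and all names are assumptions for illustration, and the paper's transitivity-based theory constrains which neighbours are valid in ways this sketch does not model.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def smoothed_entails(premise, hypothesis, graph, embed, k=2):
    """Entailment score premise -> hypothesis with back-off for missing premises.

    graph -- dict: premise predicate -> {hypothesis predicate: score}
    embed -- dict: predicate -> embedding vector (in practice from a language model)
    """
    if premise in graph:
        return graph[premise].get(hypothesis, 0.0)
    # Missing premise: approximate it by its k most similar graph predicates.
    neighbours = sorted(graph, key=lambda p: cosine(embed[premise], embed[p]), reverse=True)[:k]
    return max((graph[p].get(hypothesis, 0.0) for p in neighbours), default=0.0)


# Toy graph and 2-d "embeddings" (made up).
eg = {"buy": {"own": 0.9}, "sell": {"previously own": 0.8}}
vectors = {"purchase": [1.0, 0.1], "buy": [1.0, 0.0], "sell": [0.0, 1.0]}
print(smoothed_entails("purchase", "own", eg, vectors, k=1))
```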
Submitted 21 September, 2023; v1 submitted 30 July, 2022;
originally announced August 2022.
-
Towards Mapping and Assessing Sidewalk Accessibility Across Sociocultural and Geographic Contexts
Authors:
Jon E. Froehlich,
Michael Saugstad,
Manaswi Saha,
Matthew Johnson
Abstract:
Despite the important role of sidewalks in supporting mobility, accessibility, and public health, there is a lack of high-quality datasets and corresponding analyses on sidewalk existence and condition. Our work explores a twofold vision: first, to develop scalable mechanisms to locate and assess sidewalks in cities across the world, and second, to use this data to support new urban analyses and mobility tools. We report on two preliminary urban science explorations enabled by our approach: exploring geo-spatial patterns and key correlates of sidewalk accessibility and examining differences in sidewalk infrastructure across regions.
Submitted 27 July, 2022;
originally announced July 2022.
-
Environmental Sampling with the Boustrophedon Decomposition Algorithm
Authors:
Hannah He,
Joe Norby,
Sean Wang,
Natasha Sihota,
Thomas P. Hoelen,
Gregory V. Lowry,
Aaron M. Johnson
Abstract:
The automation of data collection via mobile robots holds promise for increasing the efficacy of environmental investigations, but requires the system to autonomously determine how to sample the environment while avoiding obstacles. Existing methods such as the boustrophedon decomposition algorithm enable complete coverage of the environment to a specified resolution, yet in many cases sampling at the resolution of the distribution would yield long paths with an infeasible number of measurements. Downsampling these paths can result in feasible plans at the expense of distribution estimation accuracy. This work explores this tradeoff between distribution accuracy and path length for the boustrophedon decomposition algorithm. We quantify algorithm performance by computing metrics for accuracy and path length in a Monte-Carlo simulation across a distribution of environments. We highlight conditions where one objective should be prioritized over the other and propose a modification to the algorithm to improve its effectiveness by sampling more uniformly. These results demonstrate how intelligent deployment of the boustrophedon algorithm can effectively guide autonomous environmental sampling.
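The tradeoff above can be made concrete for a single obstacle-free cell: a back-and-forth (boustrophedon) sweep whose lane spacing sets the path length and whose along-lane spacing sets the number of measurements. The full decomposition first splits the free space around obstacles into such cells; that step, and the paper's uniform-sampling modification, are not reproduced here, and the function names and parameters are illustrative assumptions.

```python
import math


def lawnmower_samples(width, height, lane_spacing, sample_spacing):
    """Sample points along a back-and-forth sweep of one obstacle-free rectangular cell."""
    n_lanes = int(math.floor(width / lane_spacing + 1e-9)) + 1
    n_along = int(math.floor(height / sample_spacing + 1e-9)) + 1
    points = []
    for i in range(n_lanes):
        x = i * lane_spacing
        ys = [j * sample_spacing for j in range(n_along)]
        if i % 2 == 1:
            ys.reverse()  # alternate sweep direction on every other lane
        points.extend((x, y) for y in ys)
    return points


def path_length(points):
    """Travel distance visiting the points in order (straight segments)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


for lanes, step in [(1, 1), (1, 2), (2, 1)]:
    pts = lawnmower_samples(2, 2, lanes, step)
    print(f"lane spacing {lanes}, sample spacing {step}: {len(pts)} samples, path {path_length(pts)}")
```

In this toy cell, coarser along-lane sampling cuts the number of measurements without shortening the path, while wider lane spacing shortens the path; both reduce how finely the underlying distribution is resolved.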
Submitted 13 July, 2022;
originally announced July 2022.
-
Hybrid iLQR Model Predictive Control for Contact Implicit Stabilization on Legged Robots
Authors:
Nathan J. Kong,
Chuanzheng Li,
Aaron M. Johnson
Abstract:
Model Predictive Control (MPC) is a popular strategy for controlling robots but is difficult for systems with contact due to the complex nature of hybrid dynamics. To implement MPC for systems with contact, dynamic models are often simplified or contact sequences fixed in time in order to plan trajectories efficiently. In this work, we extend the Hybrid iterative Linear Quadratic Regulator to work in an MPC fashion (HiLQR MPC) by 1) modifying how the cost function is computed when contact modes do not align, 2) using parallelization when simulating rigid body dynamics, and 3) using efficient analytical derivative computations of the rigid body dynamics. The result is a system that can modify the contact sequence of the reference behavior and plan whole-body motions cohesively, which is crucial when dealing with large perturbations. HiLQR MPC is tested on two systems: first, the hybrid cost modification is validated on a simple actuated bouncing ball hybrid system. Then HiLQR MPC is compared against methods that utilize centroidal dynamics assumptions on a quadruped robot (Unitree A1). HiLQR MPC outperforms the centroidal methods in both simulation and hardware tests.
Submitted 6 November, 2023; v1 submitted 10 July, 2022;
originally announced July 2022.
-
Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model
Authors:
Erik C. M. Johnson,
Marc Habermann,
Soshi Shimada,
Vladislav Golyanik,
Christian Theobalt
Abstract:
Capturing general deforming scenes from monocular RGB video is crucial for many computer graphics and vision applications. However, current approaches suffer from drawbacks such as struggling with large scene deformations, inaccurate shape completion, or requiring 2D point tracks. In contrast, our method, Ub4D, handles large deformations, performs shape completion in occluded regions, and can operate on monocular RGB videos directly by using differentiable volume rendering. This technique includes three components that are new in the context of non-rigid 3D reconstruction: 1) a coordinate-based and implicit neural representation for non-rigid scenes, which, in conjunction with differentiable volume rendering, enables an unbiased reconstruction of dynamic scenes; 2) a proof that extends the unbiased formulation of volume rendering to dynamic scenes; and 3) a novel dynamic scene flow loss, which enables the reconstruction of larger deformations by leveraging the coarse estimates of other methods. Results on our new dataset, which will be made publicly available, demonstrate a clear improvement over the state of the art in terms of surface reconstruction accuracy and robustness to large deformations.
Submitted 4 May, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
-
Persistent Homology for Resource Coverage: A Case Study of Access to Polling Sites
Authors:
Abigail Hickok,
Benjamin Jarman,
Michael Johnson,
Jiajie Luo,
Mason A. Porter
Abstract:
It is important to choose the geographical distributions of public resources in a fair and equitable manner. However, it is complicated to quantify the equity of such a distribution; important factors include distances to resource sites, availability of transportation, and ease of travel. We use persistent homology, which is a tool from topological data analysis, to study the effective availability and coverage of polling sites. The information from persistent homology allows us to infer holes in the distribution of polling sites. We analyze and compare the coverage of polling sites in Los Angeles County and five cities (Atlanta, Chicago, Jacksonville, New York City, and Salt Lake City), and we conclude that computation of persistent homology appears to be a reasonable approach to analyzing resource coverage.
Submitted 11 August, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
TartanDrive: A Large-Scale Dataset for Learning Off-Road Dynamics Models
Authors:
Samuel Triest,
Matthew Sivaprakasam,
Sean J. Wang,
Wenshan Wang,
Aaron M. Johnson,
Sebastian Scherer
Abstract:
We present TartanDrive, a large-scale dataset for learning dynamics models for off-road driving. We collected a dataset of roughly 200,000 off-road driving interactions on a modified Yamaha Viking ATV with seven unique sensing modalities in diverse terrains. To the authors' knowledge, this is the largest real-world multi-modal off-road driving dataset, both in terms of number of interactions and sensing modalities. We also benchmark several state-of-the-art methods for model-based reinforcement learning from high-dimensional observations on this dataset. We find that extending these models to multi-modality leads to significant performance gains in off-road dynamics prediction, especially in more challenging terrains. We also identify some shortcomings of current neural network architectures for the off-road driving task. Our dataset is available at https://github.com/castacks/tartan_drive.
Submitted 3 May, 2022;
originally announced May 2022.
-
You Only Linearize Once: Tangents Transpose to Gradients
Authors:
Alexey Radul,
Adam Paszke,
Roy Frostig,
Matthew Johnson,
Dougal Maclaurin
Abstract:
Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the linear and non-linear parts and then (iii) transposition of the linear part.
To that end, we define a (substructurally) linear type system that can prove a class of functions are (algebraically) linear. Our main results are that forward-mode AD produces such linear functions, and that we can unzip and transpose any such linear function, conserving cost, size, and linearity. Composing these three transformations recovers reverse-mode AD. This decomposition also sheds light on checkpointing, which emerges naturally from a free choice in unzipping `let` expressions. As a corollary, checkpointing techniques are applicable to general-purpose partial evaluation, not just AD.
We hope that our formalization will lead to a deeper understanding of automatic differentiation and that it will simplify implementations, by separating the concerns of differentiation proper from the concerns of gaining efficiency (namely, separating the derivative computation from the act of running it backward).
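The core claim, that a gradient is the transpose of the linear map produced by forward-mode AD, can be checked numerically on a toy example. The sketch below implements forward mode with dual numbers, materializes the linear map $v \mapsto Jv$ one basis vector at a time, and applies its transpose to a cotangent. This materialize-then-transpose route is a deliberately naive stand-in: it costs one forward pass per input, whereas the paper transposes the linear program itself so that cost is conserved (JAX exposes that machinery through `jax.linearize` and `jax.linear_transpose`). The `Dual` class and helper names are illustrative.

```python
import math


class Dual:
    """Number primal + tangent*eps with eps**2 == 0 (forward-mode AD)."""

    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__


def sin(x):
    return Dual(math.sin(x.primal), math.cos(x.primal) * x.tangent)


def jvp(f, xs, vs):
    """Forward mode: push the tangent vector vs through f at the point xs."""
    return [o.tangent for o in f([Dual(x, v) for x, v in zip(xs, vs)])]


def vjp_by_transposition(f, xs, cotangent):
    """Build the linear map v -> J v column by column, then apply its transpose."""
    n = len(xs)
    columns = [jvp(f, xs, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    # columns[j][i] == J[i][j], so (J^T w)[j] = sum_i J[i][j] * w[i]
    return [sum(c * w for c, w in zip(columns[j], cotangent)) for j in range(n)]


def f(x):
    return [x[0] * x[1] + sin(x[0])]  # gradient: [x1 + cos(x0), x0]


print(vjp_by_transposition(f, [0.0, 2.0], [1.0]))  # [3.0, 0.0]
```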
Submitted 6 December, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
3D face reconstruction with dense landmarks
Authors:
Erroll Wood,
Tadas Baltrusaitis,
Charlie Hewitt,
Matthew Johnson,
**g**g Shen,
Nikola Milosavljevic,
Daniel Wilde,
Stephan Garbin,
Chirag Raman,
Jamie Shotton,
Toby Sharp,
Ivan Stojiljkovic,
Tom Cashman,
Julien Valentin
Abstract:
Landmarks often play a key role in face analysis, but many aspects of identity or expression cannot be represented by sparse landmarks alone. Thus, in order to reconstruct faces more accurately, landmarks are often combined with additional signals like depth images or techniques like differentiable rendering. Can we keep things simple by just using more landmarks? In answer, we present the first method that accurately predicts 10x as many landmarks as usual, covering the whole head, including the eyes and teeth. This is accomplished using synthetic training data, which guarantees perfect landmark annotations. By fitting a morphable model to these dense landmarks, we achieve state-of-the-art results for monocular 3D face reconstruction in the wild. We show that dense landmarks are an ideal signal for integrating face shape information across frames by demonstrating accurate and expressive facial performance capture in both monocular and multi-view scenarios. This approach is also highly efficient: we can predict dense landmarks and fit our 3D face model at over 150 FPS on a single CPU thread. Please see our website: https://microsoft.github.io/DenseLandmarks/.
Submitted 20 July, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.