-
Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
Authors:
Enhui Ma,
Lijun Zhou,
Tao Tang,
Zhan Zhang,
Dong Han,
Junpeng Jiang,
Kun Zhan,
Peng Jia,
Xianpeng Lang,
Haiyang Sun,
Di Lin,
Kaicheng Yu
Abstract:
Using generative models to synthesize new data has become a de-facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the planning performance of end-to-end autonomous driving models, as the generated videos are usually shorter than 8 frames and their spatial and temporal inconsistencies are not negligible. To this end, we propose Delphi, a novel diffusion-based long video generation method with a shared noise modeling mechanism across the multi-views to increase spatial consistency, and a feature-aligned module to achieve both precise controllability and temporal consistency. Our method can generate up to 40 frames of video without loss of consistency, which is about 5 times longer than state-of-the-art methods. Instead of randomly generating new data, we further design a sampling policy that lets Delphi generate new data similar to the failure cases, improving sample efficiency. This is achieved by building a failure-case driven framework with the help of pre-trained visual language models. Our extensive experiments demonstrate that Delphi generates higher-quality long videos, surpassing previous state-of-the-art methods. Consequently, by generating only 4% of the training dataset size, our framework is able to go beyond perception and prediction tasks and, for the first time to the best of our knowledge, boost the planning performance of the end-to-end autonomous driving model by a margin of 25%.
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
BOLD v4: A Centralized Bioinformatics Platform for DNA-based Biodiversity Data
Authors:
Sujeevan Ratnasingham,
Catherine Wei,
Dean Chan,
Jireh Agda,
Josh Agda,
Liliana Ballesteros-Mejia,
Hamza Ait Boutou,
Zak Mohammad El Bastami,
Eddie Ma,
Ramya Manjunath,
Dana Rea,
Chris Ho,
Angela Telfer,
Jaclyn McKeowan,
Miduna Rahulan,
Claudia Steinke,
Justin Dorsheimer,
Megan Milton,
Paul D. N. Hebert
Abstract:
BOLD, the Barcode of Life Data System, supports the acquisition, storage, validation, analysis, and publication of DNA barcodes, activities requiring the integration of molecular, morphological, and distributional data. Its pivotal role in curating the reference library of DNA barcodes, coupled with its data management and analysis capabilities, makes it a central resource for biodiversity science. It enables rapid, accurate identification of specimens and also reveals patterns of genetic diversity and evolutionary relationships among taxa. Launched in 2005, BOLD has become an increasingly powerful tool for advancing understanding of planetary biodiversity. It currently hosts 17 million specimen records and 14 million barcodes that provide coverage for more than a million species from every continent and ocean. The platform has the long-term goal of providing a consistent, accurate system for identifying all species of eukaryotes. BOLD's integrated analytical tools, full data lifecycle support, and secure collaboration framework distinguish it from other biodiversity platforms. BOLD v4 brought enhanced data management and analysis capabilities as well as novel functionality for data dissemination and publication. Its next version will include features to strengthen its utility to the research community, governments, industry, and society-at-large.
Submitted 5 May, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
Authors:
Khaoula Chehbouni,
Megha Roshan,
Emmanuel Ma,
Futian Andrew Wei,
Afaf Taik,
Jackie CK Cheung,
Golnoosh Farnadi
Abstract:
Recent progress in large language models (LLMs) has led to their widespread adoption in various domains. However, these advancements have also introduced additional safety risks and raised concerns regarding their detrimental impact on already marginalized populations. Despite growing mitigation efforts to develop safety safeguards, such as supervised safety-oriented fine-tuning and leveraging safe reinforcement learning from human feedback, multiple concerns regarding the safety and ingrained biases in these models remain. Furthermore, previous work has demonstrated that models optimized for safety often display exaggerated safety behaviors, such as a tendency to refrain from responding to certain requests as a precautionary measure. As such, a clear trade-off between the helpfulness and safety of these models has been documented in the literature. In this paper, we further investigate the effectiveness of safety measures by evaluating models on already mitigated biases. Using the case of Llama 2 as an example, we illustrate how LLMs' safety responses can still encode harmful assumptions. To do so, we create a set of non-toxic prompts, which we then use to evaluate Llama models. Through our new taxonomy of LLM responses to users, we observe that the safety/helpfulness trade-offs are more pronounced for certain demographic groups, which can lead to quality-of-service harms for marginalized populations.
Submitted 7 June, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling
Authors:
Hang Jiang,
Xiajie Zhang,
Robert Mahari,
Daniel Kessler,
Eric Ma,
Tal August,
Irene Li,
Alex 'Sandy' Pentland,
Yoon Kim,
Deb Roy,
Jad Kabbara
Abstract:
Making legal knowledge accessible to non-experts is crucial for enhancing general legal literacy and encouraging civic participation in democracy. However, legal documents are often challenging to understand for people without legal backgrounds. In this paper, we present a novel application of large language models (LLMs) in legal education to help non-experts learn intricate legal concepts through storytelling, an effective pedagogical tool in conveying complex and abstract concepts. We also introduce a new dataset LegalStories, which consists of 294 complex legal doctrines, each accompanied by a story and a set of multiple-choice questions generated by LLMs. To construct the dataset, we experiment with various LLMs to generate legal stories explaining these concepts. Furthermore, we use an expert-in-the-loop approach to iteratively design multiple-choice questions. Then, we evaluate the effectiveness of storytelling with LLMs through randomized controlled trials (RCTs) with legal novices on 10 samples from the dataset. We find that LLM-generated stories enhance comprehension of legal concepts and interest in law among non-native speakers compared to only definitions. Moreover, stories consistently help participants relate legal concepts to their lives. Finally, we find that learning with stories shows a higher retention rate for non-native speakers in the follow-up assessment. Our work has strong implications for using LLMs in promoting teaching and learning in the legal field and beyond.
Submitted 2 July, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Identifiability of Product of Experts Models
Authors:
Spencer L. Gordon,
Manav Kant,
Eric Ma,
Leonard J. Schulman,
Andrei Staicu
Abstract:
Products of experts (PoEs) are layered networks in which the value at each node is an AND (or product) of the values (possibly negated) at its inputs. These were introduced as a neural network architecture that can efficiently learn to generate high-dimensional data which satisfy many low-dimensional constraints, thereby allowing each individual expert to perform a simple task. PoEs have found a variety of applications in learning.
We study the problem of identifiability of a product of experts model having a layer of binary latent variables, and a layer of binary observables that are iid conditional on the latents. The previous best upper bound on the number of observables needed to identify the model was exponential in the number of parameters. We show: (a) When the latents are uniformly distributed, the model is identifiable with a number of observables equal to the number of parameters (and hence best possible). (b) In the more general case of arbitrarily distributed latents, the model is identifiable for a number of observables that is still linear in the number of parameters (and within a factor of two of best-possible). The proofs rely on root interlacing phenomena for some special three-term recurrences.
Submitted 13 October, 2023;
originally announced October 2023.
-
BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout
Authors:
Kairui Yang,
Enhui Ma,
Jibin Peng,
Qing Guo,
Di Lin,
Kaicheng Yu
Abstract:
Using synthesized images to boost the performance of perception models is a long-standing research challenge in computer vision. It becomes more prominent in visual-centric autonomous driving systems with multi-view cameras, as some long-tail scenarios can never be collected. Guided by the BEV segmentation layouts, the existing generative networks seem to synthesize photo-realistic street-view images when evaluated solely on scene-level metrics. However, once zoomed in, they usually fail to produce accurate foreground and background details such as heading. To this end, we propose a two-stage generative method, dubbed BEVControl, that can generate accurate foreground and background contents. In contrast to segmentation-like input, it also supports sketch-style input, which is more flexible for humans to edit. In addition, we propose a comprehensive multi-level evaluation protocol to fairly compare the quality of the generated scene, foreground object, and background geometry. Our extensive experiments show that BEVControl surpasses the state-of-the-art method, BEVGen, by a significant margin, from 5.89 to 26.80 on foreground segmentation mIoU. In addition, we show that training the downstream perception model with images generated by BEVControl yields an average improvement of 1.29 in NDS.
Submitted 23 September, 2023; v1 submitted 3 August, 2023;
originally announced August 2023.
-
CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion
Authors:
Haotian Dong,
Enhui Ma,
Lubo Wang,
Miaohui Wang,
Wuyuan Xie,
Qing Guo,
** Li,
Lingyu Liang,
Kairui Yang,
Di Lin
Abstract:
Semantic scene completion (SSC) requires an accurate understanding of the geometric and semantic relationships between the objects in the 3D scene for reasoning about the occluded objects. The popular SSC methods voxelize the 3D objects, allowing the deep 3D convolutional network (3D CNN) to learn the object relationships from the complex scenes. However, the current networks lack controllable kernels to model the object relationships across multiple views, where appropriate views provide the relevant information for suggesting the existence of the occluded objects. In this paper, we propose Cross-View Synthesis Transformer (CVSformer), which consists of Multi-View Feature Synthesis and Cross-View Transformer for learning cross-view object relationships. In the multi-view feature synthesis, we use a set of 3D convolutional kernels rotated differently to compute the multi-view features for each voxel. In the cross-view transformer, we employ cross-view fusion to comprehensively learn the cross-view relationships, which form useful information for enhancing the features of individual views. We use the enhanced features to predict the geometric occupancies and semantic labels of all voxels. We evaluate CVSformer on public datasets, where CVSformer yields state-of-the-art results.
Submitted 16 July, 2023;
originally announced July 2023.
-
Investigating Masking-based Data Generation in Language Models
Authors:
Ed S. Ma
Abstract:
The current era of natural language processing (NLP) has been defined by the prominence of pre-trained language models since the advent of BERT. A feature of BERT and models with similar architecture is the objective of masked language modeling, in which part of the input is intentionally masked and the model is trained to predict this piece of masked information. Data augmentation is a data-driven technique widely used in machine learning, including research areas like computer vision and natural language processing, to improve model performance by artificially augmenting the training data set with designated techniques. Masked language modeling (MLM), an essential training feature of BERT, has introduced a novel approach to perform effective pre-training on Transformer-based models in natural language processing tasks. Recent studies have utilized masked language models to generate artificially augmented data for NLP downstream tasks. The experimental results show that mask-based data augmentation provides a simple but efficient approach to improve model performance. In this paper, we explore and discuss the broader utilization of these data augmentation methods based on MLM.
Submitted 16 June, 2023;
originally announced July 2023.
-
Local Environment Poisoning Attacks on Federated Reinforcement Learning
Authors:
Evelyn Ma,
Praneet Rathi,
S. Rasoul Etesami
Abstract:
Federated learning (FL) has become a popular tool for solving traditional Reinforcement Learning (RL) tasks. The multi-agent structure addresses the major concern of data hunger in traditional RL, while the federated mechanism protects the data privacy of individual agents. However, the federated mechanism also exposes the system to poisoning by malicious agents that can mislead the trained policy. Despite the advantage brought by FL, the vulnerability of Federated Reinforcement Learning (FRL) has not been well-studied before. In this work, we propose a general framework to characterize FRL poisoning as an optimization problem and design a poisoning protocol that can be applied to policy-based FRL. Our framework can also be extended to FRL with actor-critic as a local RL algorithm by training a pair of private and public critics. We provably show that our method can strictly hurt the global objective. We verify our poisoning effectiveness by conducting extensive experiments targeting mainstream RL algorithms and over various RL OpenAI Gym environments covering a wide range of difficulty levels. Within these experiments, we compare clean and baseline poisoning methods against our proposed framework. The results show that the proposed framework is successful in poisoning FRL systems and reducing performance across various environments and does so more effectively than baseline methods. Our work provides new insights into the vulnerability of FL in RL training and poses new challenges for designing robust FRL algorithms.
Submitted 4 January, 2024; v1 submitted 5 March, 2023;
originally announced March 2023.
-
Mission Apollo: Landing Optical Circuit Switching at Datacenter Scale
Authors:
Ryohei Urata,
Hong Liu,
Kevin Yasumura,
Erji Mao,
Jill Berger,
Xiang Zhou,
Cedric Lam,
Roy Bannon,
Darren Hutchinson,
Daniel Nelson,
Leon Poutievski,
Arjun Singh,
Joon Ong,
Amin Vahdat
Abstract:
In this paper, we describe Apollo, to the best of our knowledge, the world's first large-scale production deployment of optical circuit switches (OCSes) for datacenter networking. We will first describe the infrastructure challenges and use cases that motivated optical switching inside datacenters. We then delve into the requirements of OCSes for datacenter applications: balancing cost, port count, switching time, and optical performance, which drive design choices and implementation details of our internally developed 3D MEMS-based OCS. To enable the Apollo optical switching layer, we employ circulators to realize bidirectional links through the OCS, effectively doubling the OCS radix. The OCS and circulator design choices were critical for meeting network bandwidth, scale, and cost targets. We review the critical co-design of WDM transceiver technology for these OCS plus circulator-based bidirectional links and their corresponding physical impairments, delivered over four generations/speeds of optical interconnect. Finally, we conclude with thoughts on future directions in hardware development and associated applications.
Submitted 21 August, 2022;
originally announced August 2022.
-
Toward Data-Driven Digital Therapeutics Analytics: Literature Review and Research Directions
Authors:
Uichin Lee,
Gyuwon Jung,
Eun-Yeol Ma,
** San Kim,
Heepyung Kim,
Jumabek Alikhanov,
Youngtae Noh,
Heeyoung Kim
Abstract:
With the advent of Digital Therapeutics (DTx), the development of software as a medical device (SaMD) for mobile and wearable devices has gained significant attention in recent years. Existing DTx evaluations, such as randomized clinical trials, mostly focus on verifying the effectiveness of DTx products. To acquire a deeper understanding of DTx engagement and behavioral adherence, beyond efficacy, a large amount of contextual and interaction data from mobile and wearable devices during field deployment would be required for analysis. In this work, the overall flow of the data-driven DTx analytics is reviewed to help researchers and practitioners to explore DTx datasets, to investigate contextual patterns associated with DTx usage, and to establish the (causal) relationship of DTx engagement and behavioral adherence. This review of the key components of data-driven analytics provides novel research directions in the analysis of mobile sensor and interaction datasets, which helps to iteratively improve the receptivity of existing DTx.
Submitted 18 September, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
Molecular-scale Integration of Multi-modal Sensing and Neuromorphic Computing with Organic Electrochemical Transistors
Authors:
Shijie Wang,
Xi Chen,
Chao Zhao,
Yuxin Kong,
Baojun Lin,
Yongyi Wu,
Zhaozhao Bi,
Ziyi Xuan,
Tao Li,
Yuxiang Li,
Wei Zhang,
En Ma,
Zhongrui Wang,
Wei Ma
Abstract:
Bionic learning with fused sensing, memory and processing functions outperforms artificial neural networks running on silicon chips in terms of efficiency and footprint. However, digital hardware implementation of bionic learning suffers from device heterogeneity in sensors and processing cores, which incurs large hardware, energy and time overheads. Here, we present a universal solution to simultaneously perform multi-modal sensing, memory and processing using organic electrochemical transistors with designed architecture and tailored channel morphology, selective ion injection into the crystalline/amorphous regions. The resultant device works as either a volatile receptor that shows multi-modal sensing, or a non-volatile synapse that features record-high 10-bit analog states, low switching stochasticity and good retention without the integration of any extra devices. Homogeneous integration of such devices enables bionic learning functions such as conditioned reflex and real-time cardiac disease diagnosis via reservoir computing, illustrating the promise for future smart edge health informatics.
Submitted 19 February, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization
Authors:
Xiaojun Xu,
Jacky Yibo Zhang,
Evelyn Ma,
Danny Son,
Oluwasanmi Koyejo,
Bo Li
Abstract:
Machine learning (ML) robustness and domain generalization are fundamentally correlated: they essentially concern data distribution shifts under adversarial and natural settings, respectively. On one hand, recent studies show that more robust (adversarially trained) models are more generalizable. On the other hand, there is a lack of theoretical understanding of their fundamental connections. In this paper, we explore the relationship between regularization and domain transferability considering different factors such as norm regularization and data augmentations (DA). We propose a general theoretical framework proving that factors involving the model function class regularization are sufficient conditions for relative domain transferability. Our analysis implies that "robustness" is neither necessary nor sufficient for transferability; rather, regularization is a more fundamental perspective for understanding domain transferability. We then discuss popular DA protocols (including adversarial training) and show when they can be viewed as the function class regularization under certain conditions and therefore improve generalization. We conduct extensive experiments to verify our theoretical findings and show several counterexamples where robustness and generalization are negatively correlated on different datasets.
Submitted 23 June, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
Prospects for Improving Password Selection
Authors:
Eryn Ma,
Summer Hasama,
Eshaan Lumba,
Eleanor Birrell
Abstract:
User-chosen passwords remain essential to online security, and yet people continue to choose weak, insecure passwords. In this work, we investigate whether prospect theory, a behavioral model of how people evaluate risk, can provide insights into how users choose passwords and whether it can motivate new designs for password selection mechanisms that will nudge users to select stronger passwords. We ran a user study with 762 participants, and we found that an intervention guided by prospect theory, which leverages the reference-dependence effect by framing selecting weak passwords as a loss relative to choosing a stronger password, causes approximately 25% of users to improve the strength of their password (significantly more than alternative interventions) and reduces the final number of weak passwords by approximately 25%. We also evaluate the relation between user behavior and users' mental models of hacking and password attacks. These results provide guidance for designing and implementing account registration mechanisms that will significantly improve the strength of user-selected passwords, thereby leveraging insights from prospect theory to improve the security of systems that use password-based authentication.
Submitted 4 January, 2022;
originally announced January 2022.
-
Touchalytics: On the Applicability of Touchscreen Input as a Behavioral Biometric for Continuous Authentication
Authors:
Mario Frank,
Ralf Biedert,
Eugene Ma,
Ivan Martinovic,
Dawn Song
Abstract:
We investigate whether a classifier can continuously authenticate users based on the way they interact with the touchscreen of a smart phone. We propose a set of 30 behavioral touch features that can be extracted from raw touchscreen logs and demonstrate that different users populate distinct subspaces of this feature space. In a systematic experiment designed to test how this behavioral pattern exhibits consistency over time, we collected touch data from users interacting with a smart phone using basic navigation maneuvers, i.e., up-down and left-right scrolling. We propose a classification framework that learns the touch behavior of a user during an enrollment phase and is able to accept or reject the current user by monitoring interaction with the touch screen. The classifier achieves a median equal error rate of 0% for intra-session authentication, 2%-3% for inter-session authentication and below 4% when the authentication test was carried out one week after the enrollment phase. While our experimental findings disqualify this method as a standalone authentication mechanism for long-term authentication, it could be implemented as a means to extend screen-lock time or as a part of a multi-modal biometric authentication system.
Submitted 8 October, 2012; v1 submitted 26 July, 2012;
originally announced July 2012.