-
Orchestrating LLMs with Different Personalizations
Authors:
Jin Peng Zhou,
Katie Z Luo,
Jingwen Gu,
Jason Yuan,
Kilian Q. Weinberger,
Wen Sun
Abstract:
This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from \textit{Personalized} Human Feedback (RLPHF). Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to create an LLM without re-training that best adheres to this specification. Starting from specialized expert LLMs, each trained for one such particular preference dimension, we propose a black-box method that merges their outputs on a per-token level. We train a lightweight Preference Control Model (PCM) that dynamically translates the preference description and current context into next-token prediction weights. By combining the expert models' outputs at the token level, our approach dynamically generates text that optimizes the given preference. Empirical tests show that our method matches or surpasses existing preference merging techniques, providing a scalable, efficient alternative to fine-tuning LLMs for individual personalization.
Submitted 4 July, 2024;
originally announced July 2024.
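The token-level merging described in the abstract above can be illustrated with a minimal sketch. It assumes the expert outputs are combined as a weighted linear mixture of next-token probabilities; the abstract does not state the exact combination rule, and the function name `mix_next_token_probs` and the fixed example weights (standing in for the output of a Preference Control Model) are hypothetical.

```python
def mix_next_token_probs(expert_probs, weights):
    """Combine per-expert next-token distributions into one distribution.

    expert_probs: list of probability lists, one per expert, over the same vocabulary.
    weights: one non-negative weight per expert (assumed here to come from a
             preference control model; fixed by hand in this sketch).
    """
    if len(expert_probs) != len(weights):
        raise ValueError("need exactly one weight per expert")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    vocab_size = len(expert_probs[0])
    mixed = [0.0] * vocab_size
    for probs, weight in zip(expert_probs, weights):
        if len(probs) != vocab_size:
            raise ValueError("all experts must share one vocabulary")
        for token_id, p in enumerate(probs):
            # Normalizing the weights keeps the result a valid distribution.
            mixed[token_id] += (weight / total) * p
    return mixed


# Two hypothetical experts over a three-token vocabulary.
concise_expert = [0.7, 0.2, 0.1]
humorous_expert = [0.1, 0.3, 0.6]
mixed = mix_next_token_probs([concise_expert, humorous_expert], [0.25, 0.75])
next_token = max(range(len(mixed)), key=mixed.__getitem__)
```

In a full decoding loop the weights would be recomputed at every step from the preference description and the current context, which is what makes the merge dynamic.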
-
DiffuBox: Refining 3D Object Detection with Point Diffusion
Authors:
Xiangyu Chen,
Zhenzhen Liu,
Katie Z Luo,
Siddhartha Datta,
Adhitya Polavaram,
Yan Wang,
Yurong You,
Boyi Li,
Marco Pavone,
Wei-Lun Chao,
Mark Campbell,
Bharath Hariharan,
Kilian Q. Weinberger
Abstract:
Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-based box refinement approach. This method employs a domain-agnostic diffusion model, conditioned on the LiDAR points surrounding a coarse bounding box, to simultaneously refine the box's location, size, and orientation. We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes and detectors.
Submitted 24 May, 2024;
originally announced May 2024.
-
Attention to Quantum Complexity
Authors:
Hyejin Kim,
Yiqing Zhou,
Yichen Xu,
Kaarthik Varma,
Amir H. Karamlou,
Ilan T. Rosen,
Jesse C. Hoke,
Chao Wan,
Jin Peng Zhou,
William D. Oliver,
Yuri D. Lensky,
Kilian Q. Weinberger,
Eun-Ah Kim
Abstract:
The imminent era of error-corrected quantum computing urgently demands robust methods to characterize complex quantum states, even from limited and noisy measurements. We introduce the Quantum Attention Network (QuAN), a versatile classical AI framework leveraging the power of attention mechanisms specifically tailored to address the unique challenges of learning quantum complexity. Inspired by large language models, QuAN treats measurement snapshots as tokens while respecting their permutation invariance. Combined with a novel parameter-efficient mini-set self-attention block (MSSAB), such data structure enables QuAN to access high-order moments of the bit-string distribution and preferentially attend to less noisy snapshots. We rigorously test QuAN across three distinct quantum simulation settings: driven hard-core Bose-Hubbard model, random quantum circuits, and the toric code under coherent and incoherent noise. QuAN directly learns the growth in entanglement and state complexity from experimentally obtained computational basis measurements. In particular, it learns the growth in complexity of random circuit data upon increasing depth from noisy experimental data. Taken to a regime inaccessible by existing theory, QuAN unveils the complete phase diagram for noisy toric code data as a function of both noise types. This breakthrough highlights the transformative potential of using purposefully designed AI-driven solutions to assist quantum hardware.
Submitted 19 May, 2024;
originally announced May 2024.
-
Dynamic Rate Splitting Grouping for Antifragile Responses to Wireless Network Disruptions
Authors:
Kevin Weinberger,
Aydin Sezgin
Abstract:
The reliance on wireless network architectures for applications demanding high reliability and fault tolerance is growing. These architectures heavily depend on wireless channels, making them susceptible to impairments and blockages. Ensuring functionality, particularly for safety-critical applications, demands robust countermeasures at the physical layer. In response, this work proposes the utilization of a dynamic Rate Splitting (RS) grouping approach as a resilience mechanism during blockages. RS effectively manages interference within networks but faces challenges during outages and blockages, where system performance can deteriorate due to the lowest decoding rate dictating the common rate and increased interference from fewer available channel links. As a strategic countermeasure, RS is leveraged to mitigate the impact of blockages, maintaining system efficiency and performance amidst disruptions. In fact, the introduction of new RS groups enables the exploration of novel solutions to the resource allocation problem, potentially outperforming those adopted before the occurrence of a blockage. As it turns out, by employing the dynamic RS grouping, the network exhibits an antifragile recovery response, showcasing the network's ability to not only recover from disruptions but also surpass its initial performance.
Submitted 13 May, 2024;
originally announced May 2024.
-
Show Me the Way: Real-Time Tracking of Wireless Mobile Users with UWB-Enabled RIS
Authors:
Kevin Weinberger,
Simon Tewes,
Aydin Sezgin
Abstract:
The integration of Reconfigurable Intelligent Surfaces (RIS) in 6G wireless networks offers unprecedented control over communication environments. However, identifying optimal configurations within practical constraints remains a significant challenge. This becomes especially pronounced when the user is mobile and the configurations need to be deployed in real time. Leveraging Ultra-Wideband (UWB) as a localization technique, we capture and analyze real-time movements of a user within the RIS-enabled indoor environment. Given this information about the system's geometry, a model-based optimization is utilized, which enables real-time beam steering of the RIS towards the user. However, practical limitations of UWB modules lead to fluctuating UWB estimates, causing the RIS beam to occasionally miss the tracked user. The methodologies proposed in this work aim to increase the compatibility between these two systems. To this end, we provide two key solutions: beam splitting for obtaining more robust RIS configurations and UWB estimation correction for reducing the variations in the UWB data. Through comprehensive theoretical and experimental evaluations in both stationary and mobile scenarios, the effectiveness of the proposed techniques is demonstrated. When combined, the proposed methods improve worst-case tracking performance by a significant 17.5 dB compared to the conventional approach.
Submitted 13 May, 2024;
originally announced May 2024.
-
Better Monocular 3D Detectors with LiDAR from the Past
Authors:
Yurong You,
Cheng Perng Phoo,
Carlos Andres Diaz-Ruiz,
Katie Z Luo,
Wei-Lun Chao,
Mark Campbell,
Bharath Hariharan,
Kilian Q Weinberger
Abstract:
Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer from inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. Specifically, at inference time, we assume that the camera-based detectors have access to multiple unlabeled LiDAR scans from past traversals at locations of interest (potentially from other high-end vehicles equipped with LiDAR sensors). Under this setup, we propose a novel, simple, and end-to-end trainable framework, termed AsyncDepth, to effectively extract relevant features from asynchronous LiDAR traversals of the same location for monocular 3D detectors. We show consistent and significant performance gain (up to 9 AP) across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
Submitted 9 April, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization
Authors:
Jin Peng Zhou,
Charles Staats,
Wenda Li,
Christian Szegedy,
Kilian Q. Weinberger,
Yuhuai Wu
Abstract:
Large language models (LLMs), such as Google's Minerva and OpenAI's GPT families, are becoming increasingly capable of solving mathematical quantitative reasoning problems. However, they still make unjustified logical and computational errors in their reasoning steps and answers. In this paper, we leverage the fact that if the training corpus of LLMs contained sufficiently many examples of formal mathematics (e.g. in Isabelle, a formal theorem proving environment), they can be prompted to translate, i.e., autoformalize, informal mathematical statements into formal Isabelle code -- which can be verified automatically for internal consistency. This provides a mechanism to automatically reject solutions whose formalized versions are inconsistent within themselves or with the formalized problem statement. We evaluate our method on GSM8K, MATH and MultiArith datasets and demonstrate that our approach provides a consistently better heuristic than vanilla majority voting -- the previously best method to identify correct answers, by more than 12% on GSM8K. In our experiments it improves results consistently across all datasets and LLM model sizes. The code can be found at https://github.com/jinpz/dtv.
Submitted 26 March, 2024;
originally announced March 2024.
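The contrast drawn above between vanilla majority voting and voting over verified solutions can be sketched as follows. This is only an illustration under stated assumptions: `is_verified` is a hypothetical stand-in for the Isabelle consistency check, the candidate data is invented, and the fallback to plain majority voting when nothing verifies is a choice made for this sketch, not something the abstract specifies.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer (vanilla majority voting)."""
    return Counter(answers).most_common(1)[0][0]


def verified_majority_vote(candidates, is_verified):
    """Vote only over candidates whose solution passes verification.

    candidates: list of (final_answer, solution) pairs sampled from a model.
    is_verified: callable(solution) -> bool; a hypothetical placeholder for an
                 automatic check of the formalized solution.
    """
    kept = [answer for answer, solution in candidates if is_verified(solution)]
    if not kept:
        # Assumption of this sketch: fall back to plain majority voting.
        kept = [answer for answer, _ in candidates]
    return majority_vote(kept)


# Invented samples: the wrong answer "12" is more frequent but never verifies.
samples = [
    ("12", "inconsistent"),
    ("12", "inconsistent"),
    ("12", "inconsistent"),
    ("15", "consistent"),
    ("15", "consistent"),
]
plain = majority_vote([answer for answer, _ in samples])
filtered = verified_majority_vote(samples, lambda s: s == "consistent")
```

Here `plain` picks the frequent but unverified answer, while `filtered` picks the answer whose solutions pass the check.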
-
Online Feature Updates Improve Online (Generalized) Label Shift Adaptation
Authors:
Ruihan Wu,
Siddhartha Datta,
Yi Su,
Dheeraj Baby,
Yu-Xiang Wang,
Kilian Q. Weinberger
Abstract:
This paper addresses the prevalent issue of label shift in an online setting with missing labels, where data distributions change over time and obtaining timely labels is challenging. While existing methods primarily focus on adjusting or updating the final layer of a pre-trained classifier, we explore the untapped potential of enhancing feature representations using unlabeled data at test-time. Our novel method, Online Label Shift adaptation with Online Feature Updates (OLS-OFU), leverages self-supervised learning to refine the feature extraction process, thereby improving the prediction model. Theoretical analyses confirm that OLS-OFU reduces algorithmic regret by capitalizing on self-supervised learning for feature refinement. Empirical studies on various datasets, under both online label shift and generalized label shift conditions, underscore the effectiveness and robustness of OLS-OFU, especially in cases of domain shifts.
Submitted 5 February, 2024;
originally announced February 2024.
-
Zero-shot Object-Level OOD Detection with Context-Aware Inpainting
Authors:
Quang-Huy Nguyen,
Jin Peng Zhou,
Zhenzhen Liu,
Khanh-Huyen Bui,
Kilian Q. Weinberger,
Dung D. Le
Abstract:
Machine learning algorithms are increasingly provided as black-box cloud services or pre-trained models, without access to their training data. This motivates the problem of zero-shot out-of-distribution (OOD) detection. Concretely, we aim to detect OOD objects that do not belong to the classifier's label set but are erroneously classified as in-distribution (ID) objects. Our approach, RONIN, uses an off-the-shelf diffusion model to replace detected objects with inpainting. RONIN conditions the inpainting process with the predicted ID label, drawing the input object closer to the in-distribution domain. As a result, the reconstructed object is very close to the original in the ID cases and far in the OOD cases, allowing RONIN to effectively distinguish ID and OOD samples. Throughout extensive experiments, we demonstrate that RONIN achieves competitive results compared to previous approaches across several datasets, both in zero-shot and non-zero-shot settings.
Submitted 6 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Validating Properties of RIS Channel Models with Prototypical Measurements
Authors:
Kevin Weinberger,
Simon Tewes,
Aydin Sezgin
Abstract:
The integration of Reconfigurable Intelligent Surfaces (RIS) holds substantial promise for revolutionizing 6G wireless networks, offering unprecedented capabilities for real-time control over communication environments. However, determining optimal RIS configurations remains a pivotal challenge, necessitating the development of accurate analytical models. While theoretically derived models provide valuable insights, their potentially idealistic assumptions do not always translate well to practical measurements. This becomes especially problematic in mobile environments, where signals arrive from various directions. This study deploys an RIS prototype on a turntable, capturing the RIS channels' dependency on the angle of incoming signals. The difference between theory and practice is bridged by refining a model with angle-dependent reflection coefficients. The improved model exhibits a significantly closer alignment with real-world measurements. Analysis of the reflect coefficients reveals that non-perpendicular receiver angles can induce an additional attenuation of up to -14.5dB. Additionally, we note significant phase shift deviations, varying for each reflect element.
Submitted 13 October, 2023;
originally announced February 2024.
-
Denoising Vision Transformers
Authors:
Jiawei Yang,
Katie Z Luo,
Jiefeng Li,
Kilian Q Weinberger,
Yonglong Tian,
Yue Wang
Abstract:
We delve into a nuanced but significant challenge inherent to Vision Transformers (ViTs): feature maps of these models exhibit grid-like artifacts, which hurt the performance of ViTs in downstream tasks. Our investigations trace this fundamental issue down to the positional embeddings at the input stage. To address this, we propose a novel noise model, which is universally applicable to all ViTs. Specifically, the noise model dissects ViT outputs into three components: a semantics term free from noise artifacts and two artifact-related terms that are conditioned on pixel locations. Such a decomposition is achieved by enforcing cross-view feature consistency with neural fields on a per-image basis. This per-image optimization process extracts artifact-free features from raw ViT outputs, providing clean features for offline applications. Expanding the scope of our solution to support online functionality, we introduce a learnable denoiser to predict artifact-free features directly from unprocessed ViT outputs, which shows remarkable generalization capabilities to novel data without the need for per-image optimization. Our two-stage approach, termed Denoising Vision Transformers (DVT), does not require re-training existing pre-trained ViTs and is immediately applicable to any Transformer-based architecture. We evaluate our method on a variety of representative ViTs (DINO, MAE, DeiT-III, EVA02, CLIP, DINOv2, DINOv2-reg). Extensive evaluations demonstrate that our DVT consistently and significantly improves existing state-of-the-art general-purpose models in semantic and geometric tasks across multiple datasets (e.g., +3.84 mIoU). We hope our study will encourage a re-evaluation of ViT design, especially regarding the naive use of positional embeddings.
Submitted 5 January, 2024;
originally announced January 2024.
-
Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps
Authors:
Katie Z Luo,
Xinshuo Weng,
Yan Wang,
Shuang Wu,
Jie Li,
Kilian Q Weinberger,
Yue Wang,
Marco Pavone
Abstract:
Autonomous driving has traditionally relied heavily on costly and labor-intensive High Definition (HD) maps, hindering scalability. In contrast, Standard Definition (SD) maps are more affordable and have worldwide coverage, offering a scalable alternative. In this work, we systematically explore the effect of SD maps for real-time lane-topology understanding. We propose a novel framework to integrate SD maps into online map prediction and propose a Transformer-based encoder, SD Map Encoder Representations from transFormers, to leverage priors in SD maps for the lane-topology prediction task. This enhancement consistently and significantly boosts (by up to 60%) lane detection and topology prediction on current state-of-the-art online map prediction methods without bells and whistles and can be immediately incorporated into any Transformer-based lane-topology method. Code is available at https://github.com/NVlabs/SMERF.
Submitted 7 November, 2023;
originally announced November 2023.
-
Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery
Authors:
Katie Z Luo,
Zhenzhen Liu,
Xiangyu Chen,
Yurong You,
Sagie Benaim,
Cheng Perng Phoo,
Mark Campbell,
Wen Sun,
Bharath Hariharan,
Kilian Q. Weinberger
Abstract:
Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles -- where alignment with human expectations can be imperative. In this paper, we propose to adapt similar RL-based methods to unsupervised object discovery, i.e. learning to detect objects from LiDAR points without any training labels. Instead of labels, we use simple heuristics to mimic human feedback. More explicitly, we combine multiple heuristics into a simple reward function that positively correlates its score with bounding box accuracy, i.e., boxes containing objects are scored higher than those without. We start from the detector's own predictions to explore the space and reinforce boxes with high rewards through gradient updates. Empirically, we demonstrate that our approach is not only more accurate, but also orders of magnitude faster to train compared to prior work on object discovery.
Submitted 5 November, 2023; v1 submitted 29 October, 2023;
originally announced October 2023.
-
Correction with Backtracking Reduces Hallucination in Summarization
Authors:
Zhenzhen Liu,
Chao Wan,
Varsha Kishore,
Jin Peng Zhou,
Minmin Chen,
Kilian Q. Weinberger
Abstract:
Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is, to producing summaries with details that are not grounded in the source document. In this paper, we introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization. The approach is based on two steps: hallucination detection and mitigation. We show that the former can be achieved through measuring simple statistics about conditional word probabilities and distance to context words. Further, we demonstrate that straightforward backtracking is surprisingly effective at mitigation. We thoroughly evaluate the proposed method with prior art on three benchmark datasets for text summarization. The results show that CoBa is effective and efficient in reducing hallucination, and offers great adaptability and flexibility.
Submitted 31 October, 2023; v1 submitted 24 October, 2023;
originally announced October 2023.
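The detect-then-backtrack idea in the abstract above can be sketched as a small depth-first decoder. This is a toy illustration, not the CoBa implementation: `propose` and `is_suspicious` are hypothetical callables, the detection rule used in the example (a token absent from the source document) is a simplification of the statistics the abstract mentions, and the lookup table stands in for a real summarization model.

```python
def decode_with_backtracking(propose, is_suspicious, max_len, eos="<eos>"):
    """Greedy decoding that backs up when every continuation looks ungrounded.

    propose(prefix) -> candidate next tokens, best first (hypothetical model call).
    is_suspicious(prefix, token) -> True if the token is flagged as a likely
    hallucination (hypothetical detector).
    Returns the token list, or None if no acceptable sequence exists.
    """
    def extend(prefix):
        if len(prefix) == max_len or (prefix and prefix[-1] == eos):
            return prefix
        for token in propose(prefix):
            if is_suspicious(prefix, token):
                continue  # skip flagged tokens and try the next candidate
            result = extend(prefix + [token])
            if result is not None:
                return result
        return None  # dead end: the caller backtracks to its next candidate

    return extend([])


# Toy "model": ranked continuations keyed by the prefix generated so far.
table = {
    (): ["the"],
    ("the",): ["cat", "dog"],
    ("the", "cat"): ["flew"],
    ("the", "dog"): ["barked"],
}
source_words = {"the", "cat", "dog", "barked", "<eos>"}

summary = decode_with_backtracking(
    propose=lambda prefix: table.get(tuple(prefix), ["<eos>"]),
    is_suspicious=lambda prefix, token: token not in source_words,
    max_len=6,
)
```

In this toy run the decoder first commits to "cat", finds that its only continuation is flagged, and backtracks to "dog".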
-
Pre-Training LiDAR-Based 3D Object Detectors Through Colorization
Authors:
Tai-Yu Pan,
Chenyang Ma,
Tianle Chen,
Cheng Perng Phoo,
Katie Z Luo,
Yurong You,
Mark Campbell,
Kilian Q. Weinberger,
Bharath Hariharan,
Wei-Lun Chao
Abstract:
Accurate 3D object detection and understanding for self-driving cars heavily relies on LiDAR point clouds, necessitating large amounts of labeled data to train. In this work, we introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels by teaching the model to colorize LiDAR point clouds, equipping it with valuable semantic cues. To tackle challenges arising from color variations and selection bias, we incorporate color as "context" by providing ground-truth colors as hints during colorization. Experimental results on the KITTI and Waymo datasets demonstrate GPC's remarkable effectiveness. Even with limited labeled data, GPC significantly improves fine-tuning performance; notably, on just 20% of the KITTI dataset, GPC outperforms training from scratch with the entire dataset. In sum, we introduce a fresh perspective on pre-training for 3D object detection, aligning the objective with the model's intended role and ultimately advancing the accuracy and efficiency of 3D object detection for autonomous vehicles.
Submitted 25 February, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features
Authors:
Travis Zhang,
Katie Luo,
Cheng Perng Phoo,
Yurong You,
Wei-Lun Chao,
Bharath Hariharan,
Mark Campbell,
Kilian Q. Weinberger
Abstract:
The rapid development of 3D object detection systems for self-driving cars has significantly improved accuracy. However, these systems struggle to generalize across diverse driving environments, which can lead to safety-critical failures in detecting traffic participants. To address this, we propose a method that utilizes unlabeled repeated traversals of multiple locations to adapt object detectors to new driving environments. By incorporating statistics computed from repeated LiDAR scans, we guide the adaptation process effectively. Our approach enhances LiDAR-based detection models using spatial quantized historical features and introduces a lightweight regression head to leverage the statistics for feature regularization. Additionally, we leverage the statistics for a novel self-training process to stabilize the training. The framework is detector model-agnostic and experiments on real-world datasets demonstrate significant improvements, achieving up to a 20-point performance gain, especially in detecting pedestrians and distant objects. Code is available at https://github.com/zhangtravis/Hist-DA.
Submitted 21 September, 2023;
originally announced September 2023.
-
On the Effectiveness of Offline RL for Dialogue Response Generation
Authors:
Paloma Sodhi,
Felix Wu,
Ethan R. Elenberg,
Kilian Q. Weinberger,
Ryan McDonald
Abstract:
A common training technique for language models is teacher forcing (TF). TF attempts to match human language exactly, even though identical meanings can be expressed in different ways. This motivates use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods to maximize such objectives. We present a comprehensive evaluation across multiple datasets, models, and metrics. Offline RL shows a clear performance improvement over teacher forcing while not inducing training instability or sacrificing practical training budgets.
Submitted 23 July, 2023;
originally announced July 2023.
-
IncDSI: Incrementally Updatable Document Retrieval
Authors:
Varsha Kishore,
Chao Wan,
Justin Lovelace,
Yoav Artzi,
Kilian Q. Weinberger
Abstract:
Differentiable Search Index is a recently proposed paradigm for document retrieval, that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performances for document retrieval across many benchmarks. These kinds of models have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50ms per document), without retraining the model on the entire dataset (or even parts thereof). Instead we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with re-training the model on the whole dataset and enables the development of document retrieval systems that can be updated with new information in real-time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.
Submitted 19 July, 2023;
originally announced July 2023.
-
Learning Iterative Neural Optimizers for Image Steganography
Authors:
Xiangyu Chen,
Varsha Kishore,
Kilian Q Weinberger
Abstract:
Image steganography is the process of concealing secret information in images through imperceptible changes. Recent work has formulated this task as a classic constrained optimization problem. In this paper, we argue that image steganography is inherently performed on the (elusive) manifold of natural images, and propose an iterative neural network trained to perform the optimization steps. In contrast to classical optimization methods like L-BFGS or projected gradient descent, we train the neural network to also stay close to the manifold of natural images throughout the optimization. We show that our learned neural optimization is faster and more reliable than classical optimization approaches. In comparison to previous state-of-the-art encoder-decoder-based steganography methods, it reduces the recovery error rate by multiple orders of magnitude and achieves zero error up to 3 bits per pixel (bpp) without the need for error-correcting codes.
Submitted 27 March, 2023;
originally announced March 2023.
-
Unsupervised Adaptation from Repeated Traversals for Autonomous Driving
Authors:
Yurong You,
Cheng Perng Phoo,
Katie Z Luo,
Travis Zhang,
Wei-Lun Chao,
Bharath Hariharan,
Mark Campbell,
Kilian Q. Weinberger
Abstract:
For a self-driving car to operate reliably, its perceptual system must generalize to the end-user's environment -- ideally without additional annotation efforts. One potential solution is to leverage unlabeled data (e.g., unlabeled LiDAR point clouds) collected from the end-users' environments (i.e. target domain) to adapt the system to the difference between training and testing environments. While extensive research has been done on such an unsupervised domain adaptation problem, one fundamental problem lingers: there is no reliable signal in the target domain to supervise the adaptation process. To overcome this issue we observe that it is easy to collect unsupervised data from multiple traversals of repeated routes. While different from conventional unsupervised domain adaptation, this assumption is extremely realistic since many drivers share the same roads. We show that this simple additional assumption is sufficient to obtain a potent signal that allows us to perform iterative self-training of 3D object detectors on the target domain. Concretely, we generate pseudo-labels with the out-of-domain detector but reduce false positives by removing detections of supposedly mobile objects that are persistent across traversals. Further, we reduce false negatives by encouraging predictions in regions that are not persistent. We experiment with our approach on two large-scale driving datasets and show remarkable improvement in 3D object detection of cars, pedestrians, and cyclists, bringing us a step closer to generalizable autonomous driving.
Submitted 27 March, 2023;
originally announced March 2023.
-
Unsupervised Out-of-Distribution Detection with Diffusion Inpainting
Authors:
Zhenzhen Liu,
** Peng Zhou,
Yufan Wang,
Kilian Q. Weinberger
Abstract:
Unsupervised out-of-distribution (OOD) detection seeks to identify out-of-domain data by learning only from unlabeled in-domain data. We present a novel approach for this task - Lift, Map, Detect (LMD) - that leverages recent advances in diffusion models. Diffusion models are one type of generative model. At their core, they learn an iterative denoising process that gradually maps a noisy image closer to their training manifold. LMD leverages this intuition for OOD detection. Specifically, LMD lifts an image off its original manifold by corrupting it, and maps it towards the in-domain manifold with a diffusion model. For an out-of-domain image, the mapped image would lie far from its original manifold, and LMD would identify it as OOD accordingly. We show through extensive experiments that LMD achieves competitive performance across a broad variety of datasets. Code can be found at https://github.com/zhenzhel/lift_map_detect.
Submitted 16 August, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
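The lift, map, detect steps described in the LMD abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `lift` and `lmd_score`, the masking-based corruption, the mean-squared-error distance, and the `inpaint` callable, which stands in for a diffusion inpainting model and is replaced by a toy function. It is not the authors' implementation, which is available at the repository linked in the abstract.

```python
import numpy as np


def lift(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Lift: corrupt the image by erasing the pixels where mask is True."""
    corrupted = image.copy()
    corrupted[mask] = 0.0
    return corrupted


def lmd_score(image: np.ndarray, mask: np.ndarray, inpaint) -> float:
    """Map the corrupted image back with `inpaint`, then measure how far
    the reconstruction lands from the original on the erased pixels.

    `inpaint(corrupted, mask)` is a hypothetical stand-in for a diffusion
    inpainting model trained on in-domain data. The distance is plain
    mean squared error, chosen here only for simplicity.
    """
    corrupted = lift(image, mask)
    reconstruction = inpaint(corrupted, mask)
    return float(np.mean((reconstruction[mask] - image[mask]) ** 2))


def toy_inpaint(corrupted: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Toy 'in-domain model': it only knows images with constant value 0.5."""
    filled = corrupted.copy()
    filled[mask] = 0.5
    return filled


# Checkerboard mask over a 4x4 image (an illustrative choice).
mask = (np.indices((4, 4)).sum(axis=0) % 2).astype(bool)
in_domain = np.full((4, 4), 0.5)
out_of_domain = np.full((4, 4), 1.0)

score_in = lmd_score(in_domain, mask, toy_inpaint)
score_out = lmd_score(out_of_domain, mask, toy_inpaint)
```

In this toy setting the in-domain image is reconstructed exactly, while the out-of-domain image is pulled back toward the in-domain value and therefore receives a larger score; thresholding the score would give the detection decision.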
-
Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction
Authors:
Boyi Li,
Rodolfo Corona,
Karttikeya Mangalam,
Catherine Chen,
Daniel Flaherty,
Serge Belongie,
Kilian Q. Weinberger,
Jitendra Malik,
Trevor Darrell,
Dan Klein
Abstract:
Are multimodal inputs necessary for grammar induction? Recent work has shown that multimodal training inputs can improve grammar induction. However, these improvements are based on comparisons to weak text-only baselines that were trained on relatively little textual data. To determine whether multimodal inputs are needed in regimes with large amounts of textual training data, we design a stronger text-only baseline, which we refer to as LC-PCFG. LC-PCFG is a C-PCFG that incorporates embeddings from text-only large language models (LLMs). We use a fixed grammar family to directly compare LC-PCFG to various multimodal grammar induction methods. We compare performance on four benchmark datasets. LC-PCFG provides an up to 17% relative improvement in Corpus-F1 compared to state-of-the-art multimodal grammar induction methods. LC-PCFG is also more computationally efficient, providing an up to 85% reduction in parameter count and 8.8x reduction in training time compared to multimodal approaches. These results suggest that multimodal inputs may not be necessary for grammar induction, and emphasize the importance of strong vision-free baselines for evaluating the benefit of multimodal approaches.
Submitted 12 April, 2024; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Latent Diffusion for Language Generation
Authors:
Justin Lovelace,
Varsha Kishore,
Chao Wan,
Eliot Shekhtman,
Kilian Q. Weinberger
Abstract:
Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have presented diffusion as an alternative to existing pretrained language models. We view diffusion and existing language models as complementary. We demonstrate that encoder-decoder language models can be utilized to efficiently learn high-quality language autoencoders. We then demonstrate that continuous diffusion models can be learned in the latent space of the language autoencoder, enabling us to sample continuous latent representations that can be decoded into natural language with the pretrained decoder. We validate the effectiveness of our approach for unconditional, class-conditional, and sequence-to-sequence language generation. We demonstrate across multiple diverse data sets that our latent language diffusion models are significantly more effective than previous diffusion language models.
Submitted 7 November, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
The Perfect Match: RIS-enabled MIMO Channel Estimation Using Tensor Decomposition
Authors:
Bilal Ahmad,
Kevin Weinberger,
Aydin Sezgin,
Bilal Zafar,
Martin Haardt
Abstract:
The deployment of reconfigurable intelligent surfaces (RISs) in a communication system provides control over the propagation environment, which facilitates the augmentation of a multitude of communication objectives. As these performance gains are highly dependent on the applied phase shifts at the RIS, accurate channel state information at the transceivers is imperative. However, not only do RISs traditionally lack signal processing capabilities, but their end-to-end channels also consist of multiple components. Hence, conventional channel estimation (CE) algorithms become incompatible with RIS-aided communication systems as they fail to provide the necessary information about the channel components, which are essential for a beneficial RIS configuration. To enable the full potential of RISs, we propose to use tensor-decomposition-based CE, which facilitates smart configuration of the RIS by providing the required channel components. We use canonical polyadic (CP) decomposition, which exploits a structured time-domain pilot sequence. Compared to other state-of-the-art decomposition methods, the proposed Semi-Algebraic CP decomposition via Simultaneous Matrix Diagonalization (SECSI) algorithm is more time efficient as it does not require an iterative process. The benefits of SECSI for RIS-aided networks are validated with numerical results, which show the improved individual and end-to-end CE accuracy of SECSI.
Submitted 2 May, 2023; v1 submitted 18 November, 2022;
originally announced November 2022.
-
RIS-enhanced Resilience in Cell-Free MIMO
Authors:
Kevin Weinberger,
Robert-Jeron Reifert,
Aydin Sezgin,
Ertugrul Basar
Abstract:
More and more applications that require high reliability and fault tolerance are realized with wireless network architectures and thus ultimately rely on the wireless channels, which can be subject to impairments and blockages. Hence, these architectures require a backup plan in the physical layer in order to guarantee functionality, especially when safety-relevant aspects are involved. To this end, this work proposes to utilize the reconfigurable intelligent surface (RIS) as a resilience mechanism to counteract outages. The advantages of RISs for such a purpose derive from their inherent addition of alternative channel links in combination with their reconfigurability. The major benefits are investigated in a cell-free multiple-input and multiple-output (MIMO) setting, in which the direct channel paths are subject to blockages. An optimization problem is formulated that includes rate allocation with beamforming and phase shift configuration and is solved with a resilience-aware alternating optimization approach. Numerical results show that deploying even a randomly-configured RIS to a network reduces the performance degradation caused by blockages. This becomes even more pronounced in the optimized case, in which the RIS is able to potentially counteract the performance degradation entirely. Interestingly, adding more reflecting elements to the system brings an overall benefit for the resilience, even for time-sensitive systems, due to the contribution of the RIS reflections, even when unoptimized.
Submitted 31 October, 2022;
originally announced November 2022.
-
Learning to Invert: Simple Adaptive Attacks for Gradient Inversion in Federated Learning
Authors:
Ruihan Wu,
Xiangyu Chen,
Chuan Guo,
Kilian Q. Weinberger
Abstract:
Gradient inversion attacks enable recovery of training samples from model gradients in federated learning (FL), and constitute a serious threat to data privacy. To mitigate this vulnerability, prior work proposed both principled defenses based on differential privacy and heuristic defenses based on gradient compression as countermeasures. These defenses have so far been very effective, in particular those based on gradient compression that allow the model to maintain high accuracy while greatly reducing the effectiveness of attacks. In this work, we argue that such findings underestimate the privacy risk in FL. As a counterexample, we show that existing defenses can be broken by a simple adaptive attack, where a model trained on auxiliary data is able to invert gradients on both vision and language tasks.
Submitted 9 June, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Image-to-Image Translation for Autonomous Driving from Coarsely-Aligned Image Pairs
Authors:
Youya Xia,
Josephine Monica,
Wei-Lun Chao,
Bharath Hariharan,
Kilian Q Weinberger,
Mark Campbell
Abstract:
A self-driving car must be able to reliably handle adverse weather conditions (e.g., snowy) to operate safely. In this paper, we investigate the idea of turning sensor inputs (i.e., images) captured in an adverse condition into a benign one (i.e., sunny), upon which the downstream tasks (e.g., semantic segmentation) can attain high accuracy. Prior work primarily formulates this as an unpaired image-to-image translation problem due to the lack of paired images captured under the exact same camera poses and semantic layouts. While perfectly-aligned images are not available, one can easily obtain coarsely-paired images. For instance, many people drive the same routes daily in both good and adverse weather; thus, images captured at close-by GPS locations can form a pair. Though data from repeated traversals are unlikely to capture the same foreground objects, we posit that they provide rich contextual information to supervise the image translation model. To this end, we propose a novel training objective leveraging coarsely-aligned image pairs. We show that our coarsely-aligned training scheme leads to a better image translation quality and improved downstream tasks, such as semantic segmentation, monocular depth estimation, and visual localization.
Submitted 23 September, 2022;
originally announced September 2022.
-
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions
Authors:
Carlos A. Diaz-Ruiz,
Youya Xia,
Yurong You,
Jose Nino,
Junan Chen,
Josephine Monica,
Xiangyu Chen,
Katie Luo,
Yan Wang,
Marc Emond,
Wei-Lun Chao,
Bharath Hariharan,
Kilian Q. Weinberger,
Mark Campbell
Abstract:
Advances in perception for self-driving cars have accelerated in recent years due to the availability of large-scale datasets, typically collected at specific locations and under nice weather conditions. Yet, to achieve the high safety requirement, these perceptual systems must operate robustly under a wide variety of weather conditions including snow and rain. In this paper, we present a new dataset to enable robust autonomous driving via a novel data collection process - data is repeatedly recorded along a 15 km route under diverse scene (urban, highway, rural, campus), weather (snow, rain, sun), time (day/night), and traffic conditions (pedestrians, cyclists and cars). The dataset includes images and point clouds from cameras and LiDAR sensors, along with high-precision GPS/INS to establish correspondence across routes. The dataset includes road and object annotations using amodal masks to capture partial occlusions and 3D bounding boxes. We demonstrate the uniqueness of this dataset by analyzing the performance of baselines in amodal segmentation of road and objects, depth estimation, and 3D object detection. The repeated routes open new research directions in object discovery, continual learning, and anomaly detection. Link to Ithaca365: https://ithaca365.mae.cornell.edu/
Submitted 1 August, 2022;
originally announced August 2022.
-
Differentially Private Multi-Party Data Release for Linear Regression
Authors:
Ruihan Wu,
Xin Yang,
Yuanshun Yao,
Jiankai Sun,
Tianyi Liu,
Kilian Q. Weinberger,
Chong Wang
Abstract:
Differentially Private (DP) data release is a promising technique to disseminate data without compromising the privacy of data subjects. However, the majority of prior work has focused on scenarios where a single party owns all the data. In this paper, we focus on the multi-party setting, where different stakeholders own disjoint sets of attributes belonging to the same group of data subjects. Within the context of linear regression that allows all parties to train models on the complete data without the ability to infer private attributes or identities of individuals, we start with directly applying the Gaussian mechanism and show it has the small eigenvalue problem. We further propose our novel method and prove it asymptotically converges to the optimal (non-private) solutions with increasing dataset size. We substantiate the theoretical results through experiments on both artificial and real-world datasets.
Submitted 18 June, 2022; v1 submitted 16 June, 2022;
originally announced June 2022.
-
Long-term Control for Dialogue Generation: Methods and Evaluation
Authors:
Ramya Ramakrishnan,
Hashan Buddhika Narangodage,
Mauro Schilman,
Kilian Q. Weinberger,
Ryan McDonald
Abstract:
Current approaches for controlling dialogue response generation are primarily focused on high-level attributes like style, sentiment, or topic. In this work, we focus on constrained long-term dialogue generation, which involves more fine-grained control and requires a given set of control words to appear in generated responses. This setting requires a model to not only consider the generation of these control words in the immediate context, but also produce utterances that will encourage the generation of the words at some time in the (possibly distant) future. We define the problem of constrained long-term control for dialogue generation, identify gaps in current methods for evaluation, and propose new metrics that better measure long-term control. We also propose a retrieval-augmented method that improves performance of long-term controlled generation via logit modification techniques. We show through experiments on three task-oriented dialogue datasets that our metrics better assess dialogue control relative to current alternatives and that our method outperforms state-of-the-art constrained generation baselines.
Submitted 15 May, 2022;
originally announced May 2022.
-
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Authors:
Felix Wu,
Kwangyoun Kim,
Shinji Watanabe,
Kyu Han,
Ryan McDonald,
Kilian Q. Weinberger,
Yoav Artzi
Abstract:
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task -- transcribing audio inputs into pseudo subword sequences. This process stands on its own, or can be applied as low-cost second-stage pre-training. We experiment with automatic speech recognition (ASR), spoken named entity recognition, and speech-to-text translation. We set new state-of-the-art results for end-to-end spoken named entity recognition, and show consistent improvements on 20 language pairs for speech-to-text translation, even when competing methods use additional text data for training. Finally, on ASR, our approach enables encoder-decoder methods to benefit from pre-training for all parts of the network, and shows comparable performance to highly optimized recent methods.
Submitted 2 May, 2022;
originally announced May 2022.
-
Learning to Detect Mobile Objects from LiDAR Scans Without Labels
Authors:
Yurong You,
Katie Z Luo,
Cheng Perng Phoo,
Wei-Lun Chao,
Wen Sun,
Bharath Hariharan,
Mark Campbell,
Kilian Q. Weinberger
Abstract:
Current 3D object detectors for autonomous driving are almost entirely trained on human-annotated data. Although of high quality, the generation of such data is laborious and costly, restricting them to a few specific locations and object types. This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth. Our approach leverages several simple common sense heuristics to create an initial set of approximate seed labels. For example, relevant traffic participants are generally not persistent across multiple traversals of the same route, do not fly, and are never under ground. We demonstrate that these seed labels are highly effective to bootstrap a surprisingly accurate detector through repeated self-training without a single human annotated label.
Submitted 29 March, 2022;
originally announced March 2022.
-
Sacrificing CSI for a Greater Good: RIS-enabled Opportunistic Rate Splitting
Authors:
Kevin Weinberger,
Aydin Sezgin
Abstract:
In reconfigurable intelligent surface (RIS)-assisted systems, the optimization of the phase shifts requires separate acquisition of the channel state information (CSI) for the direct and RIS-assisted channels, posing significant design challenges. In this paper, a novel scheme is proposed, which considers practical limitations like pilot overhead and channel estimation (CE) errors to increase the net performance. More specifically, at the cost of unpredictable interference, a portion of the CSI for the RIS-assisted channels is sacrificed in order to reduce the CE time. By alternating the CSI between coherence blocks and employing rate splitting, it becomes possible to mitigate the interference, thereby compensating for the adverse effect of the sacrificed CSI. Numerical simulations validate that the proposed scheme exhibits better performance in terms of achievable net rate, resulting in gains of up to 160% compared to non-orthogonal multiple access (NOMA), when CE time and CE errors are considered.
Submitted 28 March, 2022;
originally announced March 2022.
-
Hindsight is 20/20: Leveraging Past Traversals to Aid 3D Perception
Authors:
Yurong You,
Katie Z Luo,
Xiangyu Chen,
Junan Chen,
Wei-Lun Chao,
Wen Sun,
Bharath Hariharan,
Mark Campbell,
Kilian Q. Weinberger
Abstract:
Self-driving cars must detect vehicles, pedestrians, and other traffic participants accurately to operate safely. Small, far-away, or highly occluded objects are particularly challenging because there is limited information in the LiDAR point clouds for detecting them. To address this challenge, we leverage valuable information from the past: in particular, data collected in past traversals of the same scene. We posit that these past data, which are typically discarded, provide rich contextual information for disambiguating the above-mentioned challenging cases. To this end, we propose a novel, end-to-end trainable Hindsight framework to extract this contextual information from past traversals and store it in an easy-to-query data structure, which can then be leveraged to aid future 3D object detection of the same scene. We show that this framework is compatible with most modern 3D detection architectures and can substantially improve their average precision on multiple autonomous driving datasets, most notably by more than 300% on the challenging cases.
Submitted 21 March, 2022;
originally announced March 2022.
-
Does Label Differential Privacy Prevent Label Inference Attacks?
Authors:
Ruihan Wu,
** Peng Zhou,
Kilian Q. Weinberger,
Chuan Guo
Abstract:
Label differential privacy (label-DP) is a popular framework for training private ML models on datasets with public features and sensitive private labels. Despite its rigorous privacy guarantee, it has been observed that in practice label-DP does not preclude label inference attacks (LIAs): Models trained with label-DP can be evaluated on the public training features to recover, with high accuracy, the very private labels that it was designed to protect. In this work, we argue that this phenomenon is not paradoxical and that label-DP is designed to limit the advantage of an LIA adversary compared to predicting training labels using the Bayes classifier. At label-DP $\varepsilon=0$ this advantage is zero, hence the optimal attack is to predict according to the Bayes classifier and is independent of the training labels. Our bound shows the semantic protection conferred by label-DP and gives guidelines on how to choose $\varepsilon$ to limit the threat of LIAs below a certain level. Finally, we empirically demonstrate that our result closely captures the behavior of simulated attacks on both synthetic and real-world datasets.
Submitted 3 June, 2023; v1 submitted 25 February, 2022;
originally announced February 2022.
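The remark in the label-DP abstract above, that at $\varepsilon=0$ the released labels carry no information about the true labels, can be seen in a small sketch of K-ary randomized response, one standard mechanism that satisfies label-DP. The abstract does not say which mechanism the paper analyzes, so the mechanism choice and the names `keep_probability` and `randomized_response` are assumptions made for illustration only.

```python
import math
import random


def keep_probability(epsilon: float, num_classes: int) -> float:
    """Probability that K-ary randomized response reports the true label."""
    e = math.exp(epsilon)
    return e / (e + num_classes - 1)


def randomized_response(label: int, epsilon: float, num_classes: int,
                        rng: random.Random) -> int:
    """Report the true label with probability keep_probability(...);
    otherwise report one of the other labels uniformly at random."""
    if rng.random() < keep_probability(epsilon, num_classes):
        return label
    # Draw from the num_classes - 1 labels that differ from `label`.
    other = rng.randrange(num_classes - 1)
    return other if other < label else other + 1


rng = random.Random(0)
noisy_labels = [randomized_response(3, 1.0, 10, rng) for _ in range(5)]
```

With `epsilon = 0` the keep probability equals `1 / num_classes`, so every label is reported with the same probability regardless of the true label; larger values of `epsilon` make the true label increasingly likely to be reported.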
-
Language-driven Semantic Segmentation
Authors:
Boyi Li,
Kilian Q. Weinberger,
Serge Belongie,
Vladlen Koltun,
René Ranftl
Abstract:
We present LSeg, a novel model for language-driven semantic image segmentation. LSeg uses a text encoder to compute embeddings of descriptive input labels (e.g., "grass" or "building") together with a transformer-based image encoder that computes dense per-pixel embeddings of the input image. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class. The text embeddings provide a flexible label representation in which semantically similar labels map to similar regions in the embedding space (e.g., "cat" and "furry"). This allows LSeg to generalize to previously unseen categories at test time, without retraining or even requiring a single additional training sample. We demonstrate that our approach achieves highly competitive zero-shot performance compared to existing zero- and few-shot semantic segmentation methods, and even matches the accuracy of traditional segmentation algorithms when a fixed label set is provided. Code and demo are available at https://github.com/isl-org/lang-seg.
Submitted 2 April, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
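Illustration (not from the paper): the labeling step described in the abstract can be sketched as assigning each pixel the label whose text embedding is most similar to that pixel's embedding. The function name `segment`, the toy arrays, and the use of cosine similarity are assumptions made for this sketch; in the actual model the embeddings come from the trained text and image encoders.

```python
import numpy as np

def segment(pixel_emb, text_emb):
    """Assign each pixel the index of its most similar label embedding.

    pixel_emb: (H, W, C) dense per-pixel embeddings (hypothetical encoder output)
    text_emb:  (K, C) one embedding per input label (hypothetical encoder output)
    """
    # Normalize so the dot product below is a cosine similarity (an assumption).
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    scores = np.einsum("hwc,kc->hwk", p, t)  # similarity of every pixel to every label
    return scores.argmax(axis=-1)            # (H, W) map of label indices

# Toy stand-ins for encoder outputs: two labels, a 1 x 2 "image", C = 2.
labels = ["grass", "building"]
text_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
pixel_emb = np.array([[[2.0, 0.1], [0.1, 3.0]]])
label_map = segment(pixel_emb, text_emb)
```

Because the label set enters only through `text_emb`, swapping in embeddings for different labels changes the output classes without touching the image side, which is the property the abstract relies on for unseen categories.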
-
Machine learning discovery of new phases in programmable quantum simulator snapshots
Authors:
Cole Miles,
Rhine Samajdar,
Sepehr Ebadi,
Tout T. Wang,
Hannes Pichler,
Subir Sachdev,
Mikhail D. Lukin,
Markus Greiner,
Kilian Q. Weinberger,
Eun-Ah Kim
Abstract:
Machine learning has recently emerged as a promising approach for studying complex phenomena characterized by rich datasets. In particular, data-centric approaches lend themselves to the possibility of automatically discovering structures in experimental datasets that manual inspection may miss. Here, we introduce an interpretable unsupervised-supervised hybrid machine learning approach, the hybrid-correlation convolutional neural network (Hybrid-CCNN), and apply it to experimental data generated using a programmable quantum simulator based on Rydberg atom arrays. Specifically, we apply Hybrid-CCNN to analyze new quantum phases on square lattices with programmable interactions. The initial unsupervised dimensionality reduction and clustering stage first reveals five distinct quantum phase regions. In a second supervised stage, we refine these phase boundaries and characterize each phase by training fully interpretable CCNNs and extracting the relevant correlations for each phase. The characteristic spatial weightings and snippets of correlations specifically recognized in each phase capture quantum fluctuations in the striated phase and identify two previously undetected phases, the rhombic and boundary-ordered phases. These observations demonstrate that a combination of programmable quantum simulators with machine learning can be used as a powerful tool for detailed exploration of correlated quantum states of matter.
Submitted 20 December, 2021;
originally announced December 2021.
-
Is High Variance Unavoidable in RL? A Case Study in Continuous Control
Authors:
Johan Bjorck,
Carla P. Gomes,
Kilian Q. Weinberger
Abstract:
Reinforcement learning (RL) experiments have notoriously high variance, and minor details can have disproportionately large effects on measured outcomes. This is problematic for creating reproducible research and also serves as an obstacle for real-world applications, where safety and predictability are paramount. In this paper, we investigate causes for this perceived instability. To allow for an in-depth analysis, we focus on a particularly popular setup with high variance -- continuous control from pixels with an actor-critic agent. In this setting, we demonstrate that variance mostly arises early in training as a result of poor "outlier" runs, but that weight initialization and initial exploration are not to blame. We show that one cause for early variance is numerical instability which leads to saturating nonlinearities. We investigate several fixes to this issue and find that one particular method is surprisingly effective and simple -- normalizing penultimate features. Addressing the learning instability allows for larger learning rates, and significantly decreases the variance of outcomes. This demonstrates that the perceived variance in RL is not necessarily inherent to the problem definition and may be addressed through simple architectural modifications.
Submitted 5 February, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors:
Felix Wu,
Kwangyoun Kim,
**g Pan,
Kyu Han,
Kilian Q. Weinberger,
Yoav Artzi
Abstract:
This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.
Submitted 14 September, 2021;
originally announced September 2021.
-
Online Adaptation to Label Distribution Shift
Authors:
Ruihan Wu,
Chuan Guo,
Yi Su,
Kilian Q. Weinberger
Abstract:
Machine learning models often encounter distribution shifts when deployed in the real world. In this paper, we focus on adaptation to label distribution shift in the online setting, where the test-time label distribution is continually changing and the model must dynamically adapt to it without observing the true label. Leveraging a novel analysis, we show that the lack of true labels does not hinder estimation of the expected test loss, which enables the reduction of online label shift adaptation to conventional online learning. Informed by this observation, we propose adaptation algorithms inspired by classical online learning techniques such as Follow The Leader (FTL) and Online Gradient Descent (OGD) and derive their regret bounds. We empirically verify our findings under both simulated and real-world label distribution shifts and show that OGD is particularly effective and robust to a variety of challenging label shift scenarios.
Submitted 5 January, 2022; v1 submitted 9 July, 2021;
originally announced July 2021.
-
Towards Deeper Deep Reinforcement Learning with Spectral Normalization
Authors:
Johan Bjorck,
Carla P. Gomes,
Kilian Q. Weinberger
Abstract:
In computer vision and natural language processing, innovations in model architecture that increase model capacity have reliably translated into gains in performance. In stark contrast with this trend, state-of-the-art reinforcement learning (RL) algorithms often use small MLPs, and gains in performance typically originate from algorithmic innovations. It is natural to hypothesize that small datasets in RL necessitate simple models to avoid overfitting; however, this hypothesis is untested. In this paper we investigate how RL agents are affected by exchanging the small MLPs with larger modern networks with skip connections and normalization, focusing specifically on actor-critic algorithms. We empirically verify that naively adopting such architectures leads to instabilities and poor performance, likely contributing to the popularity of simple models in practice. However, we show that dataset size is not the limiting factor, and instead argue that instability from taking gradients through the critic is the culprit. We demonstrate that spectral normalization (SN) can mitigate this issue and enable stable training with large modern architectures. After smoothing with SN, larger models yield significant performance improvements -- suggesting that more "easy" gains may be had by focusing on model architectures in addition to algorithmic innovations.
Submitted 3 January, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
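Illustration (not from the paper): spectral normalization rescales a weight matrix by its largest singular value, which is commonly estimated by power iteration. The sketch below is a generic NumPy version; the function name `spectral_normalize`, the iteration count, and the toy matrix are assumptions for this example. Deep learning libraries typically run only one iteration per training step and keep the vector `u` between steps.

```python
import numpy as np

def spectral_normalize(W, n_iters=50, seed=0):
    """Return W divided by an estimate of its largest singular value."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        # Alternate between right and left singular-vector estimates.
        v = W.T @ u
        v = v / (np.linalg.norm(v) + 1e-12)
        u = W @ v
        u = u / (np.linalg.norm(u) + 1e-12)
    sigma = u @ W @ v  # estimate of the top singular value
    return W / sigma, sigma

# Toy weight matrix whose top singular value is 3 by construction.
W = np.diag([3.0, 1.0])
W_sn, sigma = spectral_normalize(W)
```

After normalization the layer is at most 1-Lipschitz with respect to its input, which bounds how strongly gradients passed through it can be amplified.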
-
Synergistic Benefits in IRS- and RS-enabled C-RAN with Energy-Efficient Clustering
Authors:
Kevin Weinberger,
Alaa Alameer Ahmad,
Aydin Sezgin,
Alessio Zappone
Abstract:
The potential of intelligent reflecting surfaces (IRSs) is investigated as a promising technique for enhancing the energy efficiency of wireless networks. Specifically, the IRS enables passive beamsteering by employing many low-cost individually controllable reflecting elements. The resulting change of the channel state, however, increases both signal quality and interference at the users. To counteract this negative side effect, we employ rate splitting (RS), which inherently is able to mitigate the impact of interference. We facilitate practical implementation by considering a Cloud Radio Access Network (C-RAN) at the cost of finite fronthaul-link capacities, which necessitate the allocation of sensible user-centric clusters to ensure energy-efficient transmissions. Dynamic methods for RS and the user clustering are proposed to account for the interdependencies of the individual techniques. Numerical results show that the dynamic RS method establishes synergistic benefits between RS and the IRS. Additionally, the dynamic user clustering and the IRS cooperate synergistically, with a gain of up to 88% when compared to the static scheme. Interestingly, with an increasing fronthaul capacity, the gain of the dynamic user clustering decreases, while the gain of the dynamic RS method increases. Around the resulting intersection, both methods affect the system concurrently, improving the energy efficiency drastically.
Submitted 12 May, 2021;
originally announced May 2021.
-
Exploiting Playbacks in Unsupervised Domain Adaptation for 3D Object Detection
Authors:
Yurong You,
Carlos Andres Diaz-Ruiz,
Yan Wang,
Wei-Lun Chao,
Bharath Hariharan,
Mark Campbell,
Kilian Q Weinberger
Abstract:
Self-driving cars must detect other vehicles and pedestrians in 3D to plan safe routes and avoid collisions. State-of-the-art 3D object detectors, based on deep learning, have shown promising accuracy but are prone to over-fit to domain idiosyncrasies, making them fail in new environments -- a serious problem if autonomous vehicles are meant to operate freely. In this paper, we propose a novel learning approach that drastically reduces this gap by fine-tuning the detector on pseudo-labels in the target domain, which our method generates while the vehicle is parked, based on replays of previously recorded driving sequences. In these replays, objects are tracked over time, and detections are interpolated and extrapolated -- crucially, leveraging future information to catch hard cases. We show, on five autonomous driving datasets, that fine-tuning the object detector on these pseudo-labels substantially reduces the domain gap to new driving environments, yielding drastic improvements in accuracy and detection reliability.
Submitted 10 July, 2022; v1 submitted 25 March, 2021;
originally announced March 2021.
-
Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision
Authors:
Johan Bjorck,
Xiangyu Chen,
Christopher De Sa,
Carla P. Gomes,
Kilian Q. Weinberger
Abstract:
Low-precision training has become a popular approach to reduce compute requirements, memory footprint, and energy consumption in supervised learning. In contrast, this promising approach has not yet enjoyed similarly widespread adoption within the reinforcement learning (RL) community, partly because RL agents can be notoriously hard to train even in full precision. In this paper we consider continuous control with the state-of-the-art SAC agent and demonstrate that a naïve adaptation of low-precision methods from supervised learning fails. We propose a set of six modifications, all straightforward to implement, that leaves the underlying agent and its hyperparameters unchanged but improves the numerical stability dramatically. The resulting modified SAC agent has lower memory and compute requirements while matching full-precision rewards, demonstrating that low-precision training can substantially accelerate state-of-the-art RL without parameter tuning.
Submitted 3 June, 2021; v1 submitted 26 February, 2021;
originally announced February 2021.
-
Assessing the causal effects of a stochastic intervention in time series data: Are heat alerts effective in preventing deaths and hospitalizations?
Authors:
Xiao Wu,
Kate R. Weinberger,
Gregory A. Wellenius,
Francesca Dominici,
Danielle Braun
Abstract:
The methodological development of this paper is motivated by the need to address the following scientific question: does the issuance of heat alerts prevent adverse health effects? Our goal is to address this question within a causal inference framework in the context of time series data. A key challenge is that causal inference methods require the overlap assumption to hold: each unit (i.e., a day) must have a positive probability of receiving the treatment (i.e., issuing a heat alert on that day). In our motivating example, the overlap assumption is often violated: the probability of issuing a heat alert on a cooler day is zero. To overcome this challenge, we propose a stochastic intervention for time series data which is implemented via an incremental time-varying propensity score (ItvPS). The ItvPS intervention is executed by multiplying the probability of issuing a heat alert on day $t$ -- conditional on past information up to day $t$ -- by an odds ratio $\delta_t$. First, we introduce a new class of causal estimands that relies on the ItvPS intervention. We provide theoretical results to show that these causal estimands can be identified and estimated under a weaker version of the overlap assumption. Second, we propose nonparametric estimators based on the ItvPS and derive an upper bound for the variances of these estimators. Third, we extend this framework to multi-site time series using a spatial meta-analysis approach. Fourth, we show that the proposed estimators perform well in terms of bias and root mean squared error via simulations. Finally, we apply our proposed approach to estimate the causal effects of increasing the probability of issuing heat alerts on each warm-season day in reducing deaths and hospitalizations among Medicare enrollees in 2,837 U.S. counties.
Submitted 29 August, 2022; v1 submitted 20 February, 2021;
originally announced February 2021.
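Illustration (not from the paper): one common way to shift a treatment probability by an odds ratio is to multiply the odds p / (1 - p) by delta and convert back to a probability. Whether this is the exact form used for the ItvPS is an assumption here, and the function name `shift_propensity` is hypothetical.

```python
def shift_propensity(p, delta):
    """Multiply the odds p / (1 - p) by delta and map back to a probability."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    return delta * p / (delta * p + 1.0 - p)

# Doubling the odds (delta = 2) on a day where an alert is impossible (p = 0)
# and on a day where it is a coin flip (p = 0.5).
shifted = [shift_propensity(p, 2.0) for p in (0.0, 0.5)]
```

Under this form a day with zero probability of an alert keeps zero probability after the shift, so the intervention never requires treating days on which alerts cannot occur. That is consistent with the weaker overlap assumption mentioned in the abstract.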
-
Making Paper Reviewing Robust to Bid Manipulation Attacks
Authors:
Ruihan Wu,
Chuan Guo,
Felix Wu,
Rahul Kidambi,
Laurens van der Maaten,
Kilian Q. Weinberger
Abstract:
Most computer science conferences rely on paper bidding to assign reviewers to papers. Although paper bidding enables high-quality assignments in days of unprecedented submission numbers, it also opens the door for dishonest reviewers to adversarially influence paper reviewing assignments. Anecdotal evidence suggests that some reviewers bid on papers by "friends" or colluding authors, even though these papers are outside their area of expertise, and recommend them for acceptance without considering the merit of the work. In this paper, we study the efficacy of such bid manipulation attacks and find that, indeed, they can jeopardize the integrity of the review process. We develop a novel approach for paper bidding and assignment that is much more robust against such attacks. We show empirically that our approach provides robustness even when dishonest reviewers collude, have full knowledge of the assignment system's internal workings, and have access to the system's inputs. In addition to being more robust, the quality of our paper review assignments is comparable to that of current, non-robust assignment approaches.
Submitted 22 February, 2021; v1 submitted 9 February, 2021;
originally announced February 2021.
-
Understanding Decoupled and Early Weight Decay
Authors:
Johan Bjorck,
Kilian Weinberger,
Carla Gomes
Abstract:
Weight decay (WD) is a traditional regularization technique in deep learning, but despite its ubiquity, its behavior is still an area of active research. Golatkar et al. have recently shown that WD only matters at the start of the training in computer vision, upending traditional wisdom. Loshchilov et al. show that for adaptive optimizers, manually decaying weights can outperform adding an $l_2$ penalty to the loss. This technique has become increasingly popular and is referred to as decoupled WD. The goal of this paper is to investigate these two recent empirical observations. We demonstrate that by applying WD only at the start, the network norm stays small throughout training. This has a regularizing effect as the effective gradient updates become larger. However, traditional generalization metrics fail to capture this effect of WD, and we show how a simple scale-invariant metric can. We also show how the growth of network weights is heavily influenced by the dataset and its generalization properties. For decoupled WD, we perform experiments in NLP and RL where adaptive optimizers are the norm. We demonstrate that the primary issue that decoupled WD alleviates is the mixing of gradients from the objective function and the $l_2$ penalty in the buffers of Adam (which store the estimates of the first-order moment). Adaptivity itself is not problematic and decoupled WD ensures that the gradients from the $l_2$ term cannot "drown out" the true objective, facilitating easier hyperparameter tuning.
Submitted 26 December, 2020;
originally announced December 2020.
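Illustration (not from the paper): the difference between an $l_2$ penalty and decoupled WD under Adam can be seen in a single update step. The sketch below is a minimal NumPy Adam step. The function name `adam_step`, the hyperparameter values, and the toy weights are assumptions for this example. With `l2 > 0` the penalty gradient enters the moment buffers and is rescaled by the adaptive denominator. With `decoupled_wd > 0` the weights are shrunk directly and the buffers see only the loss gradient.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0):
    """One Adam update with an optional l2 penalty or decoupled weight decay."""
    g = grad + l2 * w                  # l2 penalty is mixed into the gradient
    m = b1 * m + (1 - b1) * g          # first-moment buffer
    v = b2 * v + (1 - b2) * g * g      # second-moment buffer
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    # Decoupled decay is applied to the weights directly, outside the buffers.
    w_new = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + decoupled_wd * w)
    return w_new, m, v

# Zero loss gradient isolates the effect of the regularizer alone.
w0 = np.array([1.0, -2.0])
zeros = np.zeros(2)
w_l2, _, _ = adam_step(w0, zeros, zeros, zeros, t=1, l2=0.1)
w_dec, _, _ = adam_step(w0, zeros, zeros, zeros, t=1, decoupled_wd=0.1)
```

In this toy case the $l_2$ variant moves every weight by roughly `lr` regardless of the penalty strength, because the adaptive scaling normalizes the penalty gradient. The decoupled variant shrinks each weight in proportion to its size (`lr * decoupled_wd * w`).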
-
On Synergistic Benefits of Rate Splitting in IRS-assisted Cloud Radio Access Networks
Authors:
Kevin Weinberger,
Alaa Alameer Ahmad,
Aydin Sezgin
Abstract:
The concept of intelligent reflecting surfaces (IRSs) is considered a promising technology for increasing the efficiency of mobile wireless networks. This is achieved by employing a large number of low-cost, individually adjustable passive reflecting elements that can apply changes to the reflected signal. To this end, the IRS makes the environment controllable in real time and can be adjusted to significantly increase the received signal quality at the users by passive beamsteering. However, the changes to the reflected signals have an effect on all users near the IRS, which makes it impossible to optimize the changes to positively influence every transmission affected by the reflections. This results in some users not only experiencing better signal quality, but also an increase in received interference. To mitigate this negative side effect of the IRS, this paper utilizes the rate splitting (RS) technique, which enables the mitigation of interference within the network in such a way that it also mitigates the increased interference caused by the IRS. To investigate the effects on the overall power savings that can be achieved by combining both techniques, we minimize the transmit power needed to satisfy per-user quality-of-service (QoS) constraints. Numerical results show the improved power savings that can be gained by utilizing the IRS and the RS technique simultaneously. In fact, the concurrent use of both techniques yields power savings beyond the cumulative power savings of using each technique separately.
Submitted 3 November, 2020;
originally announced November 2020.
-
Correlator Convolutional Neural Networks: An Interpretable Architecture for Image-like Quantum Matter Data
Authors:
Cole Miles,
Annabelle Bohrdt,
Ruihan Wu,
Christie Chiu,
Muqing Xu,
Geoffrey Ji,
Markus Greiner,
Kilian Q. Weinberger,
Eugene Demler,
Eun-Ah Kim
Abstract:
Machine learning models are a powerful theoretical tool for analyzing data from quantum simulators, in which results of experiments are sets of snapshots of many-body states. Recently, they have been successfully applied to distinguish between snapshots that cannot be identified using traditional one- and two-point correlation functions. Thus far, the complexity of these models has inhibited new physical insights from this approach. Here, using a novel set of nonlinearities we develop a network architecture that discovers features in the data which are directly interpretable in terms of physical observables. In particular, our network can be understood as uncovering high-order correlators which significantly differ between the data studied. We demonstrate this new architecture on sets of simulated snapshots produced by two candidate theories approximating the doped Fermi-Hubbard model, which is realized in state-of-the-art quantum gas microscopy experiments. From the trained networks, we uncover that the key distinguishing features are fourth-order spin-charge correlators, providing a means to compare experimental data to theoretical predictions. Our approach lends itself well to the construction of simple, end-to-end interpretable architectures and is applicable to arbitrary lattice data, thus paving the way for new physical insights from machine learning studies of experimental as well as numerical data.
Submitted 6 November, 2020;
originally announced November 2020.
-
Harnessing Interpretable and Unsupervised Machine Learning to Address Big Data from Modern X-ray Diffraction
Authors:
Jordan Venderley,
Michael Matty,
Krishnanand Mallayya,
Matthew Krogstad,
Jacob Ruff,
Geoff Pleiss,
Varsha Kishore,
David Mandrus,
Daniel Phelan,
Lekhanath Poudel,
Andrew Gordon Wilson,
Kilian Weinberger,
Puspa Upreti,
Michael R. Norman,
Stephan Rosenkranz,
Ray Osborn,
Eun-Ah Kim
Abstract:
The information content of crystalline materials becomes astronomical when collective electronic behavior and its fluctuations are taken into account. In the past decade, improvements in source brightness and detector technology at modern X-ray facilities have allowed a dramatically increased fraction of this information to be captured. Now, the primary challenge is to understand and discover scientific principles from big data sets when a comprehensive analysis is beyond human reach. We report the development of a novel unsupervised machine learning approach, XRD Temperature Clustering (X-TEC), that can automatically extract charge density wave (CDW) order parameters and detect intra-unit cell (IUC) ordering and its fluctuations from a series of high-volume X-ray diffraction (XRD) measurements taken at multiple temperatures. We apply X-TEC to XRD data on a quasi-skutterudite family of materials, (Ca$_x$Sr$_{1-x}$)$_3$Rh$_4$Sn$_{13}$, where a quantum critical point arising from charge order is observed as a function of Ca concentration. We further apply X-TEC to XRD data on the pyrochlore metal, Cd$_2$Re$_2$O$_7$, to investigate its two much debated structural phase transitions and uncover the Goldstone mode accompanying them. We demonstrate how unprecedented atomic scale knowledge can be gained when human researchers connect the X-TEC results to physical principles. Specifically, we extract from the X-TEC-revealed selection rule that the Cd and Re displacements are approximately equal in amplitude, but out of phase. This discovery reveals a previously unknown involvement of $5d^2$ Re, supporting the idea of an electronic origin to the structural order. Our approach can radically transform XRD experiments by allowing in-operando data analysis and enabling researchers to refine experiments by discovering interesting regions of phase space on-the-fly.
Submitted 9 March, 2021; v1 submitted 7 August, 2020;
originally announced August 2020.