-
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Authors:
Nathaniel Li,
Alexander Pan,
Anjali Gopal,
Summer Yue,
Daniel Berrios,
Alice Gatti,
Justin D. Li,
Ann-Kathrin Dombrowski,
Shashwat Goel,
Long Phan,
Gabriel Mukobi,
Nathan Helm-Burger,
Rassin Lababidi,
Lennart Justen,
Andrew B. Liu,
Michael Chen,
Isabelle Barrass,
Oliver Zhang,
Xiaoyuan Zhu,
Rishub Tamirisa,
Bhrugu Bharathi,
Adam Khoja,
Zhenqi Zhao,
Ariel Herbert-Voss,
Cort B. Breuer, et al. (32 additional authors not shown)
Abstract:
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai
Submitted 15 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Optimizing Algorithms From Pairwise User Preferences
Authors:
Leonid Keselman,
Katherine Shih,
Martial Hebert,
Aaron Steinfeld
Abstract:
Typical black-box optimization approaches in robotics focus on learning from metric scores. However, that is not always possible, as not all developers have ground truth available. Learning appropriate robot behavior in human-centric contexts often requires querying users, who typically cannot provide precise metric scores. Existing approaches leverage human feedback in an attempt to model an implicit reward function; however, this reward may be difficult or impossible to effectively capture. In this work, we introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences. SortCMA efficiently and robustly leverages user input to find parameter sets without directly modeling a reward. We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation, which involves highly complex preferences over robot behavior. We show that our method succeeds in optimizing for the user's goals and perform a user study to evaluate social navigation results.
Submitted 8 August, 2023;
originally announced August 2023.
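The abstract above does not spell out how SortCMA works internally, so the following is only a minimal sketch of the general idea of turning pairwise answers into a ranking that a rank-based optimizer (CMA-ES, for example, updates from the ordering of its candidates) could consume. It is not the authors' implementation; `rank_by_pairwise_preference`, `simulated_user`, and the hidden target are hypothetical names introduced for illustration.

```python
import functools
import random


def rank_by_pairwise_preference(candidates, prefers):
    """Order candidates best-first using only pairwise answers.

    `prefers(a, b)` must return True when the user prefers `a` over `b`.
    No numeric score is ever requested from the user.
    """
    def compare(a, b):
        if prefers(a, b):
            return -1
        if prefers(b, a):
            return 1
        return 0

    return sorted(candidates, key=functools.cmp_to_key(compare))


def simulated_user(a, b):
    # Hypothetical stand-in for a human: prefers the parameter set that is
    # closer to a hidden target the optimizer never sees directly.
    target = (0.3, -0.2)

    def dist(p):
        return sum((x - t) ** 2 for x, t in zip(p, target))

    return dist(a) < dist(b)


rng = random.Random(0)
# A small population of 2D parameter configurations (assumed for the demo).
population = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]
ranked = rank_by_pairwise_preference(population, simulated_user)
```

A comparison sort needs on the order of n log n pairwise queries for n candidates, which is one reason sorting is a natural way to collect a full ranking from a user who cannot give metric scores.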
-
Improved Calibration of RF Cavities for Relativistic Electron Beams: Effects of Secondary Corrections and Experimental Verification
Authors:
K. Shih,
I. Petrushina,
V. N. Litvinenko,
I. Pinayev,
J. Ma,
G. Wang,
Y. Jing,
Y. Wu
Abstract:
In longitudinal beam bunching, the bunching strength can be controlled by the RF cavity phase and voltage. However, these machine parameters differ from those that interact with the beam itself. To gain control of the beam-cavity interaction, cavity calibration must be performed. The calibration relies on fitting the beam energy gain versus cavity phase to a calibration function. Under the conventional assumption of relativistic beam conditions, the calibration function is a first harmonic sinusoidal function (a sinusoidal function with a period of 2π). However, this expression is insufficient for a high-voltage bunching cavity. Due to beam acceleration inside the cavity, an energy bias and a second harmonic function should be included to modify the conventional calibration function, even for a relativistic electron beam. In this paper, we derive this modification and compare it with both the Coherent Electron Cooling Experiment and the IMPACT-T simulation.
Submitted 31 May, 2023;
originally announced May 2023.
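As a rough illustration of the two calibration forms contrasted in the abstract above, one generic way to write them is shown below. The symbols ($A_1$, $A_2$, $\phi_1$, $\phi_2$, $E_b$) are assumptions introduced here for illustration; the expression actually derived in the paper may be parameterized differently.

```latex
% Conventional calibration: first harmonic only, period 2\pi in the cavity phase \phi
\Delta E_{\text{conv}}(\phi) = A_1 \cos(\phi - \phi_1)

% Modified calibration: adds an energy bias E_b and a second-harmonic term
\Delta E_{\text{mod}}(\phi) = E_b + A_1 \cos(\phi - \phi_1) + A_2 \cos(2\phi - \phi_2)
```

Setting $E_b = 0$ and $A_2 = 0$ in the second form recovers the conventional first-harmonic fit.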
-
WiFi-TCN: Temporal Convolution for Human Interaction Recognition based on WiFi signal
Authors:
Chih-Yang Lin,
Chia-Yu Lin,
Yu-Tso Liu,
Timothy K. Shih
Abstract:
Wi-Fi-based human activity recognition (HAR) has gained considerable interest recently, primarily owing to its applications in domains such as healthcare (monitoring breath and heart rate), security, and elderly care. Wi-Fi-based methods have several advantages over conventional state-of-the-art techniques that rely on cameras and sensors, including lower costs and ease of deployment. However, a major challenge for Wi-Fi-based HAR is the significant decline in performance when the scene or subject changes. To mitigate this issue, it is imperative to train the model using an extensive dataset. In recent studies, CNN-based models and sequence-to-sequence models such as LSTM, GRU, or Transformer have become prevalent. While sequence-to-sequence models can be more precise, they are also more computationally intensive and require a larger amount of training data. To tackle these limitations, we propose a novel approach that leverages a temporal convolution network with augmentations and attention, referred to as TCN-AA. Our proposed method is computationally efficient and exhibits improved accuracy even when the data size is increased threefold through our augmentation techniques. Our experiments on a publicly available dataset indicate that our approach outperforms existing state-of-the-art methods, with a final accuracy of 99.42%.
Submitted 11 January, 2024; v1 submitted 21 May, 2023;
originally announced May 2023.
-
VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation
Authors:
Rohan Badlani,
Akshit Arora,
Subhankar Ghosh,
Rafael Valle,
Kevin J. Shih,
João Felipe Santos,
Boris Ginsburg,
Bryan Catanzaro
Abstract:
We introduce VANI, a very lightweight multilingual, accent-controllable speech synthesis system. Our model builds upon disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, speaker and fine-grained $F_0$ and energy features for speech synthesis. We utilize the Indic languages dataset, released for LIMMITS 2023 as part of the ICASSP Signal Processing Grand Challenge, to synthesize speech in 3 different languages. Our model supports transferring the language of a speaker while retaining their voice and the native accent of the target language. We utilize the large-parameter RADMMM model for Track 1 and the lightweight VANI model for Tracks 2 and 3 of the competition.
Submitted 13 March, 2023;
originally announced March 2023.
-
Multilingual Multiaccented Multispeaker TTS with RADTTS
Authors:
Rohan Badlani,
Rafael Valle,
Kevin J. Shih,
João Felipe Santos,
Siddharth Gururani,
Bryan Catanzaro
Abstract:
We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice. This is challenging because it is expensive to obtain bilingual training data in multiple languages, and the lack of such data results in strong correlations that entangle speaker, language, and accent, resulting in poor transfer capabilities. To overcome this, we present a multilingual, multiaccented, multispeaker speech synthesis model based on RADTTS with explicit control over accent, language, speaker and fine-grained $F_0$ and energy features. Our proposed model does not rely on bilingual training data. We demonstrate an ability to control synthesized accent for any speaker in an open-source dataset comprising 7 accents. Human subjective evaluation demonstrates that our model can better retain a speaker's voice and accent quality than controlled baselines while synthesizing fluent speech in all target languages and accents in our dataset.
Submitted 24 January, 2023;
originally announced January 2023.
-
3D Theory of Microscopic Instabilities Driven by Space-Charge Forces
Authors:
Vladimir Litvinenko,
Yichao Jing,
Jun Ma,
Irina Petrushina,
Kai Shih,
Gang Wang
Abstract:
Microscopic, or short-wavelength, instabilities are known to drastically reduce beam quality and strongly amplify noise in a beam. Space charge and coherent synchrotron radiation are known to be the leading causes of such instabilities. In this paper we present a rigorous 3D theory of such instabilities driven by space-charge forces. We define the condition under which our theory is applicable to an arbitrary accelerator system with 3D coupling. Finally, we derive a linear integral equation describing such an instability and identify conditions under which it can be reduced to an ordinary second-order differential equation.
Submitted 24 October, 2022;
originally announced October 2022.
-
Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures
Authors:
Nannan Li,
Kevin J. Shih,
Bryan A. Plummer
Abstract:
Human pose transfer synthesizes new view(s) of a person for a given pose. Recent work achieves this via self-reconstruction, which disentangles a person's pose and texture information by breaking the person down into parts, then recombines them for reconstruction. However, part-level disentanglement preserves some pose information that can create unwanted artifacts. In this paper, we propose Pose Transfer by Permuting Textures (PT$^2$), an approach for self-driven human pose transfer that disentangles pose from texture at the patch-level. Specifically, we remove pose from an input image by permuting image patches so only texture information remains. Then we reconstruct the input image by sampling from the permuted textures for patch-level disentanglement. To reduce noise and recover clothing shape information from the permuted patches, we employ encoders with multiple kernel sizes in a triple branch network. On DeepFashion and Market-1501, PT$^2$ reports significant gains on automatic metrics over other self-driven methods, and even outperforms some fully-supervised methods. A user study also reports images generated by our method are preferred in 68% of cases over self-driven approaches from prior work. Code is available at https://github.com/NannanLi999/pt_square.
Submitted 30 August, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
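The patch-permutation step described in the PT$^2$ abstract above (shuffling image patches so that texture remains but pose layout is destroyed) can be sketched as follows. This is a minimal illustration on a single-channel image stored as a list of rows, not the authors' code; `permute_patches` and the 8x8 toy image are assumptions made for the example.

```python
import random


def permute_patches(image, patch, rng):
    """Shuffle the non-overlapping patch x patch tiles of a 2D image.

    `image` is a list of equal-length rows. The pixel values inside each
    tile are kept intact; only the tile positions are permuted.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image must tile evenly into patches")

    # Top-left corner of every tile, in row-major order.
    corners = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    sources = corners[:]
    rng.shuffle(sources)

    out = [[None] * w for _ in range(h)]
    # Copy the tile found at each shuffled source corner into the next slot.
    for (dr, dc), (sr, sc) in zip(corners, sources):
        for i in range(patch):
            for j in range(patch):
                out[dr + i][dc + j] = image[sr + i][sc + j]
    return out


# Toy 8x8 image whose pixel value encodes its original position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
scrambled = permute_patches(image, patch=4, rng=random.Random(0))
```

Because every pixel is still present after the shuffle, local texture statistics survive while the spatial arrangement that encodes pose does not.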
-
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows
Authors:
Kevin J. Shih,
Rafael Valle,
Rohan Badlani,
João Felipe Santos,
Bryan Catanzaro
Abstract:
Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability as pitch-conditioned deterministic models such as FastPitch and FastSpeech2. Pitch information is not only low-dimensional, but also discontinuous, making it particularly difficult to model in a generative setting. Our work explores several techniques for handling these issues in the context of Normalizing Flow models. We also find this problem to be very well suited to Neural Spline Flows, a highly expressive alternative to the more common affine-coupling mechanism in Normalizing Flows.
Submitted 27 June, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
One TTS Alignment To Rule Them All
Authors:
Rohan Badlani,
Adrian Łancucki,
Kevin J. Shih,
Rafael Valle,
Wei Ping,
Bryan Catanzaro
Abstract:
Speech-to-text alignment is a critical component of neural text-to-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive end-to-end TTS models rely on durations extracted from external sources. In this paper we leverage the alignment mechanism proposed in RAD-TTS as a generic alignment learning framework, easily applicable to a variety of neural TTS models. The framework combines the forward-sum algorithm, the Viterbi algorithm, and a simple and efficient static prior. In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). Specifically, it improves alignment convergence speed of existing attention-based mechanisms, simplifies the training pipeline, and makes the models more robust to errors on long utterances. Most importantly, the framework improves the perceived speech synthesis quality, as judged by human evaluators.
Submitted 23 August, 2021;
originally announced August 2021.
-
Clinical Named Entity Recognition using Contextualized Token Representations
Authors:
Yichao Zhou,
Chelsea Ju,
J. Harry Caufield,
Kevin Shih,
Calvin Chen,
Yizhou Sun,
Kai-Wei Chang,
Peipei Ping,
Wei Wang
Abstract:
The clinical named entity recognition (CNER) task seeks to locate and classify clinical terminologies into predefined categories, such as diagnostic procedure, disease disorder, severity, medication, medication dosage, and sign symptom. CNER facilitates the study of medication side effects, including identification of novel phenomena and human-focused information extraction. Existing approaches to extracting the entities of interest focus on using static word embeddings to represent each word. However, one word can have different interpretations depending on the context of the sentence, so static word embeddings are insufficient to integrate the diverse interpretations of a word. To overcome this challenge, contextualized word embeddings have been introduced to better capture the semantic meaning of each word based on its context. Two such language models, ELMo and Flair, have been widely used in the field of Natural Language Processing to generate contextualized word embeddings on domain-generic documents. However, these embeddings are usually too general to capture the proximity among vocabularies of specific domains. To facilitate various downstream applications using clinical case reports (CCRs), we pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair), using a clinical corpus from PubMed Central. Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
Submitted 23 June, 2021;
originally announced June 2021.
-
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Authors:
Rafael Valle,
Kevin Shih,
Ryan Prenger,
Bryan Catanzaro
Abstract:
In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer. Flowtron borrows insights from IAF and revamps Tacotron in order to provide high-quality and expressive mel-spectrogram synthesis. Flowtron is optimized by maximizing the likelihood of the training data, which makes training simple and stable. Flowtron learns an invertible mapping of data to a latent space that can be manipulated to control many aspects of speech synthesis (pitch, tone, speech rate, cadence, accent). Our mean opinion scores (MOS) show that Flowtron matches state-of-the-art TTS models in terms of speech quality. In addition, we provide results on control of speech variation, interpolation between samples and style transfer between speakers seen and unseen during training. Code and pre-trained models will be made publicly available at https://github.com/NVIDIA/flowtron
Submitted 16 July, 2020; v1 submitted 12 May, 2020;
originally announced May 2020.
-
High brightness CW electron beams from Superconducting RF photoemission gun
Authors:
I. Petrushina,
V. N. Litvinenko,
Y. Jing,
J. Ma,
I. Pinayev,
K. Shih,
G. Wang,
Y. H. Wu,
J. C. Brutus,
Z. Altinbas,
A. Di Lieto,
P. Inacker,
J. Jamilkowski,
G. Mahler,
M. Mapes,
T. Miller,
G. Narayan,
M. Paniccia,
T. Roser,
F. Severino,
J. Skaritka,
L. Smart,
K. Smith,
V. Soria,
Y. Than, et al. (10 additional authors not shown)
Abstract:
CW photoinjectors operating at high accelerating gradients promise to revolutionize many areas of science and applications. They can establish the basis for a new generation of monochromatic X-ray free electron lasers, high brightness hadron beams, or a new generation of microchip production. In this letter we report on the record-performing superconducting RF electron gun with $\textrm{CsK}_{2}\textrm{Sb}$ photocathode. The gun is generating high charge electron bunches (up to 10 nC/bunch) and low transverse emittances, while operating for months with a single photocathode. This achievement opens a new era in generating high-power beams with a very high average brightness.
Submitted 16 March, 2020; v1 submitted 12 March, 2020;
originally announced March 2020.
-
Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos
Authors:
Aysegul Dundar,
Kevin J. Shih,
Animesh Garg,
Robert Pottorf,
Andrew Tao,
Bryan Catanzaro
Abstract:
Unsupervised landmark learning is the task of learning semantic keypoint-like representations without the use of expensive input keypoint-level annotations. A popular approach is to factorize an image into a pose and appearance data stream, then to reconstruct the image from the factorized components. The pose representation should capture a set of consistent and tightly localized landmarks in order to facilitate reconstruction of the input image. Ultimately, we wish for our learned landmarks to focus on the foreground object of interest. However, the reconstruction task of the entire image forces the model to allocate landmarks to model the background. This work explores the effects of factorizing the reconstruction task into separate foreground and background reconstructions, conditioning only the foreground reconstruction on the unsupervised landmarks. Our experiments demonstrate that the proposed factorization results in landmarks that are focused on the foreground object of interest. Furthermore, the rendered background quality is also improved, as the background rendering pipeline no longer requires the ill-suited landmarks to model its pose and appearance. We demonstrate this improvement in the context of the video-prediction task.
Submitted 26 January, 2020;
originally announced January 2020.
-
Light Field Synthesis by Training Deep Network in the Refocused Image Domain
Authors:
Chang-Le Liu,
Kuang-Tsu Shih,
Jiun-Woei Huang,
Homer H. Chen
Abstract:
Light field imaging, which captures spatio-angular information of incident light on the image sensor, enables many interesting applications like image refocusing and augmented reality. However, due to the limited sensor resolution, a trade-off exists between the spatial and angular resolution. To increase the angular resolution, view synthesis techniques have been adopted to generate new views from existing views. However, traditional learning-based view synthesis mainly considers the image quality of each view of the light field and neglects the quality of the refocused images. In this paper, we propose a new loss function called refocused image error (RIE) to address the issue. The main idea is that the image quality of the synthesized light field should be optimized in the refocused image domain because it is where the light field is perceived. We analyze the behavior of RIE in the spectral domain and test the performance of our approach against previous approaches on both real and software-rendered light field datasets using objective assessment metrics such as MSE, MAE, PSNR, SSIM, and GMSD. Experimental results show that the light field generated by our method results in better refocused images than previous methods.
Submitted 28 April, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Video Interpolation and Prediction with Unsupervised Landmarks
Authors:
Kevin J. Shih,
Aysegul Dundar,
Animesh Garg,
Robert Pottorf,
Andrew Tao,
Bryan Catanzaro
Abstract:
Prediction and interpolation for long-range video data involves the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting. Optical flow based techniques generalize but are suitable only for short temporal ranges. Many methods opt to project the video frames to a low dimensional latent space, achieving long-range predictions. However, these latent representations are often non-interpretable, and therefore difficult to manipulate. This work poses video prediction and interpolation as unsupervised latent structure inference followed by a temporal prediction in this latent space. The latent representations capture foreground semantics without explicit supervision such as keypoints or poses. Further, as each landmark can be mapped to a coordinate indicating where a semantic part is positioned, we can reliably interpolate within the coordinate domain to achieve predictable motion interpolation. Given an image decoder capable of mapping these landmarks back to the image domain, we are able to achieve high-quality long-range video interpolation and extrapolation by operating on the landmark representation space.
Submitted 6 September, 2019;
originally announced September 2019.
-
Unsupervised Video Interpolation Using Cycle Consistency
Authors:
Fitsum A. Reda,
Deqing Sun,
Aysegul Dundar,
Mohammad Shoeybi,
Guilin Liu,
Kevin J. Shih,
Andrew Tao,
Jan Kautz,
Bryan Catanzaro
Abstract:
Learning to synthesize high frame rate videos via interpolation requires large quantities of high frame rate training videos, which, however, are scarce, especially at high resolutions. Here, we propose unsupervised techniques to synthesize high frame rate videos directly from low frame rate videos using cycle consistency. For a triplet of consecutive frames, we optimize models to minimize the discrepancy between the center frame and its cycle reconstruction, obtained by interpolating back from interpolated intermediate frames. This simple unsupervised constraint alone achieves results comparable with supervision using the ground truth intermediate frames. We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model. The pseudo supervised loss term, used together with cycle consistency, can effectively adapt a pre-trained model to a new target domain. With no additional data and in a completely unsupervised fashion, our techniques significantly improve pre-trained models on new target domains, increasing PSNR values from 32.84dB to 33.05dB on the Slowflow and from 31.82dB to 32.53dB on the Sintel evaluation datasets.
Submitted 27 March, 2021; v1 submitted 13 June, 2019;
originally announced June 2019.
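The cycle-consistency constraint described in the abstract above (interpolate between consecutive frames, interpolate back, and compare with the center frame) can be sketched as follows. This is a minimal illustration, not the authors' model: `blend` is a hypothetical per-pixel midpoint used in place of a learned interpolation network, and frames are flat lists of pixel intensities.

```python
def blend(frame_a, frame_b):
    """Placeholder interpolator: per-pixel midpoint of two frames.

    A learned interpolation network would take this role in practice.
    """
    return [(a + b) / 2.0 for a, b in zip(frame_a, frame_b)]


def cycle_consistency_loss(f0, f1, f2, interpolate=blend):
    """Mean absolute error between the center frame and its cycle reconstruction."""
    mid_01 = interpolate(f0, f1)  # estimated frame between f0 and f1
    mid_12 = interpolate(f1, f2)  # estimated frame between f1 and f2
    # Interpolating between the two intermediate frames should land on f1.
    reconstruction = interpolate(mid_01, mid_12)
    return sum(abs(r - t) for r, t in zip(reconstruction, f1)) / len(f1)


# Intensities that change linearly over time are reconstructed exactly ...
linear = cycle_consistency_loss([0.0, 0.0], [1.0, 1.0], [2.0, 2.0])
# ... while a non-linear change leaves a residual that training would penalize.
nonlinear = cycle_consistency_loss([0.0, 0.0], [1.0, 1.0], [4.0, 4.0])
```

The loss needs only the three low-frame-rate frames themselves, which is why no ground-truth intermediate frames are required.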
-
Graphical Contrastive Losses for Scene Graph Parsing
Authors:
Ji Zhang,
Kevin J. Shih,
Ahmed Elgammal,
Andrew Tao,
Bryan Catanzaro
Abstract:
Most scene graph parsers use a two-stage pipeline to detect visual relationships: the first stage detects entities, and the second predicts the predicate for each entity pair using a softmax distribution. We find that such pipelines, trained with only a cross entropy loss over predicate classes, suffer from two common errors. The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g. multiple cups). The second, Proximal Relationship Ambiguity, arises when multiple subject-predicate-object triplets appear in close proximity with the same predicate, and the model struggles to infer the correct subject-object pairings (e.g. mis-pairing musicians and their instruments). We propose a set of contrastive loss formulations that specifically target these types of errors within the scene graph parsing problem, collectively termed the Graphical Contrastive Losses. These losses explicitly force the model to disambiguate related and unrelated instances through margin constraints specific to each type of confusion. We further construct a relationship detector, called RelDN, using the aforementioned pipeline to demonstrate the efficacy of our proposed losses. Our model outperforms the winning method of the OpenImages Relationship Detection Challenge by 4.7% (16.5% relative) on the test set. We also show improved results over the best previous methods on the Visual Genome and Visual Relationship Detection datasets.
Submitted 16 August, 2019; v1 submitted 7 March, 2019;
originally announced March 2019.
-
Plasma-Cascade Instability- theory, simulations and experiment
Authors:
Vladimir N. Litvinenko,
Gang Wang,
Yichao Jing,
Dmitry Kayran,
Jun Ma,
Irina Petrushina,
Igor Pinayev,
Kai Shih
Abstract:
In this letter we describe a new micro-bunching instability occurring in charged particle beams propagating along a straight trajectory: based on the dynamics we named it a Plasma Cascade Instability.
Submitted 27 February, 2019;
originally announced February 2019.
-
Solenoid: universal tool for measuring beam parameters
Authors:
Igor Pinayev,
Yichao Jing,
Dmitry Kayran,
Vladimir N. Litvinenko,
Kentaro Mihara,
Irina Petrushina,
Kay Shih,
Gang Wang
Abstract:
Solenoids are frequently used for focusing of the low energy electron beams. In this paper we focus on using these magnets as a nearly universal tool for measuring beam parameters including energy, emittance, and the beam position and angle with respect to the solenoid axis. We describe in detail corresponding procedures as well as experimental results of such measurements.
Submitted 25 February, 2019;
originally announced February 2019.
-
Improving Semantic Segmentation via Video Propagation and Label Relaxation
Authors:
Yi Zhu,
Karan Sapra,
Fitsum A. Reda,
Kevin J. Shih,
Shawn Newsam,
Andrew Tao,
Bryan Catanzaro
Abstract:
Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate models. In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. We exploit video prediction models' ability to predict future frames in order to also predict future labels. A joint propagation strategy is also proposed to alleviate mis-alignments in synthesized samples. We demonstrate that training segmentation models on datasets augmented by the synthesized samples leads to significant improvements in accuracy. Furthermore, we introduce a novel boundary label relaxation technique that makes training robust to annotation noise and propagation artifacts along object boundaries. Our proposed methods achieve state-of-the-art mIoUs of 83.5% on Cityscapes and 82.9% on CamVid. Our single model, without model ensembles, achieves 72.8% mIoU on the KITTI semantic segmentation test set, which surpasses the winning entry of the ROB challenge 2018. Our code and videos can be found at https://nv-adlr.github.io/publication/2018-Segmentation.
Submitted 2 July, 2019; v1 submitted 4 December, 2018;
originally announced December 2018.
-
Partial Convolution based Padding
Authors:
Guilin Liu,
Kevin J. Shih,
Ting-Chun Wang,
Fitsum A. Reda,
Karan Sapra,
Zhiding Yu,
Andrew Tao,
Bryan Catanzaro
Abstract:
In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks. We call it partial convolution based padding, with the intuition that the padded region can be treated as holes and the original input as non-holes. Specifically, during the convolution operation, the convolution results are re-weighted near image borders based on the ratios between the padded area and the convolution sliding window area. Extensive experiments with various deep network models on ImageNet classification and semantic segmentation demonstrate that the proposed padding scheme consistently outperforms standard zero padding with better accuracy.
Submitted 28 November, 2018;
originally announced November 2018.
-
An Interpretable Model for Scene Graph Generation
Authors:
Ji Zhang,
Kevin Shih,
Andrew Tao,
Bryan Catanzaro,
Ahmed Elgammal
Abstract:
We propose an efficient and interpretable scene graph generator. We consider three types of features: visual, spatial and semantic, and we use a late fusion strategy such that each feature's contribution can be explicitly investigated. We study the key factors about these features that have the most impact on the performance, and also visualize the learned visual features for relationships and investigate the efficacy of our model. We won first place in the OpenImages Visual Relationship Detection Challenge on Kaggle, where we outperformed the 2nd place by 5% (20% relatively). We believe an accurate scene graph generator is a fundamental stepping stone for higher-level vision-language tasks such as image captioning and visual QA, since it provides a semantic, structured comprehension of an image that is beyond pixels and objects.
Submitted 21 November, 2018;
originally announced November 2018.
-
Revisiting Image-Language Networks for Open-ended Phrase Detection
Authors:
Bryan A. Plummer,
Kevin J. Shih,
Yichen Li,
Ke Xu,
Svetlana Lazebnik,
Stan Sclaroff,
Kate Saenko
Abstract:
Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image. In this paper we address a more realistic version of the natural language grounding task where we must both identify whether the phrase is relevant to an image and localize the phrase. This can also be viewed as a generalization of object detection to an open-ended vocabulary, introducing elements of few- and zero-shot detection. We propose an approach for this task that extends Faster R-CNN to relate image regions and phrases. By carefully initializing the classification layers of our network using canonical correlation analysis (CCA), we encourage a solution that is more discerning when reasoning between similar phrases, resulting in over double the performance compared to a naive adaptation on three popular phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, with test-time phrase vocabulary sizes of 5K, 32K, and 159K, respectively.
Submitted 12 October, 2020; v1 submitted 17 November, 2018;
originally announced November 2018.
-
SDCNet: Video Prediction Using Spatially-Displaced Convolution
Authors:
Fitsum A. Reda,
Guilin Liu,
Kevin J. Shih,
Robert Kirby,
Jon Barker,
David Tarjan,
Andrew Tao,
Bryan Catanzaro
Abstract:
We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows. Previous approaches rely on resampling past frames, guided by a learned future optical flow, or on direct generation of pixels. Resampling based on flow is insufficient because it cannot deal with disocclusions. Generative models currently lead to blurry results. Recent approaches synthesize a pixel by convolving input patches with a predicted kernel. However, their memory requirement increases with kernel size. Here, we propose a spatially-displaced convolution (SDC) module for video frame prediction. We learn a motion vector and a kernel for each pixel and synthesize a pixel by applying the kernel at a displaced location in the source image, defined by the predicted motion vector. Our approach inherits the merits of both vector-based and kernel-based approaches, while ameliorating their respective disadvantages. We train our model on 428K unlabelled 1080p video game frames. Our approach produces state-of-the-art results, achieving an SSIM score of 0.904 on high-definition YouTube-8M videos and 0.918 on Caltech Pedestrian videos. Our model handles large motion effectively and synthesizes crisp frames with consistent motion.
Submitted 27 March, 2021; v1 submitted 1 November, 2018;
originally announced November 2018.
-
Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge
Authors:
Ji Zhang,
Kevin Shih,
Andrew Tao,
Bryan Catanzaro,
Ahmed Elgammal
Abstract:
This article describes the model we built that achieved 1st place in the OpenImages Visual Relationship Detection Challenge on Kaggle. Three key factors contribute the most to our success: 1) language bias is a powerful baseline for this task. We build the empirical distribution $P(predicate|subject,object)$ in the training set and directly use that in testing. This baseline achieved 2nd place when submitted; 2) spatial features are as important as visual features, especially for spatial relationships such as "under" and "inside of"; 3) an effective way to fuse different features is to first build separate modules for each of them, then add their output logits before the final softmax layer. We show in an ablation study that each factor improves the performance to a non-trivial extent, and that the model performs best when all of them are combined.
Submitted 7 November, 2018; v1 submitted 1 November, 2018;
originally announced November 2018.
-
Image Inpainting for Irregular Holes Using Partial Convolutions
Authors:
Guilin Liu,
Fitsum A. Reda,
Kevin J. Shih,
Ting-Chun Wang,
Andrew Tao,
Bryan Catanzaro
Abstract:
Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (typically the mean value). This often leads to artifacts such as color discrepancy and blurriness. Post-processing is usually used to reduce such artifacts, but it is expensive and may fail. We propose the use of partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels. We further include a mechanism to automatically generate an updated mask for the next layer as part of the forward pass. Our model outperforms other methods for irregular masks. We show qualitative and quantitative comparisons with other methods to validate our approach.
Submitted 15 December, 2018; v1 submitted 20 April, 2018;
originally announced April 2018.
-
Learning Interpretable Spatial Operations in a Rich 3D Blocks World
Authors:
Yonatan Bisk,
Kevin J. Shih,
Yejin Choi,
Daniel Marcu
Abstract:
In this paper, we study the problem of mapping natural language instructions to complex spatial actions in a 3D blocks world. We first introduce a new dataset that pairs complex 3D spatial operations to rich natural language descriptions that require complex spatial and pragmatic interpretations such as "mirroring", "twisting", and "balancing". This dataset, built on the simulation environment of Bisk, Yuret, and Marcu (2016), attains language that is significantly richer and more complex, while also doubling the size of the original dataset in the 2D environment with 100 new world configurations and 250,000 tokens. In addition, we propose a new neural architecture that achieves competitive results while automatically discovering an inventory of interpretable spatial operations (Figure 5).
Submitted 24 December, 2017; v1 submitted 9 December, 2017;
originally announced December 2017.
-
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Authors:
Tanmay Gupta,
Kevin Shih,
Saurabh Singh,
Derek Hoiem
Abstract:
An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning. In particular, the task of visual recognition is aligned to the task of visual question answering by forcing each to use the same word-region embeddings. We show this leads to greater inductive transfer from recognition to VQA than standard multitask learning. Visual recognition also improves, especially for categories that have relatively few recognition training labels but appear often in the VQA setting. Thus, our paper takes a small step towards creating more general vision systems by showing the benefit of interpretable, flexible, and trainable core representations.
Submitted 16 October, 2017; v1 submitted 2 April, 2017;
originally announced April 2017.
-
Compact ring-based X-ray source with on-orbit and on-energy laser-plasma injection
Authors:
Marlene Turner,
Jeremy Cheatam,
Auralee Edelen,
James Gerity,
Andrew Lajoie,
Gerard Lawler,
Osip Lishilin,
Kookjin Moon,
Aakash Ajit Sahai,
Andrei Seryi,
Kai Shih,
Brandon Zerbe
Abstract:
We report here the results of a one week long investigation into the conceptual design of an X-ray source based on a compact ring with on-orbit and on-energy laser-plasma accelerator. We performed these studies during the June 2016 USPAS class "Physics of Accelerators, Lasers, and Plasma..." applying the art of inventiveness TRIZ. We describe three versions of the light source with the constraints of the electron beam with energy $1\,\rm{GeV}$ or $3\,\rm{GeV}$ and a magnetic lattice design being normal conducting (only for the $1\,\rm{GeV}$ beam) or superconducting (for either beam). The electron beam recirculates in the ring, to increase the effective photon flux. We describe the design choices, present relevant parameters, and describe insights into such machines.
Submitted 17 October, 2016;
originally announced October 2016.
-
Where To Look: Focus Regions for Visual Question Answering
Authors:
Kevin J. Shih,
Saurabh Singh,
Derek Hoiem
Abstract:
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method exhibits significant improvements in answering questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions. Our model is tested on the VQA dataset which is the largest human-annotated visual question answering dataset to our knowledge.
Submitted 10 January, 2016; v1 submitted 23 November, 2015;
originally announced November 2015.
-
Measurement of Cosmic-ray Muons and Muon-induced Neutrons in the Aberdeen Tunnel Underground Laboratory
Authors:
S. C. Blyth,
Y. L. Chan,
X. C. Chen,
M. C. Chu,
K. X. Cui,
R. L. Hahn,
T. H. Ho,
Y. K. Hor,
Y. B. Hsiung,
B. Z. Hu,
K. K. Kwan,
M. W. Kwok,
T. Kwok,
Y. P. Lau,
K. P. Lee,
J. K. C. Leung,
K. Y. Leung,
G. L. Lin,
Y. C. Lin,
K. B. Luk,
W. H. Luk,
H. Y. Ngai,
W. K. Ngai,
S. Y. Ngan,
C. S. J. Pun
, et al. (9 additional authors not shown)
Abstract:
We have measured the muon flux and production rate of muon-induced neutrons at a depth of 611 m water equivalent. Our apparatus comprises three layers of crossed plastic scintillator hodoscopes for tracking the incident cosmic-ray muons and 760 L of gadolinium-doped liquid scintillator for producing and detecting neutrons. The vertical muon intensity was measured to be $I_μ = (5.7 \pm 0.6) \times 10^{-6}$ cm$^{-2}$s$^{-1}$sr$^{-1}$. The yield of muon-induced neutrons in the liquid scintillator was determined to be $Y_{n} = (1.19 \pm 0.08 (stat) \pm 0.21 (syst)) \times 10^{-4}$ neutrons/($μ\cdot$g$\cdot$cm$^{-2}$). A fit to the recently measured neutron yields at different depths gave a mean muon energy dependence of $\left\langle E_μ \right\rangle^{0.76 \pm 0.03}$ for liquid-scintillator targets.
Submitted 26 November, 2016; v1 submitted 30 September, 2015;
originally announced September 2015.
-
The Detector System of The Daya Bay Reactor Neutrino Experiment
Authors:
F. P. An,
J. Z. Bai,
A. B. Balantekin,
H. R. Band,
D. Beavis,
W. Beriguete,
M. Bishai,
S. Blyth,
R. L. Brown,
I. Butorov,
D. Cao,
G. F. Cao,
J. Cao,
R. Carr,
W. R. Cen,
W. T. Chan,
Y. L. Chan,
J. F. Chang,
L. C. Chang,
Y. Chang,
C. Chasman,
H. Y. Chen,
H. S. Chen,
M. J. Chen,
Q. Y. Chen
, et al. (310 additional authors not shown)
Abstract:
The Daya Bay experiment was the first to report simultaneous measurements of reactor antineutrinos at multiple baselines leading to the discovery of $\barν_e$ oscillations over km-baselines. Subsequent data has provided the world's most precise measurement of $\rm{sin}^22θ_{13}$ and the effective mass splitting $Δm_{ee}^2$. The experiment is located in Daya Bay, China where the cluster of six nuclear reactors is among the world's most prolific sources of electron antineutrinos. Multiple antineutrino detectors are deployed in three underground water pools at different distances from the reactor cores to search for deviations in the antineutrino rate and energy spectrum due to neutrino mixing. Instrumented with photomultiplier tubes (PMTs), the water pools serve as shielding against natural radioactivity from the surrounding rock and provide efficient muon tagging. Arrays of resistive plate chambers over the top of each pool provide additional muon detection. The antineutrino detectors were specifically designed for measurements of the antineutrino flux with minimal systematic uncertainty. Relative detector efficiencies between the near and far detectors are known to better than 0.2%. With the unblinding of the final two detectors' baselines and target masses, a complete description and comparison of the eight antineutrino detectors can now be presented. This paper describes the Daya Bay detector systems, consisting of eight antineutrino detectors in three instrumented water pools in three underground halls, and their operation through the first year of eight detector data-taking.
Submitted 7 January, 2016; v1 submitted 17 August, 2015;
originally announced August 2015.
-
Part Localization using Multi-Proposal Consensus for Fine-Grained Categorization
Authors:
Kevin J. Shih,
Arun Mallya,
Saurabh Singh,
Derek Hoiem
Abstract:
We present a simple deep learning framework to simultaneously predict keypoint locations and their respective visibilities and use those to achieve state-of-the-art performance for fine-grained classification. We show that by conditioning the predictions on object proposals with sufficient image support, our method can do well without complicated spatial reasoning. Instead, inference methods that are robust to outliers yield state-of-the-art keypoint localization. We demonstrate the effectiveness of our accurate keypoint localization and visibility prediction on the fine-grained bird recognition task with and without ground truth bird bounding boxes, and outperform existing state-of-the-art methods by over 2%.
Submitted 22 July, 2015;
originally announced July 2015.
-
Ab initio density functional theory study of uranium solubility in Gd2Zr2O7 pyrochlore
Authors:
Qing-yun Chen,
Kai-min Shih,
Chuan-min Meng,
Chang-zhong Liao,
Lie-lin Wang,
Hua Xie,
Hui-yi Lv,
Tao Wu,
Shi-yin Ji,
Yu-zhu Huang
Abstract:
In this study, an ab initio calculation is performed to investigate the uranium solubility in different sites of Gd2Zr2O7 pyrochlore. The Gd2Zr2O7 maintains its pyrochlore structure at low uranium dopant levels, and the lattice constants of Gd2(Zr2-yUy)O7 and (Gd2-yUy)Zr2O7 are generally expressed as being linearly related to the uranium content y. Uranium is found to be a preferable substitute for the B-site gadolinium atoms in cation-disordered Gd2Zr2O7 (where gadolinium and zirconium atoms are swapped) over the A-site gadolinium atoms in ordered Gd2Zr2O7 due to the lower total energy of (Gd2-yZry)(Zr2-yUy)O7. The theoretical findings present a reasonable explanation of recent experiment results.
Submitted 26 April, 2015;
originally announced April 2015.
-
Efficient Media Retrieval from Non-Cooperative Queries
Authors:
Kevin Shih,
Wei Di,
Vignesh Jagadeesh,
Robinson Piramuthu
Abstract:
Text is ubiquitous in the artificial world and easily attainable when it comes to book title and author names. Using the images from the book cover set from the Stanford Mobile Visual Search dataset and additional book covers and metadata from openlibrary.org, we construct a large scale book cover retrieval dataset, complete with 100K distractor covers and title and author strings for each. Because our query images are poorly conditioned for clean text extraction, we propose a method for extracting noisy and erroneous OCR readings and matching them against clean author and book title strings in a standard document look-up problem setup. Finally, we demonstrate how to use this text-matching as a feature in conjunction with popular retrieval features such as VLAD using a simple learning setup to achieve significant improvements in retrieval accuracy over that of either VLAD or the text alone.
Submitted 19 November, 2014;
originally announced November 2014.
-
Assembly and Installation of the Daya Bay Antineutrino Detectors
Authors:
H. R. Band,
R. L. Brown,
R. Carr,
X. C. Chen,
X. H. Chen,
J. J. Cherwinka,
M. C. Chu,
E. Draeger,
D. A. Dwyer,
W. R. Edwards,
R. Gill,
J. Goett,
L. S. Greenler,
W. Q. Gu,
W. S. He,
K. M. Heeger,
Y. K. Heng,
P. Hinrichs,
T. H. Ho,
M. Hoff,
Y. B. Hsiung,
Y. Jin,
L. Kang,
S. H. Kettell,
M. Kramer
, et al. (44 additional authors not shown)
Abstract:
The Daya Bay reactor antineutrino experiment is designed to make a precision measurement of the neutrino mixing angle theta13, and recently made the definitive discovery of its nonzero value. It utilizes a set of eight, functionally identical antineutrino detectors to measure the reactor flux and spectrum at baselines of 300 - 2000m from the Daya Bay and Ling Ao Nuclear Power Plants. The Daya Bay antineutrino detectors were built in an above-ground facility and deployed side-by-side at three underground experimental sites near and far from the nuclear reactors. This configuration allows the experiment to make a precision measurement of reactor antineutrino disappearance over km-long baselines and reduces relative systematic uncertainties between detectors and nuclear reactors. This paper describes the assembly and installation of the Daya Bay antineutrino detectors.
Submitted 6 September, 2013;
originally announced September 2013.
-
An apparatus for studying spallation neutrons in the Aberdeen Tunnel laboratory
Authors:
S. C. Blyth,
Y. L. Chan,
X. C. Chen,
M. C. Chu,
R. L. Hahn,
T. H. Ho,
Y. B. Hsiung,
B. Z. Hu,
K. K. Kwan,
M. W. Kwok,
T. Kwok,
Y. P. Lau,
K. P. Lee,
J. K. C. Leung,
K. Y. Leung,
G. L. Lin,
Y. C. Lin,
K. B. Luk,
W. H. Luk,
H. Y. Ngai,
S. Y. Ngan,
C. S. J. Pun,
K. Shih,
Y. H. Tam,
R. H. M. Tsang
, et al. (6 additional authors not shown)
Abstract:
In this paper, we describe the design, construction and performance of an apparatus installed in the Aberdeen Tunnel laboratory in Hong Kong for studying spallation neutrons induced by cosmic-ray muons under a vertical rock overburden of 611 meters water equivalent (m.w.e.). The apparatus comprises six horizontal layers of plastic-scintillator hodoscopes for determining the direction and position of the incident cosmic-ray muons. Sandwiched between the hodoscope planes is a neutron detector filled with 650 kg of liquid scintillator doped with about 0.06% of gadolinium by weight for improving the efficiency of detecting the spallation neutrons. Performance of the apparatus is also presented.
Submitted 13 August, 2013;
originally announced August 2013.
-
Improved Measurement of Electron Antineutrino Disappearance at Daya Bay
Authors:
Daya Bay Collaboration,
F. P. An,
Q. An,
J. Z. Bai,
A. B. Balantekin,
H. R. Band,
W. Beriguete,
M. Bishai,
S. Blyth,
R. L. Brown,
G. F. Cao,
J. Cao,
R. Carr,
W. T. Chan,
J. F. Chang,
Y. Chang,
C. Chasman,
H. S. Chen,
H. Y. Chen,
S. J. Chen,
S. M. Chen,
X. C. Chen,
X. H. Chen,
X. S. Chen,
Y. Chen
, et al. (207 additional authors not shown)
Abstract:
We report an improved measurement of the neutrino mixing angle $θ_{13}$ from the Daya Bay Reactor Neutrino Experiment. We exclude a zero value for $\sin^22θ_{13}$ with a significance of 7.7 standard deviations. Electron antineutrinos from six reactors of 2.9 GW$_{\rm th}$ were detected in six antineutrino detectors deployed in two near (flux-weighted baselines of 470 m and 576 m) and one far (1648 m) underground experimental halls. Using 139 days of data, 28909 (205308) electron antineutrino candidates were detected at the far hall (near halls). The ratio of the observed to the expected number of antineutrinos assuming no oscillations at the far hall is $0.944\pm 0.007({\rm stat.}) \pm 0.003({\rm syst.})$. An analysis of the relative rates in six detectors finds $\sin^22θ_{13}=0.089\pm 0.010({\rm stat.})\pm0.005({\rm syst.})$ in a three-neutrino framework.
Submitted 17 November, 2012; v1 submitted 23 October, 2012;
originally announced October 2012.
-
Daya Bay Antineutrino Detector Gas System
Authors:
H. R. Band,
J. J. Cherwinka,
M-C. Chu,
K. M. Heeger,
M. W. Kwok,
K. Shih,
T. Wise,
Q. Xiao
Abstract:
The Daya Bay Antineutrino Detector gas system is designed to protect the liquid scintillator targets of the antineutrino detectors against degradation and contamination from exposure to ambient laboratory air. The gas system is also used to monitor the leak tightness of the antineutrino detector assembly. The cover gas system constantly flushes the gas volumes above the liquid scintillator with dry nitrogen to minimize oxidation of the scintillator over the five year lifetime of the experiment. This constant flush also prevents the infiltration of radon or other contaminants into these detecting liquids, keeping the internal backgrounds low. Since the Daya Bay antineutrino detectors are immersed in the large water pools of the muon veto system, other gas volumes are needed to protect vital detector cables or gas lines. These volumes are also purged with dry gas. Return gas is monitored for oxygen content and humidity to provide early warning of potentially damaging leaks. The design and performance of the Daya Bay Antineutrino Detector gas system is described.
Submitted 29 October, 2012; v1 submitted 1 October, 2012;
originally announced October 2012.
-
Coherent versus Incoherent Light Scattering from a Quantum Dot
Authors:
K. Konthasinghe,
J. Walker,
M. Peiris,
C. K. Shih,
Y. Yu,
M. F. Li,
J. F. He,
L. J. Wang,
H. Q. Ni,
Z. C. Niu,
A. Muller
Abstract:
We analyze the light scattered by a single InAs quantum dot interacting with a resonant continuous-wave laser. High resolution spectra reveal clear distinctions between coherent and incoherent scattering, with the laser intensity spanning over four orders of magnitude. We find that the fraction of coherently scattered photons can approach unity under sufficiently weak or detuned excitation, ruling out pure dephasing as a relevant decoherence mechanism. We show how spectral diffusion shapes spectra, correlation functions, and phase-coherence, concealing the ideal radiatively-broadened two-level system described by Mollow.
Submitted 20 June, 2012;
originally announced June 2012.
-
Observation of electron-antineutrino disappearance at Daya Bay
Authors:
F. P. An,
J. Z. Bai,
A. B. Balantekin,
H. R. Band,
D. Beavis,
W. Beriguete,
M. Bishai,
S. Blyth,
K. Boddy,
R. L. Brown,
B. Cai,
G. F. Cao,
J. Cao,
R. Carr,
W. T. Chan,
J. F. Chang,
Y. Chang,
C. Chasman,
H. S. Chen,
H. Y. Chen,
S. J. Chen,
S. M. Chen,
X. C. Chen,
X. H. Chen,
X. S. Chen
, et al. (246 additional authors not shown)
Abstract:
The Daya Bay Reactor Neutrino Experiment has measured a non-zero value for the neutrino mixing angle $θ_{13}$ with a significance of 5.2 standard deviations. Antineutrinos from six 2.9 GW$_{\rm th}$ reactors were detected in six antineutrino detectors deployed in two near (flux-weighted baseline 470 m and 576 m) and one far (1648 m) underground experimental halls. With a 43,000 ton-GW$_{\rm th}$-day livetime exposure in 55 days, 10416 (80376) electron antineutrino candidates were detected at the far hall (near halls). The ratio of the observed to expected number of antineutrinos at the far hall is $R=0.940\pm 0.011({\rm stat}) \pm 0.004({\rm syst})$. A rate-only analysis finds $\sin^22θ_{13}=0.092\pm 0.016({\rm stat})\pm0.005({\rm syst})$ in a three-neutrino framework.
Submitted 2 April, 2012; v1 submitted 7 March, 2012;
originally announced March 2012.
-
A side-by-side comparison of Daya Bay antineutrino detectors
Authors:
Daya Bay Collaboration,
F. P. An,
Q. An,
J. Z. Bai,
A. B. Balantekin,
H. R. Band,
W. Beriguete,
M. Bishai,
S. Blyth,
R. L. Brown,
G. F. Cao,
J. Cao,
R. Carr,
J. F. Chang,
Y. Chang,
C. Chasman,
H. S. Chen,
S. J. Chen,
S. M. Chen,
X. C. Chen,
X. H. Chen,
X. S. Chen,
Y. Chen,
J. J. Cherwinka,
M. C. Chu
, et al. (218 additional authors not shown)
Abstract:
The Daya Bay Reactor Neutrino Experiment is designed to determine precisely the neutrino mixing angle $θ_{13}$ with a sensitivity better than 0.01 in the parameter sin$^22θ_{13}$ at the 90% confidence level. To achieve this goal, the collaboration will build eight functionally identical antineutrino detectors. The first two detectors have been constructed, installed and commissioned in Experimental Hall 1, with steady data-taking beginning September 23, 2011. A comparison of the data collected over the subsequent three months indicates that the detectors are functionally identical, and that detector-related systematic uncertainties exceed requirements.
Submitted 28 February, 2012;
originally announced February 2012.
-
Coherently driven non-classical light emission from a quantum dot
Authors:
A. Muller,
E. B. Flagg,
P. Bianucci,
D. G. Deppe,
W. Ma,
J. Zhang,
G. J. Salamo,
C. K. Shih
Abstract:
Narrow line-widths and the possibility of enhanced spontaneous emission via coupling to microcavities make semiconductor quantum dots ideal for harnessing coherent quantum phenomena at the single photon level. So far, however, all approaches have relied on incoherent pumping, which limits the desirable properties of the emission. In contrast, coherent excitation was recognized to be necessary for providing both improved photon indistinguishability and high efficiency, and offers the quantum control capabilities required for basic qubit manipulations. Here we achieve, for the first time, resonant and coherent excitation of a quantum dot with simultaneous collection of the non-classical photon emission. Second-order correlation measurements show the unique signature of a coherently-driven two-level quantum emitter: the photon statistics become oscillatory at high driving fields, reflecting the coherent evolution of the excitonic ground state of the quantum dot.
Submitted 25 July, 2007;
originally announced July 2007.
-
Time-Resolved Spectroscopy of Single Excitons Bound to Pairs of Te Isoelectronic Impurity Centers in ZnSe
Authors:
A. Muller,
P. Bianucci,
C. Piermarocchi,
M. Fornari,
I. C. Robin,
R. Andre,
C. K. Shih
Abstract:
Tellurium impurity centers in ZnSe were individually probed with time-resolved photoluminescence (PL) spectroscopy. Resolution-limited peaks with an ultra-low spatial density originate in the recombination of excitons deeply bound to isolated nearest-neighbor isoelectronic Te pairs (Te2). This interpretation is confirmed by ab-initio calculations. The peaks reveal anti-bunched photon emission and a doublet structure polarized along [110] and [-110]. We analyze the time-resolved PL decay to clarify the role of the dark states in the spin relaxation and radiative recombination of single fine-structure split excitons.
Submitted 16 March, 2005;
originally announced March 2005.
-
Decoherence processes during active manipulation of excitonic qubits in semiconductor quantum dots
Authors:
Q. Q. Wang,
A. Muller,
P. Bianucci,
E. Rossi,
Q. K. Xue,
T. Takagahara,
C. Piermarocchi,
A. H. MacDonald,
C. K. Shih
Abstract:
Using photoluminescence spectroscopy, we have investigated the nature of Rabi oscillation damping during active manipulation of excitonic qubits in self-assembled quantum dots. Rabi oscillations were recorded by varying the pulse amplitude for fixed pulse durations between 4 ps and 10 ps. Up to 5 periods are visible, making it possible to quantify the excitation dependent damping. We find that this damping is more pronounced for shorter pulse widths and show that its origin is the non-resonant excitation of carriers in the wetting layer, most likely involving bound-to-continuum and continuum-to-bound transitions.
Submitted 20 April, 2004;
originally announced April 2004.
-
Determination of anisotropic dipole moments in self-assembled quantum dots using Rabi oscillations
Authors:
A. Muller,
Q. Q. Wang,
P. Bianucci,
Q. K. Xue,
C. K. Shih
Abstract:
By investigating the polarization-dependent Rabi oscillations using photoluminescence spectroscopy, we determined the respective transition dipole moments of the two excited excitonic states |Ex> and |Ey> of a single self-assembled quantum dot that are nondegenerate due to shape anisotropy. We find that the ratio of the two dipole moments is close to the physical elongation ratio of the quantum dot.
Submitted 16 April, 2004;
originally announced April 2004.
-
Experimental realization of the one qubit Deutsch-Jozsa algorithm in a quantum dot
Authors:
P. Bianucci,
A. Muller,
C. K. Shih,
Q. Q. Wang,
X. K. Xue,
C. Piermarocchi
Abstract:
We perform quantum interference experiments on a single self-assembled semiconductor quantum dot. The presence or absence of a single exciton in the dot provides a qubit that we control with femtosecond time resolution. We combine a set of quantum operations to realize the single-qubit Deutsch-Jozsa algorithm. The results show the feasibility of single qubit quantum logic in a semiconductor quantum dot using ultrafast optical control.
Submitted 18 March, 2004; v1 submitted 13 January, 2004;
originally announced January 2004.
-
Double-tip STM for Surface Analysis
Authors:
Q. Niu,
M. C. Chang,
C. K. Shih
Abstract:
We explore the possibility of using a double-tip STM to probe the single electron Green function of a sample surface, and describe a few important applications: (1) Probing constant energy surfaces in $k$-space by ballistic transport; (2) Measuring scattering phase shifts of defects; (3) Observing the transition from ballistic to diffusive transport to localization; and (4) Measuring inelastic mean free paths.
Submitted 26 December, 1994; v1 submitted 12 May, 1994;
originally announced May 1994.