-
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Authors:
Paarth Neekhara,
Shehzeen Hussain,
Subhankar Ghosh,
Jason Li,
Rafael Valle,
Rohan Badlani,
Boris Ginsburg
Abstract:
Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text contains multiple occurrences of the same token. We examine these challenges in an encoder-decoder transformer model and find that certain cross-attention heads in such models implicitly learn the text and speech alignment when trained for predicting speech tokens for a given text. To make the alignment more robust, we propose techniques utilizing CTC loss and attention priors that encourage monotonic cross-attention over the text tokens. Our guided attention training technique does not introduce any new learnable parameters and significantly improves robustness of LLM-based TTS models.
Submitted 25 June, 2024;
originally announced June 2024.
-
Improving Text-To-Audio Models with Synthetic Captions
Authors:
Zhifeng Kong,
Sang-gil Lee,
Deepanway Ghosal,
Navonil Majumder,
Ambuj Mehrish,
Rafael Valle,
Soujanya Poria,
Bryan Catanzaro
Abstract:
It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged \textit{text-only language models} to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an \textit{audio language model} to synthesize accurate and diverse captions for audio at scale. We leverage this pipeline to produce a dataset of synthetic captions for AudioSet, named \texttt{AF-AudioSet}, and then evaluate the benefit of pre-training text-to-audio models on these synthetic captions. Through systematic evaluations on AudioCaps and MusicCaps, we find leveraging our pipeline and synthetic captions leads to significant improvements on audio generation quality, achieving a new \textit{state-of-the-art}.
Submitted 17 June, 2024;
originally announced June 2024.
-
Fluorescence Imaging of Individual Ions and Molecules in Pressurized Noble Gases for Barium Tagging in $^{136}$Xe
Authors:
NEXT Collaboration,
N. Byrnes,
E. Dey,
F. W. Foss,
B. J. P. Jones,
R. Madigan,
A. McDonald,
R. L. Miller,
K. E. Navarro,
L. R. Norman,
D. R. Nygren,
C. Adams,
H. Almazán,
V. Álvarez,
B. Aparicio,
A. I. Aranburu,
L. Arazi,
I. J. Arnquist,
F. Auria-Luna,
S. Ayet,
C. D. R. Azevedo,
J. E. Barcelon,
K. Bailey,
F. Ballester,
M. del Barrio-Torregrosa
, et al. (90 additional authors not shown)
Abstract:
The imaging of individual Ba$^{2+}$ ions in high pressure xenon gas is one possible way to attain background-free sensitivity to neutrinoless double beta decay and hence establish the Majorana nature of the neutrino. In this paper we demonstrate selective single Ba$^{2+}$ ion imaging inside a high-pressure xenon gas environment. Ba$^{2+}$ ions chelated with molecular chemosensors are resolved at the gas-solid interface using a diffraction-limited imaging system with scan area of 1$\times$1~cm$^2$ located inside 10~bar of xenon gas. This new form of microscopy represents an important enabling step in the development of barium tagging for neutrinoless double beta decay searches in $^{136}$Xe, as well as a new tool for studying the photophysics of fluorescent molecules and chemosensors at the solid-gas interface.
Submitted 20 May, 2024;
originally announced June 2024.
-
Measurement of Energy Resolution with the NEXT-White Silicon Photomultipliers
Authors:
T. Contreras,
B. Palmeiro,
H. Almazán,
A. Para,
G. Martínez-Lema,
R. Guenette,
C. Adams,
V. Álvarez,
B. Aparicio,
A. I. Aranburu,
L. Arazi,
I. J. Arnquist,
F. Auria-Luna,
S. Ayet,
C. D. R. Azevedo,
K. Bailey,
F. Ballester,
M. del Barrio-Torregrosa,
A. Bayo,
J. M. Benlloch-Rodríguez,
F. I. G. M. Borges,
A. Brodolin,
N. Byrnes,
S. Cárcel,
A. Castillo
, et al. (85 additional authors not shown)
Abstract:
The NEXT-White detector, a high-pressure gaseous xenon time projection chamber, demonstrated the excellence of this technology for future neutrinoless double beta decay searches using photomultiplier tubes (PMTs) to measure energy and silicon photomultipliers (SiPMs) to extract topology information. This analysis uses $^{83m}\text{Kr}$ data from the NEXT-White detector to measure and understand the energy resolution that can be obtained with the SiPMs, rather than with PMTs. The energy resolution obtained of (10.9 $\pm$ 0.6) $\%$, full-width half-maximum, is slightly larger than predicted based on the photon statistics resulting from very low light detection coverage of the SiPM plane in the NEXT-White detector. The difference in the predicted and measured resolution is attributed to poor corrections, which are expected to be improved with larger statistics. Furthermore, the noise of the SiPMs is shown to not be a dominant factor in the energy resolution and may be negligible when noise subtraction is applied appropriately, for high-energy events or larger SiPM coverage detectors. These results, which are extrapolated to estimate the response of large coverage SiPM planes, are promising for the development of future, SiPM-only, readout planes that can offer imaging and achieve similar energy resolution to that previously demonstrated with PMTs.
Submitted 30 May, 2024;
originally announced May 2024.
-
Audio Dialogues: Dialogues dataset for audio and music understanding
Authors:
Arushi Goel,
Zhifeng Kong,
Rafael Valle,
Bryan Catanzaro
Abstract:
Existing datasets for audio understanding primarily focus on single-turn interactions (i.e. audio captioning, audio question answering) for describing audio in natural language, thus limiting understanding audio via interactive dialogue. To address this gap, we introduce Audio Dialogues: a multi-turn dialogue dataset containing 163.8k samples for general audio sounds and music. In addition to dialogues, Audio Dialogues also has question-answer pairs to understand and compare multiple input audios together. Audio Dialogues leverages a prompting-based approach and caption annotations from existing datasets to generate multi-turn dialogues using a Large Language Model (LLM). We evaluate existing audio-augmented large language models on our proposed dataset to demonstrate the complexity and applicability of Audio Dialogues. Our code for generating the dataset will be made publicly available. Detailed prompts and generated dialogues can be found on the demo website https://audiodialogues.github.io/.
Submitted 11 April, 2024;
originally announced April 2024.
-
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Authors:
Zhifeng Kong,
Arushi Goel,
Rohan Badlani,
Wei Ping,
Rafael Valle,
Bryan Catanzaro
Abstract:
Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs. In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. We introduce a series of training techniques, architecture design, and data strategies to enhance our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art benchmarks. Our demo website is https://audioflamingo.github.io/ and the code is open-sourced at https://github.com/NVIDIA/audio-flamingo.
Submitted 28 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
Authors:
Akshit Arora,
Rohan Badlani,
Sungwon Kim,
Rafael Valle,
Bryan Catanzaro
Abstract:
In this paper, we describe the TTS models developed by NVIDIA for the MMITS-VC (Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024 Challenge. In Tracks 1 and 2, we utilize RAD-MMM to perform few-shot TTS by training additionally on 5 minutes of target speaker data. In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets. We use HiFi-GAN vocoders for all submissions. RAD-MMM performs competitively on Tracks 1 and 2, while P-Flow ranks first on Track 3, with mean opinion score (MOS) 4.4 and speaker similarity score (SMOS) of 3.62.
Submitted 29 January, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
On the representation and methodology for wide and short range head pose estimation
Authors:
Alejandro Cobo,
Roberto Valle,
José M. Buenaposada,
Luis Baumela
Abstract:
Head pose estimation (HPE) is a problem of interest in computer vision to improve the performance of face processing tasks in semi-frontal or profile settings. Recent applications require the analysis of faces in the full 360° rotation range. Traditional approaches to solve the semi-frontal and profile cases are not directly amenable for the full rotation case. In this paper we analyze the methodology for short- and wide-range HPE and discuss which representations and metrics are adequate for each case. We show that the popular Euler angles representation is a good choice for short-range HPE, but not at extreme rotations. However, the Euler angles' gimbal lock problem prevents them from being used as a valid metric in any setting. We also revisit the current cross-data set evaluation methodology and note that the lack of alignment between the reference systems of the training and test data sets negatively biases the results of all articles in the literature. We introduce a procedure to quantify this misalignment and a new methodology for cross-data set HPE that establishes new, more accurate, SOTA for the 300W-LP|Biwi benchmark. We also propose a generalization of the geodesic angular distance metric that enables the construction of a loss that controls the contribution of each training sample to the optimization of the model. Finally, we introduce a wide range HPE benchmark based on the CMU Panoptic data set.
Submitted 11 January, 2024;
originally announced January 2024.
-
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Authors:
Paarth Neekhara,
Shehzeen Hussain,
Rafael Valle,
Boris Ginsburg,
Rishabh Ranjan,
Shlomo Dubnov,
Farinaz Koushanfar,
Julian McAuley
Abstract:
We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that separately encode speaker characteristics and linguistic content. However, disentangling speech representations to capture such attributes using task-specific loss terms can lead to information loss. In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning (SSL) and speaker verification models. First, we develop techniques to derive prosodic information from the audio signal and SSL representations to train predictive submodules in the synthesis model. Next, we propose a training strategy to iteratively improve the synthesis model for voice conversion, by creating a challenging training objective using self-synthesized examples. We demonstrate that incorporating such self-synthesized examples during training improves the speaker similarity of generated speech as compared to a baseline voice conversion model trained solely on heuristically perturbed inputs. Our framework is trained without any text and achieves state-of-the-art results in zero-shot voice conversion on metrics evaluating naturalness, speaker similarity, and intelligibility of synthesized audio.
Submitted 3 May, 2024; v1 submitted 14 October, 2023;
originally announced October 2023.
-
VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation
Authors:
Rohan Badlani,
Akshit Arora,
Subhankar Ghosh,
Rafael Valle,
Kevin J. Shih,
João Felipe Santos,
Boris Ginsburg,
Bryan Catanzaro
Abstract:
We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system. Our model builds upon disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, speaker and fine-grained $F_0$ and energy features for speech synthesis. We utilize the Indic languages dataset, released for LIMMITS 2023 as part of ICASSP Signal Processing Grand Challenge, to synthesize speech in 3 different languages. Our model supports transferring the language of a speaker while retaining their voice and the native accent of the target language. We utilize the large-parameter RADMMM model for Track $1$ and lightweight VANI model for Track $2$ and $3$ of the competition.
Submitted 13 March, 2023;
originally announced March 2023.
-
Multilingual Multiaccented Multispeaker TTS with RADTTS
Authors:
Rohan Badlani,
Rafael Valle,
Kevin J. Shih,
João Felipe Santos,
Siddharth Gururani,
Bryan Catanzaro
Abstract:
We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice. This is challenging to do because it is expensive to obtain bilingual training data in multiple languages, and the lack of such data results in strong correlations that entangle speaker, language, and accent, resulting in poor transfer capabilities. To overcome this, we present a multilingual, multiaccented, multispeaker speech synthesis model based on RADTTS with explicit control over accent, language, speaker and fine-grained $F_0$ and energy features. Our proposed model does not rely on bilingual training data. We demonstrate an ability to control synthesized accent for any speaker in an open-source dataset comprising 7 accents. Human subjective evaluation demonstrates that our model can better retain a speaker's voice and accent quality than controlled baselines while synthesizing fluent speech in all target languages and accents in our dataset.
Submitted 24 January, 2023;
originally announced January 2023.
-
SPACE: Speech-driven Portrait Animation with Controllable Expression
Authors:
Siddharth Gururani,
Arun Mallya,
Ting-Chun Wang,
Rafael Valle,
Ming-Yu Liu
Abstract:
Animating portraits using speech has received growing attention in recent years, with various creative and practical use cases. An ideal generated video should have good lip sync with the audio, natural facial expressions and head motions, and high frame quality. In this work, we present SPACE, which uses speech and a single image to generate high-resolution, and expressive videos with realistic head pose, without requiring a driving video. It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator. SPACE also allows for the control of emotions and their intensities. Our method outperforms prior methods in objective metrics for image quality and facial motions and is strongly preferred by users in pair-wise comparisons. The project website is available at https://deepimagination.cc/SPACE/
Submitted 6 December, 2022; v1 submitted 17 November, 2022;
originally announced November 2022.
-
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows
Authors:
Kevin J. Shih,
Rafael Valle,
Rohan Badlani,
João Felipe Santos,
Bryan Catanzaro
Abstract:
Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2. Pitch information is not only low-dimensional, but also discontinuous, making it particularly difficult to model in a generative setting. Our work explores several techniques for handling the aforementioned issues in the context of Normalizing Flow models. We also find this problem to be very well suited for Neural Spline flows, which is a highly expressive alternative to the more common affine-coupling mechanism in Normalizing Flows.
Submitted 27 June, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Multi-task head pose estimation in-the-wild
Authors:
Roberto Valle,
José Miguel Buenaposada,
Luis Baumela
Abstract:
We present a deep learning-based multi-task approach for head pose estimation in images. We contribute a network architecture and training strategy that harness the strong dependencies among face pose, alignment and visibility, to produce a top performing model for all three tasks. Our architecture is an encoder-decoder CNN with residual blocks and lateral skip connections. We show that the combination of head pose estimation and landmark-based face alignment significantly improves the performance of the former task. Further, the location of the pose task at the bottleneck layer, at the end of the encoder, and that of tasks depending on spatial information, such as visibility and alignment, in the final decoder layer, also contribute to increasing the final performance. In the experiments conducted, the proposed model outperforms the state-of-the-art in the face pose and visibility tasks. By including a final landmark regression step it also produces face alignment results on par with the state-of-the-art.
Submitted 4 February, 2022;
originally announced February 2022.
-
One TTS Alignment To Rule Them All
Authors:
Rohan Badlani,
Adrian Łancucki,
Kevin J. Shih,
Rafael Valle,
Wei Ping,
Bryan Catanzaro
Abstract:
Speech-to-text alignment is a critical component of neural text-to-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive end-to-end TTS models rely on durations extracted from external sources. In this paper we leverage the alignment mechanism proposed in RAD-TTS as a generic alignment learning framework, easily applicable to a variety of neural TTS models. The framework combines the forward-sum algorithm, the Viterbi algorithm, and a simple and efficient static prior. In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). Specifically, it improves alignment convergence speed of existing attention-based mechanisms, simplifies the training pipeline, and makes the models more robust to errors on long utterances. Most importantly, the framework improves the perceived speech synthesis quality, as judged by human evaluators.
Submitted 23 August, 2021;
originally announced August 2021.
-
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Authors:
Rafael Valle,
Kevin Shih,
Ryan Prenger,
Bryan Catanzaro
Abstract:
In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer. Flowtron borrows insights from IAF and revamps Tacotron in order to provide high-quality and expressive mel-spectrogram synthesis. Flowtron is optimized by maximizing the likelihood of the training data, which makes training simple and stable. Flowtron learns an invertible mapping of data to a latent space that can be manipulated to control many aspects of speech synthesis (pitch, tone, speech rate, cadence, accent). Our mean opinion scores (MOS) show that Flowtron matches state-of-the-art TTS models in terms of speech quality. In addition, we provide results on control of speech variation, interpolation between samples and style transfer between speakers seen and unseen during training. Code and pre-trained models will be made publicly available at https://github.com/NVIDIA/flowtron
Submitted 16 July, 2020; v1 submitted 12 May, 2020;
originally announced May 2020.
-
Neural ODEs for Image Segmentation with Level Sets
Authors:
Rafael Valle,
Fitsum Reda,
Mohammad Shoeybi,
Patrick Legresley,
Andrew Tao,
Bryan Catanzaro
Abstract:
We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method. Our approach parametrizes the evolution of an initial contour with a NODE that implicitly learns from data a speed function describing the evolution. In addition, for cases where an initial contour is not available and to alleviate the need for careful choice or design of contour embedding functions, we propose a NODE-based method that evolves an image embedding into a dense per-pixel semantic label space. We evaluate our methods on kidney segmentation (KiTS19) and on salient object detection (PASCAL-S, ECSSD and HKU-IS). In addition to improving initial contours provided by deep learning models while using a fraction of their number of parameters, our approach achieves F scores that are higher than several state-of-the-art deep learning algorithms.
Submitted 25 December, 2019;
originally announced December 2019.
-
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Authors:
Rafael Valle,
Jason Li,
Ryan Prenger,
Bryan Catanzaro
Abstract:
Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data. By explicitly conditioning on rhythm and continuous pitch contours from an audio signal or music score, Mellotron is able to generate speech in a variety of styles ranging from read speech to expressive speech, from slow drawls to rap and from monotonous voice to singing voice. Unlike other methods, we train Mellotron using only read speech data without alignments between text and audio. We evaluate our models using the LJSpeech and LibriTTS datasets. We provide F0 Frame Errors and synthesized samples that include style transfer from other speakers, singers and styles not seen during training, procedural manipulation of rhythm and pitch and choir synthesis.
Submitted 26 October, 2019;
originally announced October 2019.
-
Face Alignment using a 3D Deeply-initialized Ensemble of Regression Trees
Authors:
Roberto Valle,
José M. Buenaposada,
Antonio Valdés,
Luis Baumela
Abstract:
Face alignment algorithms locate a set of landmark points in images of faces taken in unrestricted situations. State-of-the-art approaches typically fail or lose accuracy in the presence of occlusions, strong deformations, large pose variations and ambiguous configurations. In this paper we present 3DDE, a robust and efficient face alignment algorithm based on a coarse-to-fine cascade of ensembles of regression trees. It is initialized by robustly fitting a 3D face model to the probability maps produced by a convolutional neural network. With this initialization we address self-occlusions and large face rotations. Further, the regressor implicitly imposes a prior face shape on the solution, addressing occlusions and ambiguous face configurations. Its coarse-to-fine structure tackles the combinatorial explosion of parts deformation. In the experiments performed, 3DDE improves the state-of-the-art in 300W, COFW, AFLW and WFLW data sets. Finally, we perform cross-dataset experiments that reveal the existence of a significant data set bias in these benchmarks.
Submitted 13 December, 2019; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Non-Gaussian Geostatistical Modeling using (skew) t Processes
Authors:
M. Bevilacqua,
C. Caamaño,
R. B. Arellano Valle,
V. Morales-Oñate
Abstract:
We propose a new model for regression and dependence analysis when addressing spatial data with possibly heavy tails and an asymmetric marginal distribution. We first propose a stationary process with $t$ marginals obtained through scale mixing of a Gaussian process with an inverse square root process with Gamma marginals. We then generalize this construction by considering a skew-Gaussian process, thus obtaining a process with skew-t marginal distributions. For the proposed (skew) $t$ process we study the second-order and geometrical properties and, in the $t$ case, we provide analytic expressions for the bivariate distribution. In an extensive simulation study, we investigate the use of the weighted pairwise likelihood as a method of estimation for the $t$ process. Moreover, we compare the performance of the optimal linear predictor of the $t$ process versus the optimal Gaussian predictor. Finally, the effectiveness of our methodology is illustrated by analyzing a georeferenced dataset on maximum temperatures in Australia.
Submitted 19 December, 2019; v1 submitted 15 December, 2018;
originally announced December 2018.
-
WaveGlow: A Flow-based Generative Network for Speech Synthesis
Authors:
Ryan Prenger,
Rafael Valle,
Bryan Catanzaro
Abstract:
In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation. All code will be made publicly available online.
Submitted 30 October, 2018;
originally announced November 2018.
-
Theoretical approach to the ductile fracture of polycrystalline solids
Authors:
Miguel Lagos,
César Retamal,
Rodrigo Valle
Abstract:
It is shown here that fracture after a brief plastic strain, typically of a few percent, is a necessary consequence of the polycrystalline nature of the materials. The polycrystal undergoing plastic deformation is modeled as a flowing continuum of random deformable polyhedra, representing the grains, which fill the space without leaving voids. Adjacent grains slide with a relative velocity proportional to the local shear stress resolved on the plane of the shared grain boundary, when greater than a finite threshold. The polyhedral grains reshape continuously to preserve matter continuity, with the forces causing grain sliding being dominant over those reshaping the grains. It has been shown in the past that this model does not conserve volume, causing a monotonic hydrostatic pressure variation with strain. This effect introduces a novel concept in the theory of plasticity because it determines that any fine-grained polycrystalline material will fail after a finite plastic strain. Here the hydrostatic pressure dependence on strain is explicitly calculated and shown to have a logarithmic divergence, which determines the strain to fracture. Comparison of theoretical results with strains to fracture given by mechanical tests of commercial alloys shows very good agreement.
Submitted 26 August, 2018;
originally announced August 2018.
-
Visual Display and Retrieval of Music Information
Authors:
Rafael Valle
Abstract:
This paper describes computational methods for the visual display and analysis of music information. We provide a concise description of software, music descriptors and data visualization techniques commonly used in music information retrieval. Finally, we provide use cases where the described software, descriptors and visualizations are showcased.
Submitted 26 July, 2018;
originally announced July 2018.
-
TequilaGAN: How to easily identify GAN samples
Authors:
Rafael Valle,
Wilson Cai,
Anish Doshi
Abstract:
In this paper we show strategies to easily identify fake samples generated with the Generative Adversarial Network framework. One strategy is based on the statistical analysis and comparison of raw pixel values and features extracted from them. The other strategy learns formal specifications from the real data and shows that fake samples violate the specifications of the real data. We show that fake samples produced with GANs have a universal signature that can be used to identify fake samples. We provide results on MNIST, CIFAR10, music and speech data.
Submitted 13 July, 2018;
originally announced July 2018.
-
Attacking Speaker Recognition With Deep Generative Models
Authors:
Wilson Cai,
Anish Doshi,
Rafael Valle
Abstract:
In this paper we investigate the ability of generative adversarial networks (GANs) to synthesize spoofing attacks on modern speaker recognition systems. We first show that samples generated with SampleRNN and WaveNet are unable to fool a CNN-based speaker recognition system. We propose a modification of the Wasserstein GAN objective function to make use of data that is real but not from the class being learned. Our semi-supervised learning method is able to perform both targeted and untargeted attacks, raising questions related to security in speaker authentication systems.
Submitted 8 January, 2018;
originally announced January 2018.
-
Character-Based Handwritten Text Transcription with Attention Networks
Authors:
Jason Poulos,
Rafael Valle
Abstract:
The paper approaches the task of handwritten text recognition (HTR) with attentional encoder-decoder networks trained on sequences of characters, rather than words. We experiment on lines of text from popular handwriting datasets and compare different activation functions for the attention mechanism used for aligning image pixels and target characters. We find that softmax attention focuses heavily on individual characters, while sigmoid attention focuses on multiple characters at each step of the decoding. When the sequence alignment is one-to-one, softmax attention is able to learn a more precise alignment at each step of the decoding, whereas the alignment generated by sigmoid attention is much less precise. When a linear function is used to obtain attention weights, the model predicts a character by looking at the entire sequence of characters and performs poorly because it lacks a precise alignment between the source and target. Future research may explore HTR in natural scene images, since the model is capable of transcribing handwritten text without the need for producing segmentations or bounding boxes of text in images.
Submitted 24 February, 2021; v1 submitted 11 December, 2017;
originally announced December 2017.
-
Missing Data Imputation for Supervised Learning
Authors:
Jason Poulos,
Rafael Valle
Abstract:
Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different levels of additional missing-data perturbation. We show imputation methods can increase predictive accuracy in the presence of missing-data perturbation, which can actually improve prediction accuracy by regularizing the classifier. We achieve the state-of-the-art on the Adult dataset with missing-data perturbation and k-nearest-neighbors (k-NN) imputation.
Submitted 6 August, 2018; v1 submitted 28 October, 2016;
originally announced October 2016.
-
ABROA : Audio-Based Room-Occupancy Analysis using Gaussian Mixtures and Hidden Markov Models
Authors:
Rafael Valle
Abstract:
This paper outlines preliminary steps towards the development of an audio-based room-occupancy analysis model. Our approach borrows from speech recognition tradition and is based on Gaussian Mixtures and Hidden Markov Models. We analyze possible challenges encountered in the development of such a model, and offer several solutions including feature design and prediction strategies. We provide results obtained from experiments with audio data from a retail store in Palo Alto, California. Model assessment is done via leave-two-out Bootstrap and model convergence achieves good accuracy, thus representing a contribution to multimodal people counting algorithms.
Submitted 22 June, 2016;
originally announced July 2016.
-
Control Improvisation with Probabilistic Temporal Specifications
Authors:
Ilge Akkaya,
Daniel J. Fremont,
Rafael Valle,
Alexandre Donzé,
Edward A. Lee,
Sanjit A. Seshia
Abstract:
We consider the problem of generating randomized control sequences for complex networked systems typically actuated by human agents. Our approach leverages a concept known as control improvisation, which is based on a combination of data-driven learning and controller synthesis from formal specifications. We learn from existing data a generative model (for instance, an explicit-duration hidden Markov model, or EDHMM) and then supervise this model in order to guarantee that the generated sequences satisfy some desirable specifications given in Probabilistic Computation Tree Logic (PCTL). We present an implementation of our approach and apply it to the problem of mimicking the use of lighting appliances in a residential unit, with potential applications to home security and resource management. We present experimental results showing that our approach produces realistic control sequences, similar to recorded data based on human actuation, while satisfying suitable formal requirements.
Submitted 29 February, 2016; v1 submitted 6 November, 2015;
originally announced November 2015.
-
New constraint on the existence of the mu+-> e+ gamma decay
Authors:
MEG Collaboration,
J. Adam,
X. Bai,
A. M. Baldini,
E. Baracchini,
C. Bemporad,
G. Boca,
P. W. Cattaneo,
G. Cavoto,
F. Cei,
C. Cerri,
A. de Bari,
M. De Gerone,
T. Doke,
S. Dussoni,
J. Egger,
K. Fratini,
Y. Fujii,
L. Galli,
G. Gallucci,
F. Gatti,
B. Golden,
M. Grassi,
A. Graziosi,
D. N. Grigoriev
, et al. (49 additional authors not shown)
Abstract:
The analysis of a combined data set, totaling $3.6 \times 10^{14}$ stopped muons on target, in the search for the lepton flavour violating decay $\mu^+ \rightarrow e^+ \gamma$ is presented. The data collected by the MEG experiment at the Paul Scherrer Institut show no excess of events compared to background expectations and yield a new upper limit on the branching ratio of this decay of $5.7 \times 10^{-13}$ (90% confidence level). This represents a four times more stringent limit than the previous world best limit set by MEG.
Submitted 23 April, 2013; v1 submitted 4 March, 2013;
originally announced March 2013.
-
Development and commissioning of the Timing Counter for the MEG Experiment
Authors:
M. De Gerone,
S. Dussoni,
K. Fratini,
F. Gatti,
R. Valle,
G. Boca,
P. W. Cattaneo,
R. Nardò,
M. Rossella,
L. Galli,
M. Grassi,
D. Nicolò,
Y. Uchiyama,
D. Zanello
Abstract:
The Timing Counter of the MEG (Mu to Electron Gamma) experiment is designed to deliver trigger information and to accurately measure the timing of the $e^+$ in searching for the decay $μ^+ \rightarrow e^+γ$. It is part of a magnetic spectrometer with the $μ^+$ decay target in the center. It consists of two sectors upstream and downstream of the target, each one with two layers: the inner one made with scintillating fibers read out by APDs for trigger and track reconstruction, the outer one consisting of scintillating bars read out by PMTs for trigger and time measurement. The design criteria, the obtained performance and the commissioning of the detector are presented herein.
Submitted 4 February, 2012; v1 submitted 1 December, 2011;
originally announced December 2011.
-
A pulse fishery model with closures as function of the catch: Conditions for sustainability
Authors:
Fernando Córdova-Lepe,
Rodrigo del Valle,
Gonzalo Robledo
Abstract:
We present a model of single species fishery which alternates closed seasons with pulse captures. The novelty is that the length of a closed season is determined by the stock size of the last capture. The process is described by a new type of impulsive differential equations recently introduced. The main result is a fishing effort threshold which determines either the sustainability of the fishery or the extinction of the resource.
Submitted 13 October, 2011;
originally announced October 2011.
-
New limit on the lepton-flavour violating decay mu -> e gamma
Authors:
MEG collaboration,
J. Adam,
X. Bai,
A. M. Baldini,
E. Baracchini,
C. Bemporad,
G. Boca,
P. W. Cattaneo,
G. Cavoto,
F. Cei,
C. Cerri,
A. de Bari,
M. De Gerone,
T. Doke,
S. Dussoni,
J. Egger,
K. Fratini,
Y. Fujii,
L. Galli,
G. Gallucci,
F. Gatti,
B. Golden,
M. Grassi,
D. N. Grigoriev,
T. Haruyama
, et al. (42 additional authors not shown)
Abstract:
We present a new result based on an analysis of the data collected by the MEG detector at the Paul Scherrer Institut in 2009 and 2010, in search of the lepton flavour violating decay $\mu \rightarrow e \gamma$. The likelihood analysis of the combined data sample, which corresponds to a total of $1.8 \times 10^{14}$ muon decays, gives a 90% C.L. upper limit of $2.4 \times 10^{-12}$ on the branching ratio of the $\mu \rightarrow e \gamma$ decay, constituting the most stringent limit on the existence of this decay to date.
Submitted 2 September, 2011; v1 submitted 27 July, 2011;
originally announced July 2011.
-
The Timing Counter of the MEG experiment: calibration and performance
Authors:
P. W. Cattaneo,
M. De Gerone,
S. Dussoni,
F. Gatti,
M. Rossella,
Y. Uchiyama,
R. Valle
Abstract:
The MEG detector is designed to test Lepton Flavor Violation in the $μ^+\rightarrow e^+γ$ decay down to a Branching Ratio of a few $10^{-13}$. The decay topology consists of the coincident emission of a monochromatic photon in the direction opposite to a monochromatic positron. A precise measurement of the relative time $t_{e^+γ}$ is crucial to suppress the background. The Timing Counter (TC) is designed to precisely measure the time of arrival of the $e^+$ and to provide information to the trigger system. It consists of two sectors upstream and downstream of the decay target, each consisting of two layers: the outer one made of scintillating bars and the inner one of scintillating fibers. Their design criteria and performance are described.
Submitted 6 April, 2011;
originally announced April 2011.
-
A limit for the mu -> e gamma decay from the MEG experiment
Authors:
MEG collaboration,
J. Adam,
X. Bai,
A. Baldini,
E. Baracchini,
A. Barchiesi,
C. Bemporad,
G. Boca,
P. W. Cattaneo,
G. Cavoto,
G. Cecchet,
F. Cei,
C. Cerri,
A. De Bari,
M. De Gerone,
T. Doke,
S. Dussoni,
J. Egger,
L. Galli,
G. Gallucci,
F. Gatti,
B. Golden,
M. Grassi,
D. N. Grigoriev,
T. Haruyama
, et al. (45 additional authors not shown)
Abstract:
A search for the decay $\mu \rightarrow e \gamma$, performed at PSI and based on data from the initial three months of operation of the MEG experiment, yields an upper limit on the branching ratio of $\mathrm{BR}(\mu \rightarrow e \gamma) < 2.8 \times 10^{-11}$ (90% C.L.). This corresponds to the measurement of positrons and photons from $\sim 10^{14}$ stopped $\mu$-decays by means of a superconducting positron spectrometer and a 900 litre liquid xenon photon detector.
Submitted 4 March, 2010; v1 submitted 18 August, 2009;
originally announced August 2009.
-
Synthesis, crystal structure, microstructure, transport and magnetic properties of SmFeAsO and SmFeAs(O0.93F0.07)
Authors:
A. Martinelli,
M. Ferretti,
P. Manfrinetti,
A. Palenzona,
M. Tropeano,
M. R. Cimberle,
C. Ferdeghini,
R. Valle,
M. Putti,
A. S. Siri
Abstract:
SmFeAsO and the isostructural superconducting SmFeAs(O0.93F0.07) samples were prepared. Characterization by means of Rietveld refinement of X-ray powder diffraction data, scanning electron microscope observation, transmission electron microscope analysis, resistivity and magnetization measurements was carried out. Sintering treatment strongly improves the grain connectivity but, on the other hand, induces a competition between the thermodynamic stability of the oxy-pnictide and Sm2O3, hence worsening the purity of the sample. In the pristine sample both magnetization and resistivity measurements clearly indicate that two different sources of magnetism are present: the former related to Fe ordering at 140 K and the latter due to the Sm ions that order antiferromagnetically at low temperature. The feature at 140 K disappears in the F-substituted sample and, at low temperatures, a superconducting transition appears. The magnetoresistivity curves of the F-substituted sample probably indicate very high critical field values.
Submitted 13 June, 2008;
originally announced June 2008.
-
Direct evidence of overdamped Peierls-coupled modes in TTF-CA temperature-induced phase transition
Authors:
A. Girlando,
M. Masino,
A. Painelli,
N. Drichko,
M. Dressel,
A. Brillante,
R. G. Della Valle,
E. Venuti
Abstract:
In this paper we elucidate the optical response resulting from the interplay of charge distribution (ionicity) and Peierls instability (dimerization) in the neutral-ionic, ferroelectric phase transition of tetrathiafulvalene-chloranil (TTF-CA), a mixed-stack quasi-one-dimensional charge-transfer crystal. We present far-infrared reflectivity measurements down to 5 cm$^{-1}$ as a function of temperature above the phase transition (300 - 82 K). The coupling between electrons and lattice phonons in the pre-transitional regime is analyzed on the basis of phonon eigenvectors and polarizability calculations of the one-dimensional Peierls-Hubbard model. We find a multi-phonon Peierls coupling, but on approaching the transition the spectral weight and the coupling shift progressively towards the phonons at lower frequencies, resulting in a soft-mode behavior only for the lowest-frequency phonon near the transition temperature. Moreover, in the proximity of the phase transition, the lowest-frequency phonon becomes overdamped, due to anharmonicity induced by its coupling to electrons. The implications of these findings for the neutral-ionic transition mechanism are briefly discussed.
Submitted 19 December, 2007;
originally announced December 2007.
-
Phonons and structures of tetracene polymorphs at low temperature and high pressure
Authors:
Elisabetta Venuti,
Raffaele Guido Della Valle,
Luca Farina,
Aldo Brillante,
Matteo Masino,
Alberto Girlando
Abstract:
Crystals of tetracene have been studied by means of lattice phonon Raman spectroscopy as a function of temperature and pressure. Two different phases (polymorphs I and II) have been obtained, depending on sample preparation and history. Polymorph I is the most frequently grown phase, stable at ambient conditions. A pressure induced phase transition, observed above 1 GPa, leads to polymorph II, which is also obtained at temperatures below 140 K. Polymorph II can also be maintained at ambient conditions.
We have calculated the crystallographic structures and phonon frequencies as a function of temperature, starting from the configurations of the energy minima found by exploring the potential energy surface of crystalline tetracene. The spectra calculated for the first and second deepest minima match satisfactorily those measured for polymorphs I and II, respectively. All published x-ray structures, once assigned to the appropriate polymorph, are also reproduced.
Submitted 14 April, 2004; v1 submitted 11 December, 2003;
originally announced December 2003.
-
Polymorphism, phonon dynamics and carrier-phonon coupling in pentacene
Authors:
Raffaele G. Della Valle,
Aldo Brillante,
Luca Farina,
Elisabetta Venuti,
Matteo Masino,
Alberto Girlando
Abstract:
The crystal structure and phonon dynamics of pentacene are computed with the Quasi Harmonic Lattice Dynamics (QHLD) method, based on atom-atom potential. We show that two crystalline phases of pentacene exist, rather similar in thermodynamic stability and in molecular density. The two phases can be easily distinguished by Raman spectroscopy in the 10-100 cm$^{-1}$ spectral region. We have not found any temperature-induced phase transition, whereas a sluggish phase change to the denser phase is induced by pressure. The bandwidths of the two phases are slightly different. The charge carrier coupling to low-frequency phonons is calculated.
Submitted 4 July, 2003;
originally announced July 2003.
-
BEDT-TTF organic superconductors: the entangled role of phonons
Authors:
Alberto Girlando,
Matteo Masino,
Aldo Brillante,
Raffaele G. Della Valle,
Elisabetta Venuti
Abstract:
We calculate the lattice phonons and the electron-phonon coupling of the organic superconductor κ-(BEDT-TTF)_2 I_3, reproducing all available experimental data connected to phonon dynamics. Low-frequency intra-molecular vibrations are strongly mixed with lattice phonons. Both acoustic and optical phonons are appreciably coupled to electrons through the modulation of the hopping integrals (e-LP coupling). By comparing the results relevant to superconducting κ- and β-(BEDT-TTF)_2 I_3, we show that electron-phonon coupling is fundamental to the pairing mechanism. Both e-LP and electron-molecular vibration (e-MV) coupling are essential to reproduce the critical temperatures. The e-LP coupling is stronger, but e-MV is instrumental in increasing the average phonon frequency.
Submitted 16 October, 2002; v1 submitted 8 February, 2002;
originally announced February 2002.
-
Lattice dynamics and electron-phonon coupling in β-(BEDT-TTF)_2I_3 organic superconductor
Authors:
A. Girlando,
M. Masino,
G. Visentini,
R. G. Della Valle,
A. Brillante,
E. Venuti
Abstract:
The crystal structure and lattice phonons of (BEDT-TTF)_2I_3 superconducting β-phase are computed and analyzed by the Quasi Harmonic Lattice Dynamics (QHLD) method. Whereas the crystal structure and its temperature and pressure dependence are properly reproduced within a rigid molecule approximation, this has to be removed to account for the specific heat data. Such a mixing between lattice and low-frequency intramolecular vibrations also yields good agreement with the observed Raman and infrared frequencies. From the eigenvectors of the low-frequency phonons we calculate the electron-phonon coupling constants due to the modulation of charge transfer (hopping) integrals. The hopping integrals are evaluated by the extended Hueckel method applied to all nearest-neighbor BEDT-TTF pairs in the ab crystal plane. From the averaged electron-phonon coupling constants and the QHLD phonon density of states we derive the Eliashberg coupling function, which compares well with that experimentally obtained from point-contact spectroscopy. The corresponding dimensionless coupling constant λ is found to be around 0.4.
Submitted 24 March, 2000; v1 submitted 17 March, 2000;
originally announced March 2000.
-
A scaling approximation for structure factors in the integral equation theory of polydisperse nonionic colloidal fluids
Authors:
Domenico Gazzillo,
Achille Giacometti,
Raffaele G. Della Valle,
Elisabetta Venuti,
Flavio Carsughi
Abstract:
Integral equation of pure liquids, combined with a new "scaling approximation" based on a corresponding states treatment of pair correlation functions, is used to evaluate approximate structure factors for colloidal fluids constituted of uncharged particles with polydispersity in size and energy parameters. Both hard spheres and Lennard-Jones interactions are considered. For polydisperse hard spheres, the scaling approximation is compared to theories utilized by small angle scattering experimentalists (decoupling approximation, local monodisperse approximation) and to the van der Waals one-fluid theory. The results are tested against predictions from analytical expressions, exact within the Percus-Yevick approximation. For polydisperse Lennard-Jones particles, the scaling approximation, combined with a "modified hypernetted chain" integral equation, is tested against molecular dynamics data generated for the present work. Despite its simplicity, the scaling approximation exhibits a satisfactory performance for both potentials and represents a considerable improvement over the above-mentioned theories. Shortcomings of the proposed theory, its applicability to the analysis of experimental scattering data, and its possible extensions to different potentials are finally discussed.
Submitted 15 October, 1999; v1 submitted 23 July, 1999;
originally announced July 1999.
-
Towards an effective potential for the monomer, dimer, hexamer, solid and liquid forms of hydrogen fluoride
Authors:
Raffaele Guido Della Valle,
Domenico Gazzillo
Abstract:
We present an attempt to build up a new two-body effective potential for hydrogen fluoride, fitted to theoretical and experimental data relevant not only to the gas and liquid phases, but also to the crystal. The model is simple enough to be used in Molecular Dynamics and Monte Carlo simulations. The potential consists of: a) an intra-molecular contribution, allowing for variations of the molecular length, plus b) an inter-molecular part, with three charged sites on each monomer and a Buckingham "exp-6" interaction between fluorines. The model is able to reproduce a significant number of observables on the monomer, dimer, hexamer, solid and liquid forms of HF. The shortcomings of the model are pointed out and possible improvements are finally discussed.
Submitted 6 March, 1999; v1 submitted 3 March, 1999;
originally announced March 1999.
-
Quasi Harmonic Lattice Dynamics and Molecular Dynamics calculations for the Lennard-Jones solids
Authors:
Raffaele Guido Della Valle,
Elisabetta Venuti
Abstract:
We present Molecular Dynamics (MD), Quasi Harmonic Lattice Dynamics (QHLD) and Energy Minimization (EM) calculations for the crystal structure of Ne, Ar, Kr and Xe as a function of pressure and temperature. New Lennard-Jones (LJ) parameters are obtained for Ne, Kr and Xe to reproduce the experimental pressure dependence of the density. We employ a simple method which combines results of QHLD and MD calculations to achieve densities in good agreement with experiment from 0 K to melting. Melting is discussed in connection with intrinsic instability of the solid as given by the QHLD approximation. (See http://www.fci.unibo.it/~valle for related papers)
Submitted 5 May, 1998;
originally announced May 1998.