Search | arXiv e-print repository

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

Authors: Paarth Neekhara, Shehzeen Hussain, Subhankar Ghosh, Jason Li, Rafael Valle, Rohan Badlani, Boris Ginsburg

Abstract: Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text contains multiple occurrences of the same token. We examine these challenges in an encoder-decoder transformer model and find that certain cross-attention heads in such models implicitly learn the text and speech alignment when trained for predicting speech tokens for a given text. To make the alignment more robust, we propose techniques utilizing CTC loss and attention priors that encourage monotonic cross-attention over the text tokens. Our guided attention training technique does not introduce any new learnable parameters and significantly improves robustness of LLM-based TTS models.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Published as a conference paper at INTERSPEECH 2024

arXiv:2406.15487 [pdf, other]

Improving Text-To-Audio Models with Synthetic Captions

Authors: Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro

Abstract: It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model to synthesize accurate and diverse captions for audio at scale. We leverage this pipeline to produce a dataset of synthetic captions for AudioSet, named AF-AudioSet, and then evaluate the benefit of pre-training text-to-audio models on these synthetic captions. Through systematic evaluations on AudioCaps and MusicCaps, we find leveraging our pipeline and synthetic captions leads to significant improvements on audio generation quality, achieving a new state-of-the-art.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.15422 [pdf, other]

Fluorescence Imaging of Individual Ions and Molecules in Pressurized Noble Gases for Barium Tagging in $^{136}$Xe

Authors: NEXT Collaboration, N. Byrnes, E. Dey, F. W. Foss, B. J. P. Jones, R. Madigan, A. McDonald, R. L. Miller, K. E. Navarro, L. R. Norman, D. R. Nygren, C. Adams, H. Almazán, V. Álvarez, B. Aparicio, A. I. Aranburu, L. Arazi, I. J. Arnquist, F. Auria-Luna, S. Ayet, C. D. R. Azevedo, J. E. Barcelon, K. Bailey, F. Ballester, M. del Barrio-Torregrosa, et al. (90 additional authors not shown)

Abstract: The imaging of individual Ba$^{2+}$ ions in high pressure xenon gas is one possible way to attain background-free sensitivity to neutrinoless double beta decay and hence establish the Majorana nature of the neutrino. In this paper we demonstrate selective single Ba$^{2+}$ ion imaging inside a high-pressure xenon gas environment. Ba$^{2+}$ ions chelated with molecular chemosensors are resolved at the gas-solid interface using a diffraction-limited imaging system with scan area of 1$\times$1 cm$^2$ located inside 10 bar of xenon gas. This new form of microscopy represents an important enabling step in the development of barium tagging for neutrinoless double beta decay searches in $^{136}$Xe, as well as a new tool for studying the photophysics of fluorescent molecules and chemosensors at the solid-gas interface.

Submitted 20 May, 2024; originally announced June 2024.

arXiv:2405.20427 [pdf, other]

Measurement of Energy Resolution with the NEXT-White Silicon Photomultipliers

Authors: T. Contreras, B. Palmeiro, H. Almazán, A. Para, G. Martínez-Lema, R. Guenette, C. Adams, V. Álvarez, B. Aparicio, A. I. Aranburu, L. Arazi, I. J. Arnquist, F. Auria-Luna, S. Ayet, C. D. R. Azevedo, K. Bailey, F. Ballester, M. del Barrio-Torregrosa, A. Bayo, J. M. Benlloch-Rodríguez, F. I. G. M. Borges, A. Brodolin, N. Byrnes, S. Cárcel, A. Castillo, et al. (85 additional authors not shown)

Abstract: The NEXT-White detector, a high-pressure gaseous xenon time projection chamber, demonstrated the excellence of this technology for future neutrinoless double beta decay searches using photomultiplier tubes (PMTs) to measure energy and silicon photomultipliers (SiPMs) to extract topology information. This analysis uses $^{83m}\text{Kr}$ data from the NEXT-White detector to measure and understand the energy resolution that can be obtained with the SiPMs, rather than with PMTs. The energy resolution obtained of (10.9 $\pm$ 0.6)%, full-width half-maximum, is slightly larger than predicted based on the photon statistics resulting from very low light detection coverage of the SiPM plane in the NEXT-White detector. The difference in the predicted and measured resolution is attributed to poor corrections, which are expected to be improved with larger statistics. Furthermore, the noise of the SiPMs is shown to not be a dominant factor in the energy resolution and may be negligible when noise subtraction is applied appropriately, for high-energy events or larger SiPM coverage detectors. These results, which are extrapolated to estimate the response of large coverage SiPM planes, are promising for the development of future, SiPM-only, readout planes that can offer imaging and achieve similar energy resolution to that previously demonstrated with PMTs.

Submitted 30 May, 2024; originally announced May 2024.
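
Note on the entry above: the resolution is quoted as full-width half-maximum (FWHM). As a rough illustration only, and assuming the energy peak is Gaussian (an assumption not stated in the abstract), the quoted value can be converted to a standard deviation with FWHM = 2*sqrt(2 ln 2)*sigma. The function name below is hypothetical.

```python
import math

# Assumption: the energy peak is Gaussian, so FWHM = 2*sqrt(2*ln 2)*sigma.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_to_sigma(fwhm: float) -> float:
    """Convert a full-width-half-maximum value to a Gaussian standard deviation."""
    return fwhm / FWHM_PER_SIGMA


# Resolution quoted in the abstract: (10.9 +/- 0.6) % FWHM.
sigma_pct = fwhm_to_sigma(10.9)
sigma_err_pct = fwhm_to_sigma(0.6)
print(f"sigma/E = {sigma_pct:.2f} % +/- {sigma_err_pct:.2f} %")
```

Under that Gaussian assumption, the quoted 10.9% FWHM corresponds to a relative standard deviation of roughly 4.6%.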

arXiv:2404.07616 [pdf, other]

Audio Dialogues: Dialogues dataset for audio and music understanding

Authors: Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Abstract: Existing datasets for audio understanding primarily focus on single-turn interactions (i.e. audio captioning, audio question answering) for describing audio in natural language, thus limiting understanding audio via interactive dialogue. To address this gap, we introduce Audio Dialogues: a multi-turn dialogue dataset containing 163.8k samples for general audio sounds and music. In addition to dialogues, Audio Dialogues also has question-answer pairs to understand and compare multiple input audios together. Audio Dialogues leverages a prompting-based approach and caption annotations from existing datasets to generate multi-turn dialogues using a Large Language Model (LLM). We evaluate existing audio-augmented large language models on our proposed dataset to demonstrate the complexity and applicability of Audio Dialogues. Our code for generating the dataset will be made publicly available. Detailed prompts and generated dialogues can be found on the demo website https://audiodialogues.github.io/.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Demo website: https://audiodialogues.github.io/

arXiv:2402.01831 [pdf, other]

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Authors: Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Abstract: Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs. In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. We introduce a series of training techniques, architecture design, and data strategies to enhance our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art benchmarks. Our demo website is https://audioflamingo.github.io/ and the code is open-sourced at https://github.com/NVIDIA/audio-flamingo.

Submitted 28 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: ICML 2024

arXiv:2401.13851 [pdf, ps, other]

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

Authors: Akshit Arora, Rohan Badlani, Sungwon Kim, Rafael Valle, Bryan Catanzaro

Abstract: In this paper, we describe the TTS models developed by NVIDIA for the MMITS-VC (Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024 Challenge. In Tracks 1 and 2, we utilize RAD-MMM to perform few-shot TTS by training additionally on 5 minutes of target speaker data. In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets. We use HiFi-GAN vocoders for all submissions. RAD-MMM performs competitively on Tracks 1 and 2, while P-Flow ranks first on Track 3, with mean opinion score (MOS) 4.4 and speaker similarity score (SMOS) of 3.62.

Submitted 29 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Presentation accepted at ICASSP 2024

arXiv:2401.05807 [pdf, other]

doi 10.1016/j.patcog.2024.110263

On the representation and methodology for wide and short range head pose estimation

Authors: Alejandro Cobo, Roberto Valle, José M. Buenaposada, Luis Baumela

Abstract: Head pose estimation (HPE) is a problem of interest in computer vision to improve the performance of face processing tasks in semi-frontal or profile settings. Recent applications require the analysis of faces in the full 360° rotation range. Traditional approaches to solve the semi-frontal and profile cases are not directly amenable for the full rotation case. In this paper we analyze the methodology for short- and wide-range HPE and discuss which representations and metrics are adequate for each case. We show that the popular Euler angles representation is a good choice for short-range HPE, but not at extreme rotations. However, the Euler angles' gimbal lock problem prevents them from being used as a valid metric in any setting. We also revisit the current cross-data set evaluation methodology and note that the lack of alignment between the reference systems of the training and test data sets negatively biases the results of all articles in the literature. We introduce a procedure to quantify this misalignment and a new methodology for cross-data set HPE that establishes new, more accurate, SOTA for the 300W-LP|Biwi benchmark. We also propose a generalization of the geodesic angular distance metric that enables the construction of a loss that controls the contribution of each training sample to the optimization of the model. Finally, we introduce a wide range HPE benchmark based on the CMU Panoptic data set.

Submitted 11 January, 2024; originally announced January 2024.
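
Note on the entry above: the abstract refers to the geodesic angular distance between head poses. The sketch below shows the standard (non-generalized) form of that metric for two 3x3 rotation matrices, theta = arccos((trace(R1^T R2) - 1) / 2). It is not the generalization proposed in the paper, and the helper names are hypothetical.

```python
import math

Matrix = list[list[float]]


def rotation_z(angle: float) -> Matrix:
    """Rotation by `angle` radians about the z axis (helper for the demo)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def geodesic_distance(r1: Matrix, r2: Matrix) -> float:
    """Angle in radians of the relative rotation R1^T R2."""
    # trace(R1^T R2) equals the element-wise (Frobenius) inner product.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_theta = (trace - 1.0) / 2.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


identity = rotation_z(0.0)
quarter_turn = rotation_z(math.pi / 2.0)
print(math.degrees(geodesic_distance(identity, quarter_turn)))
```

Unlike a per-axis difference of Euler angles, this distance depends only on the relative rotation, so it is unaffected by gimbal lock.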

arXiv:2310.09653 [pdf, other]

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

Authors: Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

Abstract: We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that separately encode speaker characteristics and linguistic content. However, disentangling speech representations to capture such attributes using task-specific loss terms can lead to information loss. In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning (SSL) and speaker verification models. First, we develop techniques to derive prosodic information from the audio signal and SSL representations to train predictive submodules in the synthesis model. Next, we propose a training strategy to iteratively improve the synthesis model for voice conversion, by creating a challenging training objective using self-synthesized examples. We demonstrate that incorporating such self-synthesized examples during training improves the speaker similarity of generated speech as compared to a baseline voice conversion model trained solely on heuristically perturbed inputs. Our framework is trained without any text and achieves state-of-the-art results in zero-shot voice conversion on metrics evaluating naturalness, speaker similarity, and intelligibility of synthesized audio.

Submitted 3 May, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: Accepted at ICML 2024

arXiv:2303.07578 [pdf, ps, other]

VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

Authors: Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro

Abstract: We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system. Our model builds upon disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, speaker and fine-grained $F_0$ and energy features for speech synthesis. We utilize the Indic languages dataset, released for LIMMITS 2023 as part of ICASSP Signal Processing Grand Challenge, to synthesize speech in 3 different languages. Our model supports transferring the language of a speaker while retaining their voice and the native accent of the target language. We utilize the large-parameter RADMMM model for Track 1 and lightweight VANI model for Track 2 and 3 of the competition.

Submitted 13 March, 2023; originally announced March 2023.

Comments: Presentation accepted at ICASSP 2023

arXiv:2301.10335 [pdf, other]

Multilingual Multiaccented Multispeaker TTS with RADTTS

Authors: Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro

Abstract: We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice. This is challenging to do because it is expensive to obtain bilingual training data in multiple languages, and the lack of such data results in strong correlations that entangle speaker, language, and accent, resulting in poor transfer capabilities. To overcome this, we present a multilingual, multiaccented, multispeaker speech synthesis model based on RADTTS with explicit control over accent, language, speaker and fine-grained $F_0$ and energy features. Our proposed model does not rely on bilingual training data. We demonstrate an ability to control synthesized accent for any speaker in an open-source dataset comprising of 7 accents. Human subjective evaluation demonstrates that our model can better retain a speaker's voice and accent quality than controlled baselines while synthesizing fluent speech in all target languages and accents in our dataset.

Submitted 24 January, 2023; originally announced January 2023.

Comments: 5 pages, submitted to ICASSP 2023

arXiv:2211.09809 [pdf, other]

SPACE: Speech-driven Portrait Animation with Controllable Expression

Authors: Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Rafael Valle, Ming-Yu Liu

Abstract: Animating portraits using speech has received growing attention in recent years, with various creative and practical use cases. An ideal generated video should have good lip sync with the audio, natural facial expressions and head motions, and high frame quality. In this work, we present SPACE, which uses speech and a single image to generate high-resolution, and expressive videos with realistic head pose, without requiring a driving video. It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator. SPACE also allows for the control of emotions and their intensities. Our method outperforms prior methods in objective metrics for image quality and facial motions and is strongly preferred by users in pair-wise comparisons. The project website is available at https://deepimagination.cc/SPACE/

Submitted 6 December, 2022; v1 submitted 17 November, 2022; originally announced November 2022.

arXiv:2203.01786 [pdf, other]

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

Authors: Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro

Abstract: Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2. Pitch information is not only low-dimensional, but also discontinuous, making it particularly difficult to model in a generative setting. Our work explores several techniques for handling the aforementioned issues in the context of Normalizing Flow models. We also find this problem to be very well suited for Neural Spline flows, which is a highly expressive alternative to the more common affine-coupling mechanism in Normalizing Flows.

Submitted 27 June, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: 22 pages, 11 figures, 3 tables

arXiv:2202.02299 [pdf, ps, other]

doi 10.1109/TPAMI.2020.3046323

Multi-task head pose estimation in-the-wild

Authors: Roberto Valle, José Miguel Buenaposada, Luis Baumela

Abstract: We present a deep learning-based multi-task approach for head pose estimation in images. We contribute with a network architecture and training strategy that harness the strong dependencies among face pose, alignment and visibility, to produce a top performing model for all three tasks. Our architecture is an encoder-decoder CNN with residual blocks and lateral skip connections. We show that the combination of head pose estimation and landmark-based face alignment significantly improve the performance of the former task. Further, the location of the pose task at the bottleneck layer, at the end of the encoder, and that of tasks depending on spatial information, such as visibility and alignment, in the final decoder layer, also contribute to increase the final performance. In the experiments conducted the proposed model outperforms the state-of-the-art in the face pose and visibility tasks. By including a final landmark regression step it also produces face alignment results on par with the state-of-the-art.

Submitted 4 February, 2022; originally announced February 2022.

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 2021

arXiv:2108.10447 [pdf, other]

One TTS Alignment To Rule Them All

Authors: Rohan Badlani, Adrian Łańcucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

Abstract: Speech-to-text alignment is a critical component of neural text-to-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive end-to-end TTS models rely on durations extracted from external sources. In this paper we leverage the alignment mechanism proposed in RAD-TTS as a generic alignment learning framework, easily applicable to a variety of neural TTS models. The framework combines forward-sum algorithm, the Viterbi algorithm, and a simple and efficient static prior. In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). Specifically, it improves alignment convergence speed of existing attention-based mechanisms, simplifies the training pipeline, and makes the models more robust to errors on long utterances. Most importantly, the framework improves the perceived speech synthesis quality, as judged by human evaluators.

Submitted 23 August, 2021; originally announced August 2021.
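
Note on the entry above: the abstract names the forward-sum algorithm as one ingredient of the alignment framework. The sketch below is a generic forward-sum over monotonic alignments, not the RAD-TTS implementation. It assumes that every alignment starts at the first text token, ends at the last one, and that each audio frame either stays on the current token or advances by exactly one. The function names and the input layout are hypothetical.

```python
import math


def logsumexp2(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)), tolerating -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward_sum_log_likelihood(log_probs: list[list[float]]) -> float:
    """Log of the total probability of all monotonic alignments.

    log_probs[t][n] is the log-probability that frame t aligns to token n.
    """
    num_tokens = len(log_probs[0])
    # Every path must start on the first token.
    alpha = [-math.inf] * num_tokens
    alpha[0] = log_probs[0][0]
    for frame in log_probs[1:]:
        new_alpha = [-math.inf] * num_tokens
        for n in range(num_tokens):
            stay = alpha[n]
            advance = alpha[n - 1] if n > 0 else -math.inf
            total = logsumexp2(stay, advance)
            if total != -math.inf:
                new_alpha[n] = total + frame[n]
        alpha = new_alpha
    # Every path must end on the last token.
    return alpha[-1]


# Three frames, two tokens, every frame-token probability equal to 0.5.
uniform = [[math.log(0.5)] * 2 for _ in range(3)]
print(math.exp(forward_sum_log_likelihood(uniform)))
```

In a training setup the negative of this log-likelihood would serve as the alignment loss, and a Viterbi pass over the same lattice would pick the single most likely path.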

arXiv:2005.05957 [pdf, other]

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

Authors: Rafael Valle, Kevin Shih, Ryan Prenger, Bryan Catanzaro

Abstract: In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer. Flowtron borrows insights from IAF and revamps Tacotron in order to provide high-quality and expressive mel-spectrogram synthesis. Flowtron is optimized by maximizing the likelihood of the training data, which makes training simple and stable. Flowtron learns an invertible mapping of data to a latent space that can be manipulated to control many aspects of speech synthesis (pitch, tone, speech rate, cadence, accent). Our mean opinion scores (MOS) show that Flowtron matches state-of-the-art TTS models in terms of speech quality. In addition, we provide results on control of speech variation, interpolation between samples and style transfer between speakers seen and unseen during training. Code and pre-trained models will be made publicly available at https://github.com/NVIDIA/flowtron

Submitted 16 July, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

Comments: 10 pages, 7 pictures

arXiv:1912.11683 [pdf, other]

Neural ODEs for Image Segmentation with Level Sets

Authors: Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick Legresley, Andrew Tao, Bryan Catanzaro

Abstract: We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method. Our approach parametrizes the evolution of an initial contour with a NODE that implicitly learns from data a speed function describing the evolution. In addition, for cases where an initial contour is not available and to alleviate the need for careful choice or design of contour embedding functions, we propose a NODE-based method that evolves an image embedding into a dense per-pixel semantic label space. We evaluate our methods on kidney segmentation (KiTS19) and on salient object detection (PASCAL-S, ECSSD and HKU-IS). In addition to improving initial contours provided by deep learning models while using a fraction of their number of parameters, our approach achieves F scores that are higher than several state-of-the-art deep learning algorithms.

Submitted 25 December, 2019; originally announced December 2019.

arXiv:1910.11997 [pdf, other]

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

Authors: Rafael Valle, Jason Li, Ryan Prenger, Bryan Catanzaro

Abstract: Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data. By explicitly conditioning on rhythm and continuous pitch contours from an audio signal or music score, Mellotron is able to generate speech in a variety of styles ranging from read speech to expressive speech, from slow drawls to rap and from monotonous voice to singing voice. Unlike other methods, we train Mellotron using only read speech data without alignments between text and audio. We evaluate our models using the LJSpeech and LibriTTS datasets. We provide F0 Frame Errors and synthesized samples that include style transfer from other speakers, singers and styles not seen during training, procedural manipulation of rhythm and pitch and choir synthesis.

Submitted 26 October, 2019; originally announced October 2019.

Comments: 5 pages, 3 figures, 1 table

arXiv:1902.01831 [pdf, other]

doi 10.1016/j.cviu.2019.102846

Face Alignment using a 3D Deeply-initialized Ensemble of Regression Trees

Authors: Roberto Valle, José M. Buenaposada, Antonio Valdés, Luis Baumela

Abstract: Face alignment algorithms locate a set of landmark points in images of faces taken in unrestricted situations. State-of-the-art approaches typically fail or lose accuracy in the presence of occlusions, strong deformations, large pose variations and ambiguous configurations. In this paper we present 3DDE, a robust and efficient face alignment algorithm based on a coarse-to-fine cascade of ensembles of regression trees. It is initialized by robustly fitting a 3D face model to the probability maps produced by a convolutional neural network. With this initialization we address self-occlusions and large face rotations. Further, the regressor implicitly imposes a prior face shape on the solution, addressing occlusions and ambiguous face configurations. Its coarse-to-fine structure tackles the combinatorial explosion of parts deformation. In the experiments performed, 3DDE improves the state-of-the-art in 300W, COFW, AFLW and WFLW data sets. Finally, we perform cross-dataset experiments that reveal the existence of a significant data set bias in these benchmarks.

Submitted 13 December, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

Comments: Accepted Version to Computer Vision and Image Understanding

arXiv:1812.06310 [pdf, other]

Non-Gaussian Geostatistical Modeling using (skew) t Processes

Authors: M. Bevilacqua, C. Caamaño, R. B. Arellano Valle, V. Morales-Oñate

Abstract: We propose a new model for regression and dependence analysis when addressing spatial data with possibly heavy tails and an asymmetric marginal distribution. We first propose a stationary process with $t$ marginals obtained through scale mixing of a Gaussian process with an inverse square root process with Gamma marginals. We then generalize this construction by considering a skew-Gaussian process, thus obtaining a process with skew-t marginal distributions. For the proposed (skew) $t$ process we study the second-order and geometrical properties and in the $t$ case, we provide analytic expressions for the bivariate distribution. In an extensive simulation study, we investigate the use of the weighted pairwise likelihood as a method of estimation for the $t$ process. Moreover we compare the performance of the optimal linear predictor of the $t$ process versus the optimal Gaussian predictor. Finally, the effectiveness of our methodology is illustrated by analyzing a georeferenced dataset on maximum temperatures in Australia.

Submitted 19 December, 2019; v1 submitted 15 December, 2018; originally announced December 2018.

arXiv:1811.00002 [pdf, other]

WaveGlow: A Flow-based Generative Network for Speech Synthesis

Authors: Ryan Prenger, Rafael Valle, Bryan Catanzaro

Abstract: In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation. All code will be made publicly available online.

Submitted 30 October, 2018; originally announced November 2018.

Comments: 5 pages, 1 figure, 1 table, 13 equations

arXiv:1808.08670 [pdf, ps, other]

Theoretical approach to the ductile fracture of polycrystalline solids

Authors: Miguel Lagos, César Retamal, Rodrigo Valle

Abstract: It is shown here that fracture after a brief plastic strain, typically of a few percents, is a necessary consequence of the polycrystalline nature of the materials. The polycrystal undergoing plastic deformation is modeled as a flowing continuum of random deformable polyhedra, representing the grains, which fill the space without leaving voids. Adjacent grains slide with a relative velocity proportional to the local shear stress resolved on the plane of the shared grain boundary, when greater than a finite threshold. The polyhedral grains reshape continuously to preserve matter continuity, being the forces causing grain sliding dominant over those reshaping the grains. It has been shown in the past that this model does not conserve volume, causing a monotonic hydrostatic pressure variation with strain. This effect introduces a novel concept in the theory of plasticity because determines that any fine grained polycrystalline material will fail after a finite plastic strain. Here the hydrostatic pressure dependence on strain is explicitly calculated and shown that has a logarithmic divergence which determines the strain to fracture. Comparison of theoretical results with strains to fracture given by mechanical tests of commercial alloys show very good agreement.

Submitted 26 August, 2018; originally announced August 2018.

arXiv:1807.10204 [pdf, other]

Visual Display and Retrieval of Music Information

Authors: Rafael Valle

Abstract: This paper describes computational methods for the visual display and analysis of music information. We provide a concise description of software, music descriptors and data visualization techniques commonly used in music information retrieval. Finally, we provide use cases where the described software, descriptors and visualizations are showcased.

Submitted 26 July, 2018; originally announced July 2018.

arXiv:1807.04919 [pdf, other]

TequilaGAN: How to easily identify GAN samples

Authors: Rafael Valle, Wilson Cai, Anish Doshi

Abstract: In this paper we show strategies to easily identify fake samples generated with the Generative Adversarial Network framework. One strategy is based on the statistical analysis and comparison of raw pixel values and features extracted from them. The other strategy learns formal specifications from the real data and shows that fake samples violate the specifications of the real data. We show that fake samples produced with GANs have a universal signature that can be used to identify fake samples. We provide results on MNIST, CIFAR10, music and speech data.

Submitted 13 July, 2018; originally announced July 2018.

Comments: 10 pages, 16 figures

arXiv:1801.02384 [pdf, other]

Attacking Speaker Recognition With Deep Generative Models

Authors: Wilson Cai, Anish Doshi, Rafael Valle

Abstract: In this paper we investigate the ability of generative adversarial networks (GANs) to synthesize spoofing attacks on modern speaker recognition systems. We first show that samples generated with SampleRNN and WaveNet are unable to fool a CNN-based speaker recognition system. We propose a modification of the Wasserstein GAN objective function to make use of data that is real but not from the class being learned. Our semi-supervised learning method is able to perform both targeted and untargeted attacks, raising questions related to security in speaker authentication systems.

Submitted 8 January, 2018; originally announced January 2018.

Comments: 5 pages, 3 Figures, 1 table

arXiv:1712.04046 [pdf, ps, other]

doi 10.1007/s00521-021-05813-1

Character-Based Handwritten Text Transcription with Attention Networks

Authors: Jason Poulos, Rafael Valle

Abstract: The paper approaches the task of handwritten text recognition (HTR) with attentional encoder-decoder networks trained on sequences of characters, rather than words. We experiment on lines of text from popular handwriting datasets and compare different activation functions for the attention mechanism used for aligning image pixels and target characters. We find that softmax attention focuses heavily on individual characters, while sigmoid attention focuses on multiple characters at each step of the decoding. When the sequence alignment is one-to-one, softmax attention is able to learn a more precise alignment at each step of the decoding, whereas the alignment generated by sigmoid attention is much less precise. When a linear function is used to obtain attention weights, the model predicts a character by looking at the entire sequence of characters and performs poorly because it lacks a precise alignment between the source and target. Future research may explore HTR in natural scene images, since the model is capable of transcribing handwritten text without the need for producing segmentations or bounding boxes of text in images.

Submitted 24 February, 2021; v1 submitted 11 December, 2017; originally announced December 2017.

Journal ref: Neural Comput. & Applic., 33(16), 10563-10573 (2021)

arXiv:1610.09075 [pdf, other]

doi 10.1080/08839514.2018.1448143

Missing Data Imputation for Supervised Learning

Authors: Jason Poulos, Rafael Valle

Abstract: Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different levels of additional missing-data perturbation. We show imputation methods can increase predictive accuracy in the presence of missing-data perturbation, which can actually improve prediction accuracy by regularizing the classifier. We achieve the state-of-the-art on the Adult dataset with missing-data perturbation and k-nearest-neighbors (k-NN) imputation.

Submitted 6 August, 2018; v1 submitted 28 October, 2016; originally announced October 2016.

Journal ref: Applied Artificial Intelligence, 32(2), 186-196 (2018)

arXiv:1607.07801 [pdf, other]

ABROA : Audio-Based Room-Occupancy Analysis using Gaussian Mixtures and Hidden Markov Models

Authors: Rafael Valle

Abstract: This paper outlines preliminary steps towards the development of an audio-based room-occupancy analysis model. Our approach borrows from speech recognition tradition and is based on Gaussian Mixtures and Hidden Markov Models. We analyze possible challenges encountered in the development of such a model, and offer several solutions including feature design and prediction strategies. We provide results obtained from experiments with audio data from a retail store in Palo Alto, California. Model assessment is done via leave-two-out Bootstrap and model convergence achieves good accuracy, thus representing a contribution to multimodal people counting algorithms.

Submitted 22 June, 2016; originally announced July 2016.

arXiv:1511.02279 [pdf, other]

doi 10.1109/IoTDI.2015.33

Control Improvisation with Probabilistic Temporal Specifications

Authors: Ilge Akkaya, Daniel J. Fremont, Rafael Valle, Alexandre Donzé, Edward A. Lee, Sanjit A. Seshia

Abstract: We consider the problem of generating randomized control sequences for complex networked systems typically actuated by human agents. Our approach leverages a concept known as control improvisation, which is based on a combination of data-driven learning and controller synthesis from formal specifications. We learn from existing data a generative model (for instance, an explicit-duration hidden Markov model, or EDHMM) and then supervise this model in order to guarantee that the generated sequences satisfy some desirable specifications given in Probabilistic Computation Tree Logic (PCTL). We present an implementation of our approach and apply it to the problem of mimicking the use of lighting appliances in a residential unit, with potential applications to home security and resource management. We present experimental results showing that our approach produces realistic control sequences, similar to recorded data based on human actuation, while satisfying suitable formal requirements.

Submitted 29 February, 2016; v1 submitted 6 November, 2015; originally announced November 2015.

Comments: to appear in Proceedings of the 1st IEEE Conference on Internet-of-Things Design and Implementation (IoTDI'16)

arXiv:1303.0754 [pdf, ps, other]

New constraint on the existence of the mu+-> e+ gamma decay

Authors: MEG Collaboration, J. Adam, X. Bai, A. M. Baldini, E. Baracchini, C. Bemporad, G. Boca, P. W. Cattaneo, G. Cavoto, F. Cei, C. Cerri, A. de Bari, M. De Gerone, T. Doke, S. Dussoni, J. Egger, K. Fratini, Y. Fujii, L. Galli, G. Gallucci, F. Gatti, B. Golden, M. Grassi, A. Graziosi, D. N. Grigoriev, et al. (49 additional authors not shown)

Abstract: The analysis of a combined data set, totaling 3.6 \times 10^14 stopped muons on target, in the search for the lepton flavour violating decay mu^+ -> e^+ gamma is presented. The data collected by the MEG experiment at the Paul Scherrer Institut show no excess of events compared to background expectations and yield a new upper limit on the branching ratio of this decay of 5.7 \times 10^-13 (90% confidence level). This represents a four times more stringent limit than the previous world best limit set by MEG.

Submitted 23 April, 2013; v1 submitted 4 March, 2013; originally announced March 2013.

Comments: 5 pages, 3 figures, a version accepted in Phys. Rev. Lett

arXiv:1112.0110 [pdf, other]

doi 10.1109/TNS.2012.2187311

Development and commissioning of the Timing Counter for the MEG Experiment

Authors: M. De Gerone, S. Dussoni, K. Fratini, F. Gatti, R. Valle, G. Boca, P. W. Cattaneo, R. Nardò, M. Rossella, L. Galli, M. Grassi, D. Nicolò, Y. Uchiyama, D. Zanello

Abstract: The Timing Counter of the MEG (Mu to Electron Gamma) experiment is designed to deliver trigger information and to accurately measure the timing of the $e^+$ in searching for the decay $μ^+ \rightarrow e^+γ$. It is part of a magnetic spectrometer with the $μ^+$ decay target in the center. It consists of two sectors upstream and downstream the target, each one with two layers: the inner one made with scintillating fibers read out by APDs for trigger and track reconstruction, the outer one consisting in scintillating bars read out by PMTs for trigger and time measurement. The design criteria, the obtained performances and the commissioning of the detector are presented herein.

Submitted 4 February, 2012; v1 submitted 1 December, 2011; originally announced December 2011.

Comments: 10 pages, 20 figures. Presented at the IEEE Nuclear Science Symposium 2010, Knoxville, TN, USA. Accepted by IEEE Transaction on Nuclear Science

Journal ref: IEEE Trans. on Nucl. Sci. Vol.59, No.2, (2012) 379-388

arXiv:1110.6413 [pdf, ps, other]

A pulse fishery model with closures as function of the catch: Conditions for sustainability

Authors: Fernando Córdova-Lepe, Rodrigo del Valle, Gonzalo Robledo

Abstract: We present a model of single species fishery which alternates closed seasons with pulse captures. The novelty is that the length of a closed season is determined by the stock size of the last capture. The process is described by a new type of impulsive differential equations recently introduced. The main result is a fishing effort threshold which determines either the sustainability of the fishery or the extinction of the resource.

Submitted 13 October, 2011; originally announced October 2011.

Comments: 15 pages, 3 figures

MSC Class: 34A37; 92B05

arXiv:1107.5547 [pdf, ps, other]

doi 10.1103/PhysRevLett.107.171801

New limit on the lepton-flavour violating decay mu -> e gamma

Authors: MEG collaboration, J. Adam, X. Bai, A. M. Baldini, E. Baracchini, C. Bemporad, G. Boca, P. W. Cattaneo, G. Cavoto, F. Cei, C. Cerri, A. de Bari, M. De Gerone, T. Doke, S. Dussoni, J. Egger, K. Fratini, Y. Fujii, L. Galli, G. Gallucci, F. Gatti, B. Golden, M. Grassi, D. N. Grigoriev, T. Haruyama, et al. (42 additional authors not shown)

Abstract: We present a new result based on an analysis of the data collected by the MEG detector at the Paul Scherrer Institut in 2009 and 2010, in search of the lepton flavour violating decay mu->e gamma. The likelihood analysis of the combined data sample, which corresponds to a total of 1.8 x 10**14 muon decays, gives a 90% C.L. upper limit of 2.4 x 10**-12 on the branching ratio of the mu->e gamma decay, constituting the most stringent limit on the existence of this decay to date.

Submitted 2 September, 2011; v1 submitted 27 July, 2011; originally announced July 2011.

Comments: 5 pages, 2 figures, accepted for publication at Phys. Rev. Lett

Journal ref: Phys. Rev. Lett. 107, 171801 (2011)

arXiv:1104.1035 [pdf, ps, other]

doi 10.1016/j.nuclphysbps.2011.04.031

The Timing Counter of the MEG experiment: calibration and performance

Authors: P. W. Cattaneo, M. De Gerone, S. Dussoni, F. Gatti, M. Rossella, Y. Uchiyama, R. Valle

Abstract: The MEG detector is designed to test Lepton Flavor Violation in the $μ^+\rightarrow e^+γ$ decay down to a Branching Ratio of a few $10^{-13}$. The decay topology consists in the coincident emission of a monochromatic photon in direction opposite to a monochromatic positron. A precise measurement of the relative time $t_{e^+γ}$ is crucial to suppress the background. The Timing Counter (TC) is designed to precisely measure the time of arrival of the $e^+$ and to provide information to the trigger system. It consists of two sectors up and down stream the decay target, each consisting of two layers. The outer one made of scintillating bars and the inner one of scintillating fibers. Their design criteria and performances are described.

Submitted 6 April, 2011; originally announced April 2011.

Comments: Presented at the 12th Topical Seminar on Innovative Particle and Radiation Detectors (IPRD10) 7 - 10 June 2010, Siena, Italy. Accepted by Nuclear Physics B (Proceedings Supplements) (2011)

Journal ref: Nucl.Phys.Proc.Suppl.215:281-283,2011

arXiv:0908.2594 [pdf, ps, other]

doi 10.1016/j.nuclphysb.2010.03.030

A limit for the mu -> e gamma decay from the MEG experiment

Authors: MEG collaboration, J. Adam, X. Bai, A. Baldini, E. Baracchini, A. Barchiesi, C. Bemporad, G. Boca, P. W. Cattaneo, G. Cavoto, G. Cecchet, F. Cei, C. Cerri, A. De Bari, M. De Gerone, T. Doke, S. Dussoni, J. Egger, L. Galli, G. Gallucci, F. Gatti, B. Golden, M. Grassi, D. N. Grigoriev, T. Haruyama, et al. (45 additional authors not shown)

Abstract: A search for the decay mu -> e gamma, performed at PSI and based on data from the initial three months of operation of the MEG experiment, yields an upper limit on the branching ratio of BR(mu -> e gamma) < 2.8 x 10**-11 (90% C.L.). This corresponds to the measurement of positrons and photons from ~ 10**14 stopped mu-decays by means of a superconducting positron spectrometer and a 900 litre liquid xenon photon detector.

Submitted 4 March, 2010; v1 submitted 18 August, 2009; originally announced August 2009.

Comments: 13 pages, 9 figures. v2: improved estimate of photon reconstruction efficiency

Journal ref: Nucl.Phys.B834:1-12,2010

arXiv:0806.2205 [pdf]

doi 10.1088/0953-2048/21/9/095017

Synthesis, crystal structure, microstructure, transport and magnetic properties of SmFeAsO and SmFeAs(O0.93F0.07)

Authors: A. Martinelli, M. Ferretti, P. Manfrinetti, A. Palenzona, M. Tropeano, M. R. Cimberle, C. Ferdeghini, R. Valle, M. Putti, A. S. Siri

Abstract: SmFeAsO and the isostructural superconducting SmFeAs(O0.93F0.07) samples were prepared. Characterization by means of Rietveld refinement of X-ray powder diffraction data, scanning electron microscope observation, transmission electron microscope analysis, resistivity and magnetization measurements were carried out. Sintering treatment strongly improves the grain connectivity, but, on the other hand, induces a competition between the thermodynamic stability of the oxy-pnictide and Sm2O3, hence worsening the purity of the sample. In the pristine sample both magnetization and resistivity measurements clearly indicate that two different sources of magnetism are present: the former related to Fe ordering at 140 K and the latter due to the Sm ions that orders antiferromagnetically at low temperature. The feature at 140 K disappears in the F-substituted sample and, at low temperatures a superconducting transition appears. The magnetoresistivity curves of the F-substituted sample probably indicates very high critical field values.

Submitted 13 June, 2008; originally announced June 2008.

Comments: Submitted to: Superconductor Science and Technology

Journal ref: Superconductor Science and Technology 21 (2008) 095017

arXiv:0712.3267 [pdf, other]

doi 10.1103/PhysRevB.78.045103

Direct evidence of overdamped Peierls-coupled modes in TTF-CA temperature-induced phase transition

Authors: A. Girlando, M. Masino, A. Painelli, N. Drichko, M. Dressel, A. Brillante, R. G. Della Valle, E. Venuti

Abstract: In this paper we elucidate the optical response resulting from the interplay of charge distribution (ionicity) and Peierls instability (dimerization) in the neutral-ionic, ferroelectric phase transition of tetrathiafulvalene-chloranil (TTF-CA), a mixed-stack quasi-one-dimensional charge-transfer crystal. We present far-infrared reflectivity measurements down to 5 cm-1 as a function of temperature above the phase transition (300 - 82 K). The coupling between electrons and lattice phonons in the pre-transitional regime is analyzed on the basis of phonon eigenvectors and polarizability calculations of the one-dimensional Peierls-Hubbard model. We find a multi-phonon Peierls coupling, but on approaching the transition the spectral weight and the coupling shift progressively towards the phonons at lower frequencies, resulting in a soft-mode behavior only for the lowest frequency phonon near the transition temperature. Moreover, in the proximity of the phase transition, the lowest-frequency phonon becomes overdamped, due to anharmonicity induced by its coupling to electrons. The implications of these findings for the neutral-ionic transition mechanism is shortly discussed.

Submitted 19 December, 2007; originally announced December 2007.

Comments: 11 pages, 13 figures

Journal ref: Phys. Rev. B 78, 045103 (2008)

arXiv:cond-mat/0312300 [pdf, ps, other]

doi 10.1103/PhysRevB.70.104106

Phonons and structures of tetracene polymorphs at low temperature and high pressure

Authors: Elisabetta Venuti, Raffaele Guido Della Valle, Luca Farina, Aldo Brillante, Matteo Masino, Alberto Girlando

Abstract: Crystals of tetracene have been studied by means of lattice phonon Raman spectroscopy as a function of temperature and pressure. Two different phases (polymorphs I and II) have been obtained, depending on sample preparation and history. Polymorph I is the most frequently grown phase, stable at ambient conditions. A pressure induced phase transition, observed above 1 GPa, leads to polymorph II, which is also obtained at temperatures below 140 K. Polymorph II can also be maintained at ambient conditions. We have calculated the crystallographic structures and phonon frequencies as a function of temperature, starting from the configurations of the energy minima found by exploring the potential energy surface of crystalline tetracene. The spectra calculated for the first and second deepest minima match satisfactorily those measured for polymorphs I and II, respectively. All published x-ray structures, once assigned to the appropriate polymorph, are also reproduced.

Submitted 14 April, 2004; v1 submitted 11 December, 2003; originally announced December 2003.

Comments: 8 pages, 5 figures, RevTeX4, update after referees reports

Journal ref: Phys. Rev. B 70, 104106 (2004) (8 pages)

arXiv:cond-mat/0307115 [pdf, ps, other]

doi 10.1080/15421400490478542

Polymorphism, phonon dynamics and carrier-phonon coupling in pentacene

Authors: Raffaele G. Della Valle, Aldo Brillante, Luca Farina, Elisabetta Venuti, Matteo Masino, Alberto Girlando

Abstract: The crystal structure and phonon dynamics of pentacene is computed with the Quasi Harmonic Lattice Dynamics (QHLD) method, based on atom-atom potential. We show that two crystalline phases of pentacene exist, rather similar in thermodynamic stability and in molecular density. The two phases can be easily distinguished by Raman spectroscopy in the 10-100 cm-1 spectral region. We have not found any temperature induced phase transition, whereas a sluggish phase change to the denser phase is induced by pressure. The bandwidths of the two phases are slightly different. The charge carrier coupling to low-frequency phonons is calculated.

Submitted 4 July, 2003; originally announced July 2003.

Comments: 6 pages, 3 figures. Presented at ICFPAM-7

Journal ref: Mol. Cryst. Liq. Cryst. 416 (2004) 145-154

arXiv:cond-mat/0202141 [pdf, ps, other]

doi 10.1103/PhysRevB.66.100507

BEDT-TTF organic superconductors: the entangled role of phonons

Authors: Alberto Girlando, Matteo Masino, Aldo Brillante, Raffaele G. Della Valle, Elisabetta Venuti

Abstract: We calculate the lattice phonons and the electron-phonon coupling of the organic superconductor κ-(BEDT-TTF)_2 I_3, reproducing all available experimental data connected to phonon dynamics. Low-frequency intra-molecular vibrations are strongly mixed with lattice phonons. Both acoustic and optical phonons are appreciably coupled to electrons through the modulation of the hopping integrals (e-LP coupling). By comparing the results relevant to superconducting κ- and β-(BEDT-TTF)_2 I_3, we show that electron-phonon coupling is fundamental to the pairing mechanism. Both e-LP and electron-molecular vibration (e-MV) coupling are essential to reproduce the critical temperatures. The e-LP coupling is stronger, but e-MV is instrumental in increasing the average phonon frequency.

Submitted 16 October, 2002; v1 submitted 8 February, 2002; originally announced February 2002.

Comments: 4 pages, including 4 figures. Published version, with Ref. 17 corrected after publication

Journal ref: Phys. Rev. B 66, 100507 (2002)

arXiv:cond-mat/0003297 [pdf, ps, other]

doi 10.1103/PhysRevB.62.14476

Lattice dynamics and electron-phonon coupling in β-(BEDT-TTF)_2I_3 organic superconductor

Authors: A. Girlando, M. Masino, G. Visentini, R. G. Della Valle, A. Brillante, E. Venuti

Abstract: The crystal structure and lattice phonons of (BEDT-TTF)_2I_3 superconducting β-phase are computed and analyzed by the Quasi Harmonic Lattice Dynamics (QHLD) method. Whereas the crystal structure and its temperature and pressure dependence are properly reproduced within a rigid molecule approximation, this has to be removed to account for the specific heat data. Such a mixing between lattice and low-frequency intramolecular vibrations also yields good agreement with the observed Raman and infrared frequencies. From the eigenvectors of the low-frequency phonons we calculate the electron-phonon coupling constants due to the modulation of charge transfer (hopping) integrals. The hopping integrals are evaluated by the extended Hueckel method applied to all nearest-neighbor BEDT-TTF pairs in the ab crystal plane. From the averaged electron-phonon coupling constants and the QHLD phonon density of states we derive the Eliashberg coupling function, which compares well with that experimentally obtained from point-contact spectroscopy. The corresponding dimensionless coupling constant λ is found to be around 0.4.

Submitted 24 March, 2000; v1 submitted 17 March, 2000; originally announced March 2000.

Comments: 15 pages, 5 figures, RevTeX 3.1 - Replacement for cond-mat 0003297. We have corrected Section IIE, where the analytical average over the Fermi surface was incorrect. We performed the average numerically over the Fermi surface generated by our dimer model. The λ value is practically unchanged.

Journal ref: Phys. Rev. B1, 62 (2000) 14476-14486

arXiv:cond-mat/9907359 [pdf, ps, other]

doi 10.1063/1.480089

A scaling approximation for structure factors in the integral equation theory of polydisperse nonionic colloidal fluids

Authors: Domenico Gazzillo, Achille Giacometti, Raffaele G. Della Valle, Elisabetta Venuti, Flavio Carsughi

Abstract: Integral equation of pure liquids, combined with a new "scaling approximation" based on a corresponding states treatment of pair correlation functions, is used to evaluate approximate structure factors for colloidal fluids constituted of uncharged particles with polydispersity in size and energy parameters. Both hard spheres and Lennard-Jones interactions are considered. For polydisperse hard spheres, the scaling approximation is compared to theories utilized by small angle scattering experimentalists (decoupling approximation, local monodisperse approximation) and to the van der Waals one-fluid theory. The results are tested against predictions from analytical expressions, exact within the Percus-Yevick approximation. For polydisperse Lennard-Jones particles, the scaling approximation, combined with a "modified hypernetted chain" integral equation, is tested against molecular dynamics data generated for the present work. Despite its simplicity, the scaling approximation exhibits a satisfactory performance for both potentials and represents a considerable improvement over the above-mentioned theories. Shortcomings of the proposed theory, its applicability to the analysis of experimental scattering data, and its possible extensions to different potentials are finally discussed.

Submitted 15 October, 1999; v1 submitted 23 July, 1999; originally announced July 1999.

Comments: 12 pages, 7 postscript figures (included), Latex 3.0, uses aps.sty

Journal ref: J. Chem. Phys. 111 (1999) 7636-7645

arXiv:cond-mat/9903051 [pdf, ps, other]

doi 10.1103/PhysRevB.59.13699

Towards an effective potential for the monomer, dimer, hexamer, solid and liquid forms of hydrogen fluoride

Authors: Raffaele Guido Della Valle, Domenico Gazzillo

Abstract: We present an attempt to build up a new two-body effective potential for hydrogen fluoride, fitted to theoretical and experimental data relevant not only to the gas and liquid phases, but also to the crystal. The model is simple enough to be used in Molecular Dynamics and Monte Carlo simulations. The potential consists of: a) an intra-molecular contribution, allowing for variations of the molecular length, plus b) an inter-molecular part, with three charged sites on each monomer and a Buckingham "exp-6" interaction between fluorines. The model is able to reproduce a significant number of observables on the monomer, dimer, hexamer, solid and liquid forms of HF. The shortcomings of the model are pointed out and possible improvements are finally discussed.

Submitted 6 March, 1999; v1 submitted 3 March, 1999; originally announced March 1999.

Comments: LaTeX, 24 pages, 2 figures. For related papers see also http://www.chim.unifi.it:8080/~valle/

Journal ref: Phys. Rev. B1 59 (1999) 13699-13706

arXiv:cond-mat/9805048 [pdf, ps, other]

doi 10.1103/PhysRevB.58.206

Quasi Harmonic Lattice Dynamics and Molecular Dynamics calculations for the Lennard-Jones solids

Authors: Raffaele Guido Della Valle, Elisabetta Venuti

Abstract: We present Molecular Dynamics (MD), Quasi Harmonic Lattice Dynamics (QHLD) and Energy Minimization (EM) calculations for the crystal structure of Ne, Ar, Kr and Xe as a function of pressure and temperature. New Lennard-Jones (LJ) parameters are obtained for Ne, Kr and Xe to reproduce the experimental pressure dependence of the density. We employ a simple method which combines results of QHLD and MD calculations to achieve densities in good agreement with experiment from 0 K to melting. Melting is discussed in connection with intrinsic instability of the solid as given by the QHLD approximation. (See http://www.fci.unibo.it/~valle for related papers)

Submitted 5 May, 1998; originally announced May 1998.

Comments: 7 pages, 5 figures, REVtex

Journal ref: Phys. Rev. B1 58 (1998) 206-212

Showing 1–44 of 44 results for author: Valle, R