doi 10.1109/ICASSP49357.2023.10094909

SQuId: Measuring Speech Naturalness in Many Languages

Authors: Thibault Sellam, Ankur Bapna, Joshua Camp, Diana Mackinnon, Ankur P. Parikh, Jason Riesa

Abstract: Much of text-to-speech research relies on human evaluation, which incurs heavy costs and slows down the development process. The problem is particularly acute in heavily multilingual applications, where recruiting and polling judges can take weeks. We introduce SQuId (Speech Quality Identification), a multilingual naturalness prediction model trained on over a million ratings and tested in 65 locales, the largest effort of this type to date. The main insight is that training one model on many locales consistently outperforms mono-locale baselines. We present our task and the model, and show that it outperforms a competitive baseline based on w2v-BERT and VoiceMOS by 50.0%. We then demonstrate the effectiveness of cross-locale transfer during fine-tuning and highlight its effect on zero-shot locales, i.e., locales for which there is no fine-tuning data. Through a series of analyses, we highlight the role of non-linguistic effects such as sound artifacts in cross-locale transfer. Finally, we present the effect of our design decisions, e.g., model size, pre-training diversity, and language rebalancing, with several ablation experiments.

Submitted 1 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted at ICASSP 2023, with additional material in the appendix

arXiv:2206.03461 [pdf, other]

Fast Unsupervised Brain Anomaly Detection and Segmentation with Diffusion Models

Authors: Walter H. L. Pinaya, Mark S. Graham, Robert Gray, Pedro F Da Costa, Petru-Daniel Tudosiu, Paul Wright, Yee H. Mah, Andrew D. MacKinnon, James T. Teo, Rolf Jager, David Werring, Geraint Rees, Parashkev Nachev, Sebastien Ourselin, M. Jorge Cardoso

Abstract: Deep generative models have emerged as promising tools for detecting arbitrary anomalies in data, dispensing with the necessity for manual labelling. Recently, autoregressive transformers have achieved state-of-the-art performance for anomaly detection in medical imaging. Nonetheless, these models still have some intrinsic weaknesses, such as requiring images to be modelled as 1D sequences, the accumulation of errors during the sampling process, and the significant inference times associated with transformers. Denoising diffusion probabilistic models are a class of non-autoregressive generative models recently shown to produce excellent samples in computer vision (surpassing Generative Adversarial Networks), and to achieve log-likelihoods that are competitive with transformers while having fast inference times. Diffusion models can be applied to the latent representations learnt by autoencoders, making them easily scalable and great candidates for application to high dimensional data, such as medical images. Here, we propose a method based on diffusion models to detect and segment anomalies in brain imaging. By training the models on healthy data and then exploring their diffusion and reverse steps across the Markov chain, we can identify anomalous areas in the latent space and hence identify anomalies in the pixel space. Our diffusion models achieve competitive performance compared with autoregressive approaches across a series of experiments with 2D CT and MRI data involving synthetic and real pathological lesions, with much reduced inference times, making their usage clinically viable.

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2204.00151 [pdf]

Experimental control of Tc in AlB2-type compounds using an applied voltage

Authors: Jose A. Alarco, Mahboobeh Shahbazi, Ian D. R. Mackinnon

Abstract: We utilize the van der Pauw technique combined with Density Functional Theory to show that an external voltage applied to superconducting AlB2-type compounds, such as MgB2 and Mg(B1.9C0.1), effects substantive changes in transition temperature due to modification of electronic band structure. An applied voltage results in a symmetric split of degenerate sigma bands in MgB2 similar to that calculated for atom displacement along the B-B bond aligned with E2g phonon mode directions. For AlB2, similar splitting of sigma bands occurs, albeit with a higher voltage requirement compared to MgB2. Experimental data show that Tc values for MgB2 and Mg(B1.9C0.1) decrease consistently with each increment of applied voltage. Zero resistance is limited by an upper applied voltage that depends on the compound.

Submitted 31 March, 2022; originally announced April 2022.

Comments: 9 pages, 4 figures, supplement (1 table, 4 figures)

Showing 1–3 of 3 results for author: Mackinnon, D