Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech

Authors: Shannon Wotherspoon, William Hartmann, Matthew Snover

Abstract: This paper introduces a set of English translations for a 123-hour subset of the CallHome Mandarin Chinese data and the HKUST Mandarin Telephone Speech data for the task of speech translation. Paired source-language speech and target-language text is essential for training end-to-end speech translation systems and can provide substantial performance improvements for cascaded systems as well, relative to training on more widely available text data sets. We demonstrate that fine-tuning a general-purpose translation model to our Mandarin-English conversational telephone speech training set improves target-domain BLEU by more than 8 points, highlighting the importance of matched training data.

Submitted 25 March, 2024; originally announced April 2024.

Comments: 2 pages

arXiv:2106.07699

Using heterogeneity in semi-supervised transcription hypotheses to improve code-switched speech recognition

Authors: Andrew Slottje, Shannon Wotherspoon, William Hartmann, Matthew Snover, Owen Kimball

Abstract: Modeling code-switched speech is an important problem in automatic speech recognition (ASR). Labeled code-switched data are rare, so monolingual data are often used to model code-switched speech. These monolingual data may be more closely matched to one of the languages in the code-switch pair. We show that such asymmetry can bias prediction toward the better-matched language and degrade overall model performance. To address this issue, we propose a semi-supervised approach for code-switched ASR. We consider the case of English-Mandarin code-switching, and the problem of using monolingual data to build bilingual "transcription models" for annotation of unlabeled code-switched data. We first build multiple transcription models so that their individual predictions are variously biased toward either English or Mandarin. We then combine these biased transcriptions using confidence-based selection. This strategy generates a superior transcript for semi-supervised training, and obtains a 19% relative improvement compared to a semi-supervised system that relies on a transcription model built with only the best-matched monolingual data.

Submitted 14 June, 2021; originally announced June 2021.

Comments: 5 pages

arXiv:2012.13004

Speech Synthesis as Augmentation for Low-Resource ASR

Authors: Deblin Bagchi, Shannon Wotherspoon, Zhuolin Jiang, Prasanna Muthukumar

Abstract: Speech synthesis might hold the key to low-resource speech recognition. Data augmentation techniques have become an essential part of modern speech recognition training. Yet, they are simple, naive, and rarely reflect real-world conditions. Meanwhile, speech synthesis techniques have been rapidly getting closer to the goal of achieving human-like speech. In this paper, we investigate the possibility of using synthesized speech as a form of data augmentation to lower the resources necessary to build a speech recognizer. We experiment with three different kinds of synthesizers: statistical parametric, neural, and adversarial. Our findings are interesting and point to new research directions for the future.

Submitted 23 December, 2020; originally announced December 2020.

arXiv:0810.4739

doi 10.1111/j.1365-2966.2008.14141.x

Head-Tail Galaxies: Beacons of High-Density Regions in Clusters

Authors: Minnie Y. Mao, Melanie Johnston-Hollitt, Jamie B. Stevens, Simon J. Wotherspoon

Abstract: Using radio data at 1.4 GHz from the ATCA we identify five head-tail (HT) galaxies in the central region of the Horologium-Reticulum Supercluster (HRS). Physical parameters of the HT galaxies were determined along with substructure in the HRS to probe the relationship between environment and radio properties. Using a density enhancement technique applied to 582 spectroscopic measurements in the 2 degree x 2 degree region about A3125/A3128, we find all five HT galaxies reside in regions of extremely high density (>100 galaxies/Mpc^3). In fact, the environments surrounding HT galaxies are statistically denser than those environments surrounding non-HT galaxies and among the densest environments in a cluster. Additionally, the HT galaxies are found in regions of enhanced X-ray emission and we show that the enhanced density continues out to substructure groups of 10 members. We propose that it is the high densities that allow ram pressure to bend the HT galaxies as opposed to previously proposed mechanisms relying on exceptionally high peculiar velocities.

Submitted 27 October, 2008; originally announced October 2008.

Comments: 12 pages, 5 figures, accepted in MNRAS

arXiv:astro-ph/0702673

doi 10.1111/j.1365-2966.2007.11641.x

A search for 22-GHz water masers within the giant molecular cloud associated with RCW106

Authors: S. L. Breen, S. P. Ellingsen, M. Johnston-Hollitt, S. Wotherspoon, I. Bains, M. G. Burton, M. Cunningham, N. Lo, C. E. Senkbeil, T. Wong

Abstract: We report the results of a blind search for 22-GHz water masers in two regions, covering approximately half a square degree, within the giant molecular cloud associated with RCW 106. The complete search of the two regions was carried out with the 26-m Mount Pleasant radio telescope and resulted in the detection of nine water masers, five of which are new detections. Australia Telescope Compact Array (ATCA) observations of these detections have allowed us to obtain positions with arcsecond accuracy, allowing meaningful comparison with infrared and molecular data for the region. We find that for the regions surveyed there are more water masers than either 6.7-GHz methanol, or main-line OH masers. The water masers are concentrated towards the central axis of the star formation region, in contrast to the 6.7-GHz methanol masers which tend to be located near the periphery. The colours of the GLIMPSE point sources associated with the water masers are similar to those of 6.7-GHz methanol masers, but slightly less red. We have made a statistical investigation of the properties of the 13CO and 1.2-mm dust clumps with and without associated water masers. We find that the water masers are associated with the more massive, denser and brighter 13CO and 1.2-mm dust clumps. We present statistical models that are able to predict those 13CO and 1.2-mm dust clumps that are likely to have associated water masers, with a low misclassification rate.

Submitted 26 February, 2007; originally announced February 2007.

Comments: 18 pages, 8 figures

Journal ref: Mon.Not.Roy.Astron.Soc.377:491-506,2007

Showing 1–5 of 5 results for author: Wotherspoon, S