Showing 1–13 of 13 results for author: Steiner, I

  1. arXiv:1911.01601  [pdf, other]

    eess.AS cs.CR cs.SD eess.SP

    ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

    Authors: Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, et al. (15 additional authors not shown)

    Abstract: Automatic speaker verification (ASV) is one of the most natural and convenient means of biometric person recognition. Unfortunately, just like all other biometric systems, ASV is vulnerable to spoofing, also referred to as "presentation attacks." These vulnerabilities are generally unacceptable and call for spoofing countermeasures or "presentation attack detection" systems. In addition to imperso…

    Submitted 14 July, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: Accepted, Computer Speech and Language. This manuscript version is made available under the CC-BY-NC-ND 4.0. For the published version on Elsevier website, please visit https://doi.org/10.1016/j.csl.2020.101114

  2. Studying Mutual Phonetic Influence with a Web-Based Spoken Dialogue System

    Authors: Eran Raveh, Ingmar Steiner, Iona Gessinger, Bernd Möbius

    Abstract: This paper presents a study on mutual speech variation influences in a human-computer setting. The study highlights behavioral patterns in data collected as part of a shadowing experiment, and is performed using a novel end-to-end platform for studying phonetic variation in dialogue. It includes a spoken dialogue system capable of detecting and tracking the state of phonetic features in the user's…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: Proc. 20th International Conference on Speech and Computer (SPECOM)

  3. arXiv:1712.04798  [pdf, other]

    cs.HC cs.CL

    A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks

    Authors: Arif Khan, Ingmar Steiner, Yusuke Sugano, Andreas Bulling, Ross Macdonald

    Abstract: Phonetic segmentation is the process of splitting speech into distinct phonetic units. Human experts routinely perform this task manually by analyzing auditory and visual cues using analysis software, which is an extremely time-consuming process. Methods exist for automatic segmentation, but these are not always accurate enough. In order to improve automatic segmentation, we need to model it as cl…

    Submitted 11 May, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

    Journal ref: Proc. LREC 11 (2018) 4277-4281

  4. arXiv:1712.04787  [pdf, other]

    cs.CL cs.HC

    Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform

    Authors: Ingmar Steiner, Sébastien Le Maguer

    Abstract: We present a new workflow to create components for the MaryTTS text-to-speech synthesis platform, which is popular with researchers and developers, extending it to support new languages and custom synthetic voices. This workflow replaces the previous toolkit with an efficient, flexible process that leverages modern build automation and cloud-hosted infrastructure. Moreover, it is compatible with t…

    Submitted 11 May, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

    Journal ref: Proc. LREC 11 (2018) 3171-3175

  5. Synthesis of Tongue Motion and Acoustics from Text using a Multimodal Articulatory Database

    Authors: Ingmar Steiner, Sébastien Le Maguer, Alexander Hewer

    Abstract: We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a 3D model of the tongue surface to an articulatory dataset and training a statistical parametric speech synthesis system directly on the tongue model parameters. We evaluate the model at every step by comparing the spatial coordinates…

    Submitted 13 April, 2018; v1 submitted 29 December, 2016; originally announced December 2016.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing 25 (2017) 2351 - 2361

  6. arXiv:1612.06114  [pdf, other]

    cs.HC

    A real-time framework for visual feedback of articulatory data using statistical shape models

    Authors: Kristy James, Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer

    Abstract: We present a novel open-source framework for visualizing electromagnetic articulography (EMA) data in real-time, with a modular framework and anatomically accurate tongue and palate models derived by multilinear subspace learning.

    Submitted 19 December, 2016; originally announced December 2016.

    Comments: 17th Annual Conference of the International Speech Communication Association (Interspeech), Oct 2016, San Francisco, United States

  7. A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

    Authors: Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond

    Abstract: We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech related vocal tract configurations. The extraction is performed by using a minimally supervised method that uses as basis an image segmentation approach and a templa…

    Submitted 17 April, 2018; v1 submitted 15 December, 2016; originally announced December 2016.

    Journal ref: Computer Speech & Language 51 (2018) 68-92

  8. arXiv:1602.07679  [pdf, other]

    cs.CV

    A statistical shape space model of the palate surface trained on 3D MRI scans of the vocal tract

    Authors: Alexander Hewer, Ingmar Steiner, Timo Bolkart, Stefanie Wuhrer, Korin Richmond

    Abstract: We describe a minimally-supervised method for computing a statistical shape space model of the palate surface. The model is created from a corpus of volumetric magnetic resonance imaging (MRI) scans collected from 12 speakers. We extract a 3D mesh of the palate from each speaker, then train the model using principal component analysis (PCA). The palate model is then tested using 3D MRI from anothe…

    Submitted 4 September, 2015; originally announced February 2016.

    Comments: Proceedings of the 18th International Congress of Phonetic Sciences, Aug 2015, Glasgow, United Kingdom. 2015, http://www.icphs2015.info/

  9. arXiv:1310.8585  [pdf, other]

    cs.HC q-bio.QM

    Speech animation using electromagnetic articulography as motion capture data

    Authors: Ingmar Steiner, Korin Richmond, Slim Ouni

    Abstract: Electromagnetic articulography (EMA) captures the position and orientation of a number of markers, attached to the articulators, during speech. As such, it performs the same function for speech that conventional motion capture does for full-body movements acquired with optical modalities, a long-time staple technique of the animation industry. In this paper, EMA data is processed from a motion-cap…

    Submitted 30 October, 2013; originally announced October 2013.

    Journal ref: AVSP - 12th International Conference on Auditory-Visual Speech Processing - 2013 (2013) 55-60

  10. arXiv:1209.4982  [pdf, other]

    cs.HC cs.GR

    Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis

    Authors: Ingmar Steiner, Korin Richmond, Slim Ouni

    Abstract: The importance of modeling speech articulation for high-quality audiovisual (AV) speech synthesis is widely acknowledged. Nevertheless, while state-of-the-art, data-driven approaches to facial animation can make use of sophisticated motion capture techniques, the animation of the intraoral articulators (viz. the tongue, jaw, and velum) typically makes use of simple rules or viseme morphing, in sta…

    Submitted 22 September, 2012; originally announced September 2012.

    Journal ref: 3rd International Symposium on Facial Analysis and Animation (2012)

  11. arXiv:1203.3574  [pdf, other]

    cs.HC cs.GR

    Artimate: an articulatory animation framework for audiovisual speech synthesis

    Authors: Ingmar Steiner, Slim Ouni

    Abstract: We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data is applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated in an audiovisual (AV) speech synthesis platform to provide r…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Workshop on Innovation and Applications in Speech Technology (2012)

  12. arXiv:1201.4080  [pdf, other]

    cs.AI

    Progress in animation of an EMA-controlled tongue model for acoustic-visual speech synthesis

    Authors: Ingmar Steiner, Slim Ouni

    Abstract: We present a technique for the animation of a 3D kinematic tongue model, one component of the talking head of an acoustic-visual (AV) speech synthesizer. The skeletal animation approach is adapted to make use of a deformable rig controlled by tongue motion capture data obtained with electromagnetic articulography (EMA), while the tongue surface is extracted from volumetric magnetic resonance imagi…

    Submitted 19 January, 2012; originally announced January 2012.

    Journal ref: Elektronische Sprachsignalverarbeitung 2011 TUDpress (Ed.) (2011) 245-252

  13. GROND - a 7-channel imager

    Authors: J. Greiner, W. Bornemann, C. Clemens, M. Deuter, G. Hasinger, M. Honsberg, H. Huber, S. Huber, M. Krauss, T. Krühler, A. Küpcü Yoldaş, H. Mayer-Hasselwander, B. Mican, N. Primak, F. Schrey, I. Steiner, G. Szokoly, C. C. Thöne, A. Yoldaş, S. Klose, U. Laux, J. Winkler

    Abstract: We describe the construction of GROND, a 7-channel imager, primarily designed for rapid observations of gamma-ray burst afterglows. It allows simultaneous imaging in the Sloan g'r'i'z' and near-infrared $JHK$ bands. GROND was commissioned at the MPI/ESO 2.2m telescope at La Silla (Chile) in April 2007, and first results of its performance and calibration are presented.

    Submitted 30 January, 2008; originally announced January 2008.

    Comments: 25 pages, 21 figs, PASP (subm); version with full-resolution figures at http://www.mpe.mpg.de/~jcg/GROND/grond_pasp.pdf