Search | arXiv e-print repository

Exploration strategies for articulatory synthesis of complex syllable onsets

Authors: Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul K. Krug, Peter Birkholz, Yi Xu

Abstract: High-quality articulatory speech synthesis has many potential applications in speech science and technology. However, developing appropriate mappings from linguistic specification to articulatory gestures is difficult and time consuming. In this paper we construct an optimisation-based framework as a first step towards learning these mappings without manual intervention. We demonstrate the production of syllables with complex onsets and discuss the quality of the articulatory gestures with reference to coarticulation.

Submitted 30 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

Comments: Accepted at Interspeech 2022

arXiv:2103.04807 [pdf, other]

doi 10.1016/j.engappai.2022.104964

PyRCN: A Toolbox for Exploration and Application of Reservoir Computing Networks

Authors: Peter Steiner, Azarakhsh Jalalvand, Simon Stone, Peter Birkholz

Abstract: Reservoir Computing Networks (RCNs) belong to a group of machine learning techniques that project the input space non-linearly into a high-dimensional feature space, where the underlying task can be solved linearly. Popular variants of RCNs are capable of solving complex tasks equivalently to widely used deep neural networks, but with a substantially simpler training paradigm based on linear regression. In this paper, we show how to uniformly describe RCNs with small and clearly defined building blocks, and we introduce the Python toolbox PyRCN (Python Reservoir Computing Networks) for optimizing, training and analyzing RCNs on arbitrarily large datasets. The tool is based on widely-used scientific packages and complies with the scikit-learn interface specification. It provides a platform for educational and exploratory analyses of RCNs, as well as a framework to apply RCNs on complex tasks including sequence processing. With a small number of building blocks, the framework allows the implementation of numerous different RCN architectures. We provide code examples on how to set up RCNs for time series prediction and for sequence classification tasks. PyRCN is around ten times faster than reference toolboxes on a benchmark task while requiring substantially less boilerplate code.

Submitted 10 May, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: Preprint accepted for publication in Engineering Applications of Artificial Intelligence

Journal ref: Engineering Applications of Artificial Intelligence 113 (2022) 104964

arXiv:2103.04710 [pdf, other]

doi 10.1109/TNNLS.2022.3145565

Cluster-based Input Weight Initialization for Echo State Networks

Authors: Peter Steiner, Azarakhsh Jalalvand, Peter Birkholz

Abstract: Echo State Networks (ESNs) are a special type of recurrent neural networks (RNNs), in which the input and recurrent connections are traditionally generated randomly, and only the output weights are trained. Despite the recent success of ESNs in various tasks of audio, image and radar recognition, we postulate that a purely random initialization is not the ideal way of initializing ESNs. The aim of this work is to propose an unsupervised initialization of the input connections using the $K$-Means algorithm on the training data. We show that for a large variety of datasets this initialization performs equivalently or superior than a randomly initialized ESN whilst needing significantly less reservoir neurons. Furthermore, we discuss that this approach provides the opportunity to estimate a suitable size of the reservoir based on prior knowledge about the data.

Submitted 20 January, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: Accepted for publication in IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022

arXiv:2005.09986 [pdf, other]

Evaluating Features and Metrics for High-Quality Simulation of Early Vocal Learning of Vowels

Authors: Branislav Gerazov, Daniel van Niekerk, Anqi Xu, Paul Konstantin Krug, Peter Birkholz, Yi Xu

Abstract: The way infants use auditory cues to learn to speak despite the acoustic mismatch of their vocal apparatus is a hot topic of scientific debate. The simulation of early vocal learning using articulatory speech synthesis offers a way towards gaining a deeper understanding of this process. One of the crucial parameters in these simulations is the choice of features and a metric to evaluate the acoustic error between the synthesised sound and the reference target. We contribute with evaluating the performance of a set of 40 feature-metric combinations for the task of optimising the production of static vowels with a high-quality articulatory synthesiser. Towards this end we assess the usability of formant error and the projection of the feature-metric error surface in the normalised F1-F2 formant space. We show that this approach can be used to evaluate the impact of features and metrics and also to offer insight to perceptual results.

Submitted 2 April, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

Comments: Submitted to INTERSPEECH 2021

Showing 1–4 of 4 results for author: Birkholz, P