Search | arXiv e-print repository

doi 10.1109/TASLP.2020.3000037

Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals

Authors: Dhananjaya Gowda, Sudarsana Reddy Kadiri, Brad Story, Paavo Alku

Abstract: In this paper, we propose a new method for the accurate estimation and tracking of formants in speech signals using time-varying quasi-closed-phase (TVQCP) analysis. Conventional formant tracking methods typically adopt a two-stage estimate-and-track strategy wherein an initial set of formant candidates is estimated using short-time analysis (e.g., 10–50 ms), followed by a tracking stage based on dynamic programming or a linear state-space model. One of the main disadvantages of these approaches is that the tracking stage, however good it may be, cannot improve upon the formant estimation accuracy of the first stage. The proposed TVQCP method provides single-stage formant tracking that combines the estimation and tracking stages into one. TVQCP analysis combines three approaches to improve formant estimation and tracking: (1) it uses temporally weighted quasi-closed-phase analysis to derive closed-phase estimates of the vocal tract with reduced interference from the excitation source, (2) it increases the residual sparsity by using $L_1$ optimization, and (3) it uses time-varying linear prediction analysis over long time windows (e.g., 100–200 ms) to impose a continuity constraint on the vocal tract model and hence on the formant trajectories.
Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner). Matlab scripts for the proposed method can be found at: https://github.com/njaygowda/ftrack

Submitted 31 August, 2023; originally announced August 2023.

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 28, pp. 1901-1914, 2020

arXiv:2212.05037 [pdf, other]

A Topological Deep Learning Framework for Neural Spike Decoding

Authors: Edward C. Mitchell, Brittany Story, David Boothe, Piotr J. Franaszczuk, Vasileios Maroulas

Abstract: The brain's spatial orientation system uses different neuron ensembles to aid in environment-based navigation. Two of the ways brains encode spatial information are through head direction cells and grid cells. Brains use head direction cells to determine orientation whereas grid cells consist of layers of decked neurons that overlay to provide environment-based navigation. These neurons fire in ensembles where several neurons fire at once to activate a single head direction or grid. We want to capture this firing structure and use it to decode head direction and grid cell data. Understanding, representing, and decoding these neural structures requires models that encompass higher order connectivity, more than the 1-dimensional connectivity that traditional graph-based models provide. To that end, in this work, we develop a topological deep learning framework for neural spike train decoding. Our framework combines unsupervised simplicial complex discovery with the power of deep learning via a new architecture we develop herein called a simplicial convolutional recurrent neural network. Simplicial complexes, topological spaces that use not only vertices and edges but also higher-dimensional objects, naturally generalize graphs and capture more than just pairwise relationships. Additionally, this approach does not require prior knowledge of the neural activity beyond spike counts, which removes the need for similarity measurements.
The effectiveness and versatility of the simplicial convolutional neural network are demonstrated on head direction and trajectory prediction via head direction and grid cell datasets.

Submitted 6 September, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

arXiv:2209.02768 [pdf]

Prospectively accelerated dynamic speech MRI at 3 Tesla using a self-navigated spiral based manifold regularized scheme

Authors: Rushdi Zahid Rusho, Abdul Haseeb Ahmed, Stanley Kruger, Wahidul Alam, David Meyer, David Howard, Brad Story, Mathews Jacob, Sajan Goud Lingala

Abstract: This work proposes a self-navigated variable density spiral (VDS) based manifold regularization scheme to prospectively improve dynamic speech MRI at 3T. Short readout 1.3 ms spirals were used to minimize off-resonance. A custom 16-channel speech coil was used for improved parallel imaging of the vocal tract. The manifold model leveraged similarities between frames sharing similar speech postures without explicit motion binning. The self-navigating capability of VDS was leveraged to learn the Laplacian matrix of the manifold. Reconstruction was posed as a SENSE-based non-local soft weighted temporal regularization scheme. Our approach was compared against view-sharing, low-rank, finite difference, and extra-dimension-based sparsity reconstruction constraints. Under-sampling experiments were conducted on five volunteers performing repetitive and arbitrary speaking tasks at different speaking rates. Quantitative evaluation in terms of mean square error over moving edges was performed on retrospectively under-sampled data. For prospective under-sampling, blinded image quality evaluation in the categories of alias artifacts, spatial blurring, and temporal blurring was performed by three voice research experts. Region of interest analysis at articulator boundaries was performed to assess articulatory motion. Our scheme provided improved reconstruction over the others. With prospective under-sampling, a spatial resolution of 2.4 mm²/pixel and a temporal resolution of 17.4 ms/frame for single slice imaging, and 52.2 ms/frame for 3-slice imaging, were achieved.
We demonstrated implicit motion binning by analyzing the mechanics of the Laplacian matrix. Our method demonstrated superior image quality scores in reducing spatial and temporal blurring. While it exhibited faint alias artifacts similar to temporal finite-difference, it provided statistically significant improvements over remaining constraints.

Submitted 1 May, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: 32 pages, 10 figures

arXiv:2111.05663 [pdf, other]

The Impact of Changes in Resolution on the Persistent Homology of Images

Authors: Teresa Heiss, Sarah Tymochko, Brittany Story, Adélie Garin, Hoa Bui, Bea Bleile, Vanessa Robins

Abstract: Digital images enable quantitative analysis of material properties at micro and macro length scales, but choosing an appropriate resolution when acquiring the image is challenging. A high resolution means longer image acquisition and larger data requirements for a given sample, but if the resolution is too low, significant information may be lost. This paper studies the impact of changes in resolution on persistent homology, a tool from topological data analysis that provides a signature of structure in an image across all length scales. Given prior information about a function, the geometry of an object, or its density distribution at a given resolution, we provide methods to select the coarsest resolution yielding results within an acceptable tolerance. We present numerical case studies for an illustrative synthetic example and samples from porous materials where the theoretical bounds are unknown.

Submitted 10 November, 2021; originally announced November 2021.

Comments: accepted for the IEEE Big Data 2021 workshop: Applications of Topological Data Analysis to 'Big Data'

MSC Class: 68T09; 68U03; 55N31; 62R40; 54H30; 55-08

arXiv:2011.00617 [pdf, other]

Support vector machines and Radon's theorem

Authors: Henry Adams, Elin Farnell, Brittany Story

Abstract: A support vector machine (SVM) is an algorithm that finds a hyperplane which optimally separates labeled data points in $\mathbb{R}^n$ into positive and negative classes. The data points on the margin of this separating hyperplane are called support vectors. We connect the possible configurations of support vectors to Radon's theorem, which provides guarantees for when a set of points can be divided into two classes (positive and negative) whose convex hulls intersect. If the convex hulls of the positive and negative support vectors are projected onto a separating hyperplane, then the projections intersect if and only if the hyperplane is optimal. Further, with a particular type of general position, we show that (a) the projected convex hulls of the support vectors intersect in exactly one point, (b) the support vectors are stable under perturbation, (c) there are at most $n+1$ support vectors, and (d) every number of support vectors from 2 up to $n+1$ is possible. Finally, we perform computer simulations studying the expected number of support vectors, and their configurations, for randomly generated data. We observe that as the distance between classes of points increases for this type of randomly generated data, configurations with fewer support vectors become more likely.

Submitted 16 September, 2022; v1 submitted 1 November, 2020; originally announced November 2020.

Showing 1–5 of 5 results for author: Story, B