Search | arXiv e-print repository

arXiv:2405.19426 [pdf, other]

Deep Learning for Assessment of Oral Reading Fluency

Authors: Mithilesh Vaidya, Binaya Kumar Sahoo, Preeti Rao

Abstract: Reading fluency assessment is a critical component of literacy programmes, serving to guide and monitor early education interventions. Given the resource-intensive nature of the exercise when conducted by teachers, the development of automatic tools that can operate on audio recordings of oral reading is attractive as an objective and highly scalable solution. Multiple complex aspects such as accuracy, rate and expressiveness underlie human judgements of reading fluency. In this work, we investigate end-to-end modeling on a training dataset of children's audio recordings of story texts labeled by human experts. The pre-trained wav2vec2.0 model is adopted due to its potential to alleviate the challenges from the limited amount of labeled data. We report the performance of a number of system variations on the relevant measures, and also probe the learned embeddings for lexical and acoustic-prosodic features known to be important to the perception of reading fluency.

Submitted 1 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

arXiv:2110.14273 [pdf, other]

Deep Learning For Prominence Detection In Children's Read Speech

Authors: Mithilesh Vaidya, Kamini Sabu, Preeti Rao

Abstract: The detection of perceived prominence in speech has attracted approaches ranging from the design of linguistic knowledge-based acoustic features to the automatic feature learning from suprasegmental attributes such as pitch and intensity contours. We present here, in contrast, a system that operates directly on segmented speech waveforms to learn features relevant to prominent word detection for children's oral fluency assessment. The chosen CRNN (convolutional recurrent neural network) framework, incorporating both word-level features and sequence information, is found to benefit from the perceptually motivated SincNet filters as the first convolutional layer. We further explore the benefits of the linguistic association between the prosodic events of phrase boundary and prominence with different multi-task architectures. Matching the previously reported performance on the same dataset of a random forest ensemble predictor trained on carefully chosen hand-crafted acoustic features, we evaluate further the possibly complementary information from hand-crafted acoustic and pre-trained lexical features.

Submitted 27 October, 2021; originally announced October 2021.

Comments: Under review at ICASSP 2022. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2104.05488 [pdf, other]

CNN Encoding of Acoustic Parameters for Prominence Detection

Authors: Kamini Sabu, Mithilesh Vaidya, Preeti Rao

Abstract: Expressive reading, considered the defining attribute of oral reading fluency, comprises the prosodic realization of phrasing and prominence. In the context of evaluating oral reading, it helps to establish the speaker's comprehension of the text. We consider a labeled dataset of children's reading recordings for the speaker-independent detection of prominent words using acoustic-prosodic and lexico-syntactic features. A previous well-tuned random forest ensemble predictor is replaced by an RNN sequence classifier to exploit potential context dependency across the longer utterance. Further, deep learning is applied to obtain word-level features from low-level acoustic contours of fundamental frequency, intensity and spectral shape in an end-to-end fashion. Performance comparisons are presented across the different feature types and across different feature learning architectures for prominent word prediction to draw insights wherever possible.

Submitted 27 January, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: 5 pages, 2 figures, 6 tables, Submitted to INTERSPEECH 2021

arXiv:2007.03136 [pdf, other]

Electromyogram (EMG) Removal by Adding Sources of EMG (ERASE) -- A novel ICA-based algorithm for removing myoelectric artifacts from EEG -- Part 2

Authors: Yongcheng Li, Po T. Wang, Mukta P. Vaidya, Charles Y. Liu, Marc W. Slutzky, An H. Do

Abstract: Extraction of the movement-related high-gamma (80 - 160 Hz) in electroencephalogram (EEG) from traumatic brain injury (TBI) patients who have had hemicraniectomies remains challenging due to a confounding bandwidth overlap with surface electromyogram (EMG) artifacts related to facial and head movements. In part 1, we described an augmented independent component analysis (ICA) approach for removal of EMG artifacts from EEG, referred to as EMG Reduction by Adding Sources of EMG (ERASE). Here, we tested ERASE on EEG recorded from six TBI patients with hemicraniectomies while they performed a thumb flexion task. ERASE removed a mean of 52 +/- 12% (mean +/- S.E.M) (maximum 73%) of EMG artifacts. In contrast, conventional ICA removed a mean of 27 +/- 19% (mean +/- S.E.M) of EMG artifacts from EEG. In particular, high-gamma synchronization was significantly improved in the contralateral hand motor cortex area within the hemicraniectomy site after ERASE was applied. We computed fractal dimension (FD) of EEG high-gamma on each channel. We found that the relative FD of high-gamma over the hemicraniectomy after applying ERASE was strongly correlated with the amplitude of finger flexion force. Results showed that significant correlation coefficients across the electrodes related to thumb flexion averaged 0.76, while the coefficients across the homologous electrodes in non-hemicraniectomy areas were nearly 0. Across all subjects, an average of 83% of the electrodes significantly correlated with force were located in the hemicraniectomy areas after applying ERASE. After conventional ICA, only 19% of electrodes with significant correlations were located in the hemicraniectomy. These results indicated that the new approach isolated electrophysiological features during finger motor activation while selectively removing confounding EMG artifacts.

Submitted 6 July, 2020; originally announced July 2020.

arXiv:2007.03130 [pdf, other]

Electromyogram (EMG) Removal by Adding Sources of EMG (ERASE) -- A novel ICA-based algorithm for removing myoelectric artifacts from EEG -- Part 1

Authors: Yongcheng Li, Po T. Wang, Mukta P. Vaidya, Charles Y. Liu, Marc W. Slutzky, An H. Do

Abstract: Electroencephalographic (EEG) recordings are often contaminated by electromyographic (EMG) artifacts, especially when recording during movement. Existing methods to remove EMG artifacts include independent component analysis (ICA) and other high-order statistical methods. However, these methods cannot effectively remove most EMG artifacts. Here, we proposed a modified ICA model for EMG artifact removal in the EEG, which is called EMG Removal by Adding Sources of EMG (ERASE). In this new approach, additional channels of real EMG from neck and head muscles (reference artifacts) were added as inputs to ICA in order to "force" the most power from EMG artifacts into a few independent components (ICs). The ICs containing EMG artifacts (the "artifact ICs") were identified and rejected using an automated procedure. Simulation results showed ERASE removed EMG artifacts from EEG significantly more effectively than conventional ICA. Subsequently, EEG was collected from 8 healthy participants while they moved their hands to test the realistic efficacy of this approach. Results showed that ERASE successfully removed EMG artifacts (on average, about 75% of EMG artifacts were removed when using real EMGs as reference artifacts) while preserving the expected EEG features related to movement. We also tested the ERASE procedure using simulated EMGs as reference artifacts (about 63% of EMG artifacts removed). Compared to conventional ICA, ERASE removed on average 26% more EMG artifacts from EEG. These results indicate that using additional real or simulated EMG sources can increase the effectiveness of ICA in removing EMG artifacts from EEG. Combined with automated artifact IC rejection, ERASE also minimizes potential user bias.

Submitted 6 July, 2020; originally announced July 2020.

Showing 1–5 of 5 results for author: Vaidya, M