
Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

Authors: Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik

Abstract: Device-directed speech detection (DDSD) is the binary classification task of distinguishing queries directed at a voice assistant from side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g., acoustic, text and/or automatic speech recognition system (ASR) features, to classify speech as device-directed or otherwise, and often have to contend with one or more of these modalities being unavailable when deployed in real-world settings. In this paper, we investigate fusion schemes for DDSD systems that can be made more robust to missing modalities. Concurrently, we study the use of non-verbal cues, specifically prosody features, in addition to verbal cues for DDSD. We present different approaches to combine scores and embeddings from prosody with the corresponding verbal cues, finding that prosody improves DDSD performance by up to 8.5% in terms of false acceptance rate (FA) at a given fixed operating point via non-linear intermediate fusion, while our use of modality dropout techniques improves the performance of these models by 7.4% in terms of FA when evaluated with missing modalities during inference time.

Submitted 23 October, 2023; originally announced October 2023.

Comments: 5 pages

arXiv:2103.02087

Deep J-Sense: Accelerated MRI Reconstruction via Unrolled Alternating Optimization

Authors: Marius Arvinte, Sriram Vishwanath, Ahmed H. Tewfik, Jonathan I. Tamir

Abstract: Accelerated multi-coil magnetic resonance imaging reconstruction has seen a substantial recent improvement by combining compressed sensing with deep learning. However, most of these methods rely on estimates of the coil sensitivity profiles, or on calibration data for estimating model parameters. Prior work has shown that these methods degrade in performance when the quality of these estimators is poor or when the scan parameters differ from the training conditions. Here we introduce Deep J-Sense as a deep learning approach that builds on unrolled alternating minimization and increases robustness: our algorithm refines both the magnetization (image) kernel and the coil sensitivity maps. Experimental results on a subset of the knee fastMRI dataset show that this increases reconstruction performance and provides a significant degree of robustness to varying acceleration factors and calibration region sizes.

Submitted 11 April, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

arXiv:2103.00383

Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition

Authors: Gautam Krishna, Mason Carnahan, Shilpa Shamapant, Yashitha Surendranath, Saumya Jain, Arundhati Ghosh, Co Tran, Jose del R Millan, Ahmed H Tewfik

Abstract: In this paper, we propose a deep learning-based algorithm to improve the performance of automatic speech recognition (ASR) systems for aphasia, apraxia, and dysarthria speech by utilizing electroencephalography (EEG) features recorded synchronously with aphasia, apraxia, and dysarthria speech. We demonstrate a significant decoding performance improvement of more than 50% during test time for the isolated speech recognition task, and we also provide preliminary results indicating performance improvement for the more challenging continuous speech recognition task by utilizing EEG features. The results presented in this paper show the first step towards demonstrating the possibility of utilizing non-invasive neural signals to design a real-time robust speech prosthetic for stroke survivors recovering from aphasia, apraxia, and dysarthria. Our aphasia, apraxia, and dysarthria speech-EEG data set will be released to the public to help further advance this interesting and crucial research.

Submitted 17 July, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

Comments: Accepted to IEEE EMBC 2021

arXiv:2101.04269

Pneumonia Detection on Chest X-ray using Radiomic Features and Contrastive Learning

Authors: Yan Han, Chongyan Chen, Ahmed H Tewfik, Ying Ding, Yifan Peng

Abstract: Chest X-ray has become one of the most common medical diagnostic procedures owing to its noninvasiveness. The number of chest X-ray images has skyrocketed, but reading chest X-rays is still performed manually by radiologists, which causes burnout and delays. Radiomics, a subfield of radiology that extracts a large number of quantitative features from medical images, demonstrated its potential to facilitate medical imaging diagnosis before the deep learning era. With the rise of deep learning, deep neural networks for chest X-ray diagnosis remain opaque. In this study, we propose a novel framework that leverages radiomics features and contrastive learning to detect pneumonia in chest X-ray. Experiments on the RSNA Pneumonia Detection Challenge dataset show that our model achieves superior results to several state-of-the-art models (> 10% in F1-score) and increases the model's interpretability.

Submitted 4 May, 2022; v1 submitted 11 January, 2021; originally announced January 2021.

Comments: Accepted for ISBI 2021

arXiv:2012.12843

EQ-Net: A Unified Deep Learning Framework for Log-Likelihood Ratio Estimation and Quantization

Authors: Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

Abstract: In this work, we introduce EQ-Net: the first holistic framework that solves both the tasks of log-likelihood ratio (LLR) estimation and quantization using a data-driven method. We motivate our approach with theoretical insights on two practical estimation algorithms at the ends of the complexity spectrum and reveal a connection between the complexity of an algorithm and the information bottleneck method: simpler algorithms admit smaller bottlenecks when representing their solution. This motivates us to propose a two-stage algorithm that uses LLR compression as a pretext task for estimation and is focused on low-latency, high-performance implementations via deep neural networks. We carry out extensive experimental evaluation and demonstrate that our single architecture achieves state-of-the-art results on both tasks when compared to previous methods, with gains in quantization efficiency as high as 20% and estimation latency reduced by up to 60% when measured on general-purpose and graphics processing units (GPUs). In particular, our approach reduces the GPU inference latency by more than two times in several multiple-input multiple-output (MIMO) configurations. Finally, we demonstrate that our scheme is robust to distributional shifts and retains a significant part of its performance when evaluated on 5G channel models, as well as channel estimation errors.

Submitted 3 May, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

arXiv:2008.07621

Speech Recognition using EEG signals recorded using dry electrodes

Authors: Gautam Krishna, Co Tran, Mason Carnahan, Morgan M Hagood, Ahmed H Tewfik

Abstract: In this paper, we demonstrate speech recognition using electroencephalography (EEG) signals obtained using dry electrodes on a limited English vocabulary consisting of three vowels and one word using a deep learning model. We demonstrate a test accuracy of 79.07 percent on a subset vocabulary consisting of two English vowels. Our results demonstrate the feasibility of using EEG signals recorded using dry electrodes for performing the task of speech recognition.

Submitted 13 August, 2020; originally announced August 2020.

arXiv:2006.03638

Robust Face Verification via Disentangled Representations

Authors: Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

Abstract: We introduce a robust algorithm for face verification, i.e., deciding whether two images are of the same person or not. Our approach is a novel take on the idea of using deep generative networks for adversarial robustness. We use the generative model during training as an online augmentation method instead of a test-time purifier that removes adversarial noise. Our architecture uses a contrastive loss term and a disentangled generative model to sample negative pairs. Instead of randomly pairing two real images, we pair an image with its class-modified counterpart while keeping its content (pose, head tilt, hair, etc.) intact. This enables us to efficiently sample hard negative pairs for the contrastive loss. We experimentally show that, when coupled with adversarial training, the proposed scheme converges with a weak inner solver and has a higher clean and robust accuracy than state-of-the-art methods when evaluated against white-box physical attacks.

Submitted 23 June, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

Comments: Preprint

arXiv:2003.00007

Generating EEG features from Acoustic features

Authors: Gautam Krishna, Co Tran, Mason Carnahan, Yan Han, Ahmed H Tewfik

Abstract: In this paper we demonstrate predicting electroencephalography (EEG) features from acoustic features using a recurrent neural network (RNN) based regression model and a generative adversarial network (GAN). We predict various types of EEG features from acoustic features. We compare our results with the previously studied problem of speech synthesis using EEG, and our results demonstrate that EEG features can be generated from acoustic features with lower root mean square error (RMSE) and normalized RMSE values compared to generating acoustic features from EEG features (i.e., speech synthesis using EEG) when tested using the same data sets.

Submitted 18 March, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

arXiv:2001.00501

EEG based Continuous Speech Recognition using Transformers

Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed H Tewfik

Abstract: In this paper we investigate continuous speech recognition using electroencephalography (EEG) features with a recently introduced end-to-end transformer-based automatic speech recognition (ASR) model. Our results demonstrate that the transformer-based model trains faster than recurrent neural network (RNN) based sequence-to-sequence EEG models and performs better at inference time for a smaller test set vocabulary, but as we increase the vocabulary size, the RNN-based models perform better than the transformer-based model on a limited English vocabulary.

Submitted 5 May, 2020; v1 submitted 31 December, 2019; originally announced January 2020.

arXiv:1912.07730

Continuous Speech Recognition using EEG and Video

Authors: Gautam Krishna, Mason Carnahan, Co Tran, Ahmed H Tewfik

Abstract: In this paper we investigate whether electroencephalography (EEG) features can be used to improve the performance of continuous visual speech recognition systems. We implemented a connectionist temporal classification (CTC) based end-to-end automatic speech recognition (ASR) model for performing recognition. Our results demonstrate that EEG features are helpful in enhancing the performance of continuous visual speech recognition systems.

Submitted 27 December, 2019; v1 submitted 16 December, 2019; originally announced December 2019.

Comments: In preparation for submission to EUSIPCO 2020. arXiv admin note: text overlap with arXiv:1911.11610, arXiv:1911.04261

arXiv:1911.11610

Improving EEG based Continuous Speech Recognition

Authors: Gautam Krishna, Co Tran, Mason Carnahan, Yan Han, Ahmed H Tewfik

Abstract: In this paper we introduce various techniques to improve the performance of electroencephalography (EEG) features based continuous speech recognition (CSR) systems. A connectionist temporal classification (CTC) based automatic speech recognition (ASR) system was implemented for performing recognition. We introduce techniques to initialize the weights of the recurrent layers in the encoder of the CTC model with more meaningful weights rather than with random weights and we make use of an external language model to improve the beam search during decoding time. We finally study the problem of predicting articulatory features from EEG features in this paper.

Submitted 23 December, 2019; v1 submitted 24 November, 2019; originally announced November 2019.

Comments: In preparation for submission to EUSIPCO 2020. arXiv admin note: text overlap with arXiv:1911.04261, arXiv:1906.08871

arXiv:1911.04261

Voice Activity Detection in presence of background noise using EEG

Authors: Gautam Krishna, Co Tran, Mason Carnahan, Yan Han, Ahmed H Tewfik

Abstract: In this paper we demonstrate that the performance of a voice activity detection (VAD) system operating in the presence of background noise can be improved by concatenating acoustic input features with electroencephalography (EEG) features. We also demonstrate that VAD using only EEG features shows better performance than VAD using only acoustic features in the presence of background noise. We implemented a recurrent neural network (RNN) based VAD system and we demonstrate our results for two different data sets recorded under different noise conditions. We finally demonstrate the ability to predict from EEG features whether a person wishes to continue speaking a sentence or not.

Submitted 14 March, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: In preparation for submission to EUSIPCO 2020. arXiv admin note: text overlap with arXiv:1906.08871, arXiv:1909.09132

arXiv:1909.09132

Spoken Speech Enhancement using EEG

Authors: Gautam Krishna, Co Tran, Yan Han, Mason Carnahan, Ahmed H Tewfik

Abstract: In this paper we demonstrate spoken speech enhancement using electroencephalography (EEG) signals using a generative adversarial network (GAN) based model, a gated recurrent unit (GRU) regression based model, a temporal convolutional network (TCN) regression model and finally a mixed TCN GRU regression model. We compare our EEG based speech enhancement results with the traditional log minimum mean-square error (MMSE) speech enhancement algorithm, and our proposed methods demonstrate significant improvement in speech enhancement quality compared to the traditional method. Our overall results demonstrate that EEG features can be used to clean speech recorded in the presence of background noise. To the best of our knowledge, this is the first time spoken speech enhancement has been demonstrated using EEG features recorded in parallel with spoken speech.

Submitted 19 April, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

arXiv:1908.05743

State-of-the-art Speech Recognition using EEG and Towards Decoding of Speech Spectrum From EEG

Authors: Gautam Krishna, Yan Han, Co Tran, Mason Carnahan, Ahmed H Tewfik

Abstract: In this paper we first demonstrate continuous noisy speech recognition using electroencephalography (EEG) signals on English vocabulary using different types of state-of-the-art end-to-end automatic speech recognition (ASR) models. We further provide results obtained using EEG data recorded under different experimental conditions. We finally demonstrate decoding of speech spectrum from EEG signals using a long short term memory (LSTM) based regression model and a Generative Adversarial Network (GAN) based model. Our results demonstrate the feasibility of using EEG signals for continuous noisy speech recognition under different experimental conditions and we provide preliminary results for synthesis of speech from EEG features.

Submitted 4 March, 2020; v1 submitted 14 August, 2019; originally announced August 2019.

arXiv:1906.08871

Advancing Speech Recognition With No Speech Or With Noisy Speech

Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed H Tewfik

Abstract: In this paper we demonstrate end-to-end continuous speech recognition (CSR) using electroencephalography (EEG) signals with no speech signal as input. An attention model based automatic speech recognition (ASR) and connectionist temporal classification (CTC) based ASR systems were implemented for performing recognition. We further demonstrate CSR for noisy speech by fusing with EEG features.

Submitted 14 March, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

Comments: Extended version of our accepted IEEE EUSIPCO 2019 paper with additional results for CTC model based recognition. arXiv admin note: substantial text overlap with arXiv:1906.08045, arXiv:1906.08044

arXiv:1906.08045

Speech Recognition With No Speech Or With Noisy Speech Beyond English

Authors: Gautam Krishna, Co Tran, Yan Han, Mason Carnahan, Ahmed H Tewfik

Abstract: In this paper we demonstrate continuous noisy speech recognition using a connectionist temporal classification (CTC) model on a limited Chinese vocabulary using electroencephalography (EEG) features with no speech signal as input, and we further demonstrate single CTC model based continuous noisy speech recognition on a limited joint English and Chinese vocabulary using EEG features with no speech signal as input. We demonstrate our results using various EEG feature sets recently introduced in [1], and we propose a new deep learning architecture that can perform continuous speech recognition using raw EEG signals on a limited joint English and Chinese vocabulary.

Submitted 26 February, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

Comments: arXiv admin note: text overlap with arXiv:1906.08871

arXiv:1906.08044

Robust End-to-End Speaker Verification Using EEG

Authors: Yan Han, Gautam Krishna, Co Tran, Mason Carnahan, Ahmed H Tewfik

Abstract: In this paper we demonstrate that the performance of a speaker verification system can be improved by concatenating electroencephalography (EEG) signal features with speech signal features or by using only EEG signal features. We use a state-of-the-art end-to-end deep learning model for performing speaker verification and we demonstrate our results for noisy speech. Our results indicate that EEG signals can improve the robustness of speaker verification systems, especially in noisier environments.

Submitted 9 June, 2020; v1 submitted 17 June, 2019; originally announced June 2019.

Comments: Accepted for EUSIPCO 2020

arXiv:1906.07849

Deep Learning-Based Quantization of L-Values for Gray-Coded Modulation

Authors: Marius Arvinte, Sriram Vishwanath, Ahmed H. Tewfik

Abstract: In this work, a deep learning-based quantization scheme for log-likelihood ratio (L-value) storage is introduced. We analyze the dependency between the average magnitude of different L-values from the same quadrature amplitude modulation (QAM) symbol and show they follow a consistent ordering. Based on this we design a deep autoencoder that jointly compresses and separately reconstructs each L-value, allowing the use of a weighted loss function that aims to more accurately reconstruct low magnitude inputs. Our method is shown to be competitive with state-of-the-art maximum mutual information quantization schemes, reducing the required memory footprint by a ratio of up to two with a loss of performance smaller than 0.1 dB with less than two effective bits per L-value or smaller than 0.04 dB with 2.25 effective bits. We experimentally show that our proposed method is a universal compression scheme in the sense that after training on an LDPC-coded Rayleigh fading scenario we can reuse the same network without further training on other channel models and codes while preserving the same performance benefits.

Submitted 9 May, 2021; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: Submitted to IEEE Globecom 2019

arXiv:1903.04656

Deep Log-Likelihood Ratio Quantization

Authors: Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

Abstract: In this work, a deep learning-based method for log-likelihood ratio (LLR) lossy compression and quantization is proposed, with emphasis on a single-input single-output uncorrelated fading communication setting. A deep autoencoder network is trained to compress, quantize and reconstruct the bit log-likelihood ratios corresponding to a single transmitted symbol. Specifically, the encoder maps to a latent space with dimension equal to the number of sufficient statistics required to recover the inputs - equal to three in this case - while the decoder aims to reconstruct a noisy version of the latent representation with the purpose of modeling quantization effects in a differentiable way. Simulation results show that, when applied to a standard rate-1/2 low-density parity-check (LDPC) code, a finite precision compression factor of nearly three times is achieved when storing an entire codeword, with an incurred loss of performance lower than 0.1 dB compared to straightforward scalar quantization of the log-likelihood ratios.

Submitted 9 May, 2021; v1 submitted 11 March, 2019; originally announced March 2019.

Comments: Accepted for publication at EUSIPCO 2019. Camera-ready version

arXiv:1903.00739

Speech Recognition with no speech or with noisy speech

Authors: Gautam Krishna, Co Tran, Jianguo Yu, Ahmed H Tewfik

Abstract: The performance of automatic speech recognition (ASR) systems degrades in the presence of noisy speech. This paper demonstrates that using electroencephalography (EEG) can help automatic speech recognition systems overcome performance loss in the presence of noise. The paper also shows that distillation training of automatic speech recognition systems using EEG features will increase their performance. Finally, we demonstrate the ability to recognize words from EEG with no speech signal on a limited English vocabulary with high accuracy.

Submitted 2 March, 2019; originally announced March 2019.

Comments: Accepted for ICASSP 2019

Showing 1–20 of 20 results for author: Tewfik, A H