Search | arXiv e-print repository

Towards Decoding Brain Activity During Passive Listening of Speech

Authors: Milán András Fodor, Tamás Gábor Csapó, Frigyes Viktor Arthur

Abstract: The aim of the study is to investigate the complex mechanisms of speech perception and ultimately decode the electrical changes in the brain accruing while listening to speech. We attempt to decode heard speech from intracranial electroencephalographic (iEEG) data using deep learning methods. The goal is to aid the advancement of brain-computer interface (BCI) technology for speech synthesis, and,… ▽ More The aim of the study is to investigate the complex mechanisms of speech perception and ultimately decode the electrical changes in the brain accruing while listening to speech. We attempt to decode heard speech from intracranial electroencephalographic (iEEG) data using deep learning methods. The goal is to aid the advancement of brain-computer interface (BCI) technology for speech synthesis, and, hopefully, to provide an additional perspective on the cognitive processes of speech perception. This approach diverges from the conventional focus on speech production and instead chooses to investigate neural representations of perceived speech. This angle opened up a complex perspective, potentially allowing us to study more sophisticated neural patterns. Leveraging the power of deep learning models, the research aimed to establish a connection between these intricate neural activities and the corresponding speech sounds. Despite the approach not having achieved a breakthrough yet, the research sheds light on the potential of decoding neural activity during speech perception. Our current efforts can serve as a foundation, and we are optimistic about the potential of expanding and improving upon this work to move closer towards more advanced BCIs, better understanding of processes underlying perceived speech and its relation to spoken speech. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: 27 pages, 7 figures

arXiv:2306.05374 [pdf, other]

doi 10.21437/Interspeech.2023-40

Towards Ultrasound Tongue Image prediction from EEG during speech production

Authors: Tamás Gábor Csapó, Frigyes Viktor Arthur, Péter Nagy, Ádám Boncz

Abstract: Previous initial research has already been carried out to propose speech-based BCI using brain signals (e.g. non-invasive EEG and invasive sEEG / ECoG), but there is a lack of combined methods that investigate non-invasive brain, articulation, and speech signals together and analyze the cognitive processes in the brain, the kinematics of the articulatory movement and the resulting speech signal. I… ▽ More Previous initial research has already been carried out to propose speech-based BCI using brain signals (e.g. non-invasive EEG and invasive sEEG / ECoG), but there is a lack of combined methods that investigate non-invasive brain, articulation, and speech signals together and analyze the cognitive processes in the brain, the kinematics of the articulatory movement and the resulting speech signal. In this paper, we describe our multimodal (electroencephalography, ultrasound tongue imaging, and speech) analysis and synthesis experiments, as a feasibility study. We extend the analysis of brain signals recorded during speech production with ultrasound-based articulation data. From the brain signal measured with EEG, we predict ultrasound images of the tongue with a fully connected deep neural network. The results show that there is a weak but noticeable relationship between EEG and ultrasound tongue images, i.e. the network can differentiate articulated speech and neutral tongue position. △ Less

Submitted 18 October, 2023; v1 submitted 22 May, 2023; originally announced June 2023.

Comments: accepted at Interspeech 2023

Journal ref: Proceedings of Interspeech 2023

arXiv:2104.14467 [pdf, other]

Towards a practical lip-to-speech conversion system using deep neural networks and mobile application frontend

Authors: Frigyes Viktor Arthur, Tamás Gábor Csapó

Abstract: Articulatory-to-acoustic (forward) map** is a technique to predict speech using various articulatory acquisition techniques as input (e.g. ultrasound tongue imaging, MRI, lip video). The advantage of lip video is that it is easily available and affordable: most modern smartphones have a front camera. There are already a few solutions for lip-to-speech synthesis, but they mostly concentrate on of… ▽ More Articulatory-to-acoustic (forward) map** is a technique to predict speech using various articulatory acquisition techniques as input (e.g. ultrasound tongue imaging, MRI, lip video). The advantage of lip video is that it is easily available and affordable: most modern smartphones have a front camera. There are already a few solutions for lip-to-speech synthesis, but they mostly concentrate on offline training and inference. In this paper, we propose a system built from a backend for deep neural network training and inference and a fronted as a form of a mobile application. Our initial evaluation shows that the scenario is feasible: a top-5 classification accuracy of 74% is combined with feedback from the mobile application user, making sure that the speaking impaired might be able to communicate with this solution. △ Less

Submitted 29 April, 2021; originally announced April 2021.

Comments: 10 pages, 6 figures

Showing 1–3 of 3 results for author: Arthur, F V