
Low-dimensional representation of infant and adult vocalization acoustics

Authors: Silvia Pagliarini, Sara Schneider, Christopher T. Kello, Anne S. Warlaumont

Abstract: During the first years of life, infant vocalizations change considerably as infants develop the vocalization skills that enable them to produce speech sounds. Characterizations based on specific acoustic features, protophone categories, or phonetic transcription can represent the sounds infants make at different ages and in different contexts. However, they do not fully describe how listeners perceive those sounds, can be inefficient to obtain at large scales, and are difficult to visualize in two dimensions without additional statistical processing. Machine-learning-based approaches provide the opportunity to complement these characterizations with purely data-driven representations of infant sounds. Here, we use spectral feature extraction and unsupervised machine learning, specifically Uniform Manifold Approximation and Projection (UMAP), to obtain a novel two-dimensional spatial representation of infant and caregiver vocalizations extracted from day-long home recordings. UMAP yields a continuous and well-distributed space conducive to certain analyses of infant vocal development. For instance, we found that the dispersion of infant vocalization acoustics within the 2-D space over a day increased from 3 to 9 months and then decreased from 9 to 18 months. The method also permits analysis of similarity between infant and adult vocalizations, which also changes with infant age.
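The abstract does not define how dispersion was measured. As a minimal sketch, assuming dispersion is the mean Euclidean distance of one day's 2-D UMAP coordinates from their centroid (a hypothetical choice, not necessarily the authors' metric), it could be computed as follows. The coordinates would come from a UMAP fit, for example `umap.UMAP(n_components=2).fit_transform(features)` in the umap-learn package; that step and the `features` matrix are assumed and not shown.

```python
import math


def dispersion(points):
    """Mean Euclidean distance of 2-D points from their centroid.

    Hypothetical dispersion metric: `points` is a list of (x, y) UMAP
    coordinates for the vocalizations from one day-long recording.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / n


# Toy usage: four points on the corners of a square, each sqrt(2) from the center.
print(dispersion([(1, 1), (1, -1), (-1, 1), (-1, -1)]))  # ≈ 1.414
```

Comparing this value across recordings made at different ages is one way to express "dispersion increased from 3 to 9 months" numerically.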

Submitted 25 April, 2022; originally announced April 2022.

Comments: Under review at Interspeech 2022

arXiv:2202.03265

Image-based EEG classification of brain responses to song recordings

Authors: Adolfo G. Ramirez-Aristizabal, Mohammad K. Ebrahimpour, Christopher T. Kello

Abstract: Classifying EEG responses to naturalistic acoustic stimuli is of theoretical and practical importance, but standard approaches are limited by processing individual channels separately on very short sound segments (a few seconds or less). Recent work has shown classification of responses to music stimuli (~2 min) by extracting spectral components from EEG and using convolutional neural networks (CNNs). This paper proposes an efficient method that maps raw EEG signals to the individual songs participants listened to, for end-to-end classification. EEG channels are treated as one dimension of a [Channel x Sample] image tile, and the images are classified using CNNs. Our experimental results (88.7%) compare favorably with state-of-the-art methods (85.0%), even though our classification task is more challenging: the stimuli were longer, similar to each other in perceptual quality, and unfamiliar to participants. We also adopt a transfer learning scheme using a pre-trained ResNet-50, confirming the effectiveness of transfer learning even though the image domains are unrelated to each other.
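A minimal sketch of the [Channel x Sample] tiling described in the abstract, assuming a recording stored as a list of equal-length channel lists. The tile width, the non-overlapping windows, and the per-tile min-max scaling to 0..255 are hypothetical choices for illustration, not the paper's preprocessing; the resulting tiles would then be fed to an image CNN such as ResNet-50.

```python
def eeg_to_tiles(eeg, tile_width):
    """Split a [channel][sample] EEG recording into [Channel x Sample] image tiles.

    Each tile keeps all channels as rows and `tile_width` consecutive samples
    as columns. Values are min-max scaled per tile to integers in 0..255
    (an assumed normalization). Trailing samples that do not fill a whole
    tile are dropped.
    """
    n_samples = len(eeg[0])
    if any(len(ch) != n_samples for ch in eeg):
        raise ValueError("all channels must have the same number of samples")
    tiles = []
    for start in range(0, n_samples - tile_width + 1, tile_width):
        tile = [ch[start:start + tile_width] for ch in eeg]
        lo = min(min(row) for row in tile)
        hi = max(max(row) for row in tile)
        span = (hi - lo) or 1.0  # avoid division by zero on a flat tile
        tiles.append([[round(255 * (v - lo) / span) for v in row] for row in tile])
    return tiles


# Toy usage: 2 channels x 4 samples -> two 2x2 tiles.
print(eeg_to_tiles([[0, 1, 2, 3], [4, 5, 6, 7]], tile_width=2))
```

Treating channels as image rows lets a standard 2-D CNN see spatial (across-channel) and temporal (across-sample) structure at once, which is the design choice the abstract describes.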

Submitted 31 January, 2022; originally announced February 2022.

Comments: 6 pages, 2 figures, 3 tables, 2 equations

ACM Class: I.4.10

arXiv:2102.04088

DOI: 10.1140/epjc/s10052-021-09292-5

Study of energy response and resolution of the ATLAS Tile Calorimeter to hadrons of energies from 16 to 30 GeV

Authors: Jalal Abdallah, Stylianos Angelidakis, Giorgi Arabidze, Nikolay Atanov, Johannes Bernhard, Romeo Bonnefoy, Jonathan Bossio, Ryan Bouabid, Fernando Carrio, Tomas Davidek, Michal Dubovsky, Luca Fiorini, Francisco Brandan Garcia Aparisi, Tancredi Carli, Alexander Gerbershagen, Hazal Goksu, Haleh Hadavand, Siarhei Harkusha, Dingane Hlaluku, Michael James Hibbard, Kevin Hildebrand, Juansher Jejelava, Andrey Kamenshchikov, Stergios Kazakos, Tomas Kello, et al. (46 additional authors not shown)

Abstract: Three spare modules of the ATLAS Tile Calorimeter were exposed to test beams from the Super Proton Synchrotron accelerator at CERN in 2017. The measurements of the energy response and resolution of the detector to positive pions, kaons, and protons with energies in the range 16 to 30 GeV are reported. The results have uncertainties of a few percent. They were compared to the predictions of the Geant4-based simulation program used in ATLAS to estimate the response of the detector to proton-proton events at the Large Hadron Collider. The determinations obtained using experimental and simulated data agree within the uncertainties.
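For a single beam energy, response and resolution are commonly summarized as the mean reconstructed energy divided by the beam energy, and the spread of the reconstructed energy divided by its mean. The sketch below uses plain sample moments; the paper's exact estimators (for example, fits to the peak of the energy distribution, or event selection) are not given in the abstract, so treat this as an assumption rather than the ATLAS procedure.

```python
import statistics


def response_and_resolution(measured_energies, beam_energy):
    """Return (response, resolution) for one beam energy.

    response   = mean(E_measured) / E_beam
    resolution = stdev(E_measured) / mean(E_measured)

    Uses sample mean and sample standard deviation (an assumed estimator).
    """
    mean = statistics.fmean(measured_energies)
    sigma = statistics.stdev(measured_energies)
    return mean / beam_energy, sigma / mean


# Toy usage with made-up energies in GeV (not test-beam data).
print(response_and_resolution([18.0, 20.0, 22.0], beam_energy=20.0))  # (1.0, 0.1)
```

Repeating this for each particle type and beam energy, on both measured and Geant4-simulated events, gives the kind of data-versus-simulation comparison the abstract reports.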

Submitted 8 February, 2021; originally announced February 2021.

arXiv:2005.12412

InfantNet: A Deep Neural Network for Analyzing Infant Vocalizations

Authors: Mohammad K. Ebrahimpour, Sara Schneider, David C. Noelle, Christopher T. Kello

Abstract: Acoustic analyses of infant vocalizations are valuable for research on speech development as well as for applications in sound classification. Previous studies have focused on measures of acoustic features based on theories of speech processing, such as spectral and cepstrum-based analyses. More recently, end-to-end deep learning models have been developed that take raw speech signals (acoustic waveforms) as inputs and use convolutional neural network layers to learn representations of speech sounds based on classification tasks. We applied a recent end-to-end model of sound classification to analyze a large-scale database of labeled infant and adult vocalizations recorded in natural settings outside the lab, with no control over recording conditions. The model learned basic classifications such as infant versus adult vocalizations, infant speech-related versus non-speech vocalizations, and canonical versus non-canonical babbling. The model was trained on recordings of infants ranging from 3 to 18 months of age, and classification accuracy changed with age as speech became more distinct and babbling became more speech-like. Further work is needed to validate and explore the model and dataset, but our results show how deep learning can be used to measure and investigate speech acquisition and development, with potential applications in speech pathology and infant monitoring.
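Tracking how classification accuracy changes with infant age amounts to grouping held-out predictions by age and scoring each group. A minimal sketch, assuming each record is a hypothetical `(age_months, true_label, predicted_label)` tuple; the labels and values in the usage line are toy placeholders, not results from the paper.

```python
from collections import defaultdict


def accuracy_by_age(records):
    """Compute per-age classification accuracy.

    `records` is an iterable of (age_months, true_label, predicted_label)
    tuples (a hypothetical format). Returns {age_months: accuracy},
    ordered by age.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for age, truth, pred in records:
        total[age] += 1
        correct[age] += int(truth == pred)
    return {age: correct[age] / total[age] for age in sorted(total)}


# Toy usage with placeholder predictions.
toy = [
    (3, "canonical", "canonical"),
    (3, "canonical", "non-canonical"),
    (18, "canonical", "canonical"),
]
print(accuracy_by_age(toy))  # {3: 0.5, 18: 1.0}
```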

Submitted 25 May, 2020; originally announced May 2020.

arXiv:2005.12195

End-to-End Auditory Object Recognition via Inception Nucleus

Authors: Mohammad K. Ebrahimpour, Timothy Shea, Andreea Danielescu, David C. Noelle, Christopher T. Kello

Abstract: Machine learning approaches to auditory object recognition are traditionally based on engineered features such as those derived from the spectrum or cepstrum. More recently, end-to-end classification systems for image and auditory recognition have been developed that learn features jointly with classification, resulting in improved classification accuracy. In this paper, we propose a novel end-to-end deep neural network that maps raw waveform inputs to sound class labels. Our network includes an "inception nucleus" that optimizes the size of convolutional filters on the fly, dramatically reducing engineering effort. Classification results compared favorably against current state-of-the-art approaches, besting them by 10.4 percentage points on the UrbanSound8K dataset. Analyses of learned representations revealed that filters in the earlier hidden layers learned wavelet-like transforms to extract features that were informative for classification.
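The abstract does not spell out how the inception nucleus selects filter sizes. The sketch below shows only the general inception idea it builds on: several 1-D convolutions of different widths applied in parallel to the raw waveform, with their ReLU outputs stacked as channels so that later layers can weight whichever width is most informative. The kernels are fixed, hypothetical values chosen for illustration; in a trained network they would be learned weights.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Intended for odd-length kernels so the output stays centered.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def inception_nucleus(waveform, kernels):
    """Apply parallel filters of different widths and stack ReLU outputs as channels.

    Returns a list with one output channel per kernel, each the same length
    as `waveform`.
    """
    return [[max(0.0, v) for v in conv1d_same(waveform, k)] for k in kernels]


# Toy usage: a width-1 and a width-3 filter on a 3-sample "waveform".
print(inception_nucleus([1.0, 2.0, 3.0], kernels=[[1.0], [1.0, 1.0, 1.0]]))
# [[1.0, 2.0, 3.0], [3.0, 6.0, 5.0]]
```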

Submitted 25 May, 2020; originally announced May 2020.

Comments: Published in the Proceedings of ICASSP 2020

arXiv:1707.02932

Complexity of eye fixation duration time series in reading of Persian texts: A multifractal detrended fluctuation analysis

Authors: Mohammad Sharifi, Hamed Farahani, Farhad Shahbazi, Masood Sharifi, Christofer T. Kello, Marzieh Zare

Abstract: There is growing evidence that cognitive processes may have fractal structures as a signature of complexity. The class of this complexity, and how it may differ as a function of cognitive variables, is an ongoing topic of research. Here, we explore the eye movement trajectories generated while reading different Persian texts, recorded with an eye tracker. We show that fixation durations, the main components of eye movements reflecting cognitive processing, exhibit multifractal behavior. This indicates that multiple exponents are needed to capture the neural and cognitive processes involved in decoding symbols to derive meaning. We test whether multifractal behavior varies as a function of font (two different fonts), the readers' familiarity with the text, silent versus aloud reading, and goal-oriented versus non-goal-oriented reading. We find that, while mean fixation duration is affected by some of these factors, the multifractal pattern in the time series of eye fixation durations does not change significantly. Our results suggest that multifractal dynamics may be intrinsic to the reading process.
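A compact, pure-Python sketch of first-order multifractal detrended fluctuation analysis (MFDFA with linear detrending), the technique named in the title. It returns the generalized Hurst exponent h(q) for each q: a monofractal series has roughly constant h(q), while a multifractal one has h(q) that varies with q. The simplifications are assumptions, not the paper's settings: segments are taken from the start of the series only (full MFDFA also segments from the end), and the scales and q values are illustrative.

```python
import math
import random


def _detrended_variance(segment):
    """Mean squared residual after a least-squares line fit (order-1 detrending)."""
    n = len(segment)
    t_mean = (n - 1) / 2
    y_mean = sum(segment) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(segment)) / n


def _ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def mfdfa(series, scales, qs):
    """Return {q: h(q)}, the generalized Hurst exponents of `series`."""
    mean = sum(series) / len(series)
    # Step 1: profile = cumulative sum of the mean-removed series.
    profile, total = [], 0.0
    for x in series:
        total += x - mean
        profile.append(total)
    log_s = [math.log(s) for s in scales]
    log_f = {q: [] for q in qs}
    for s in scales:
        # Step 2: non-overlapping segments of length s, each linearly detrended.
        n_seg = len(profile) // s
        variances = [_detrended_variance(profile[v * s:(v + 1) * s]) for v in range(n_seg)]
        for q in qs:
            # Step 3: q-th order fluctuation function F_q(s).
            if q == 0:
                fq = math.exp(0.5 * sum(math.log(f2) for f2 in variances) / n_seg)
            else:
                fq = (sum(f2 ** (q / 2) for f2 in variances) / n_seg) ** (1 / q)
            log_f[q].append(math.log(fq))
    # Step 4: h(q) is the slope of log F_q(s) versus log s.
    return {q: _ols_slope(log_s, log_f[q]) for q in qs}


# Toy usage: uncorrelated Gaussian noise should give h(2) near 0.5.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
print(mfdfa(noise, scales=[16, 32, 64, 128, 256], qs=[-2, 0, 2]))
```

For reading data, `series` would be the fixation durations in reading order; comparing h(q) across conditions such as font or silent versus aloud reading mirrors the comparisons described in the abstract.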

Submitted 10 July, 2017; originally announced July 2017.

Comments: 7 pages, 4 figures, 4 tables

Showing 1–6 of 6 results for author: Kello, T