License: CC BY 4.0
arXiv:2402.10005v1 [cs.SD] 18 Dec 2023

ML-ASPA: A Contemplation of Machine Learning-based Acoustic Signal Processing Analysis for Sounds, & Strains Emerging Technology

Ratul Ali, Aktarul Islam, Md. Shohel Rana, Saila Nasrin, and Sohel Afzal Shajol
Department of Computer Science and Engineering
Uttara University (UU), Dhaka, Bangladesh
University of Rajshahi (RU), Rajshahi, Bangladesh
University of Rajshahi (RU), Rajshahi, Bangladesh
Daffodil International University (DIU), Dhaka, Bangladesh
University of Development Alternative (UODA), Dhaka, Bangladesh
{abdurrahimratulalikhan, aktarul857, msr.cse.ru, sailanasrin92, sohelafzalshajol}@gmail.com

Professor Dr. A.H.M. Saifullah Sadi
Department of Computer Science and Engineering
Uttara University (UU), Dhaka, Bangladesh
{[email protected]}
Abstract

Acoustic data serves as a fundamental cornerstone in advancing scientific and engineering understanding across diverse disciplines, spanning biology, communications, and ocean and Earth science. This inquiry meticulously explores recent advancements and transformative potential within the domain of acoustics, specifically focusing on machine learning (ML) and deep learning. ML, comprising an extensive array of statistical techniques, proves indispensable for autonomously discerning and leveraging patterns within data. In contrast to traditional acoustics and signal processing, ML adopts a data-driven approach, unveiling intricate relationships between features and desired labels or actions, as well as among features themselves, given ample training data. The application of ML to expansive sets of training data facilitates the discovery of models elucidating complex acoustic phenomena such as human speech and reverberation. The dynamic evolution of ML in acoustics yields compelling results and holds substantial promise for the future. The advent of electronic stethoscopes and analogous recording and data logging devices has expanded the application of acoustic signal processing concepts to the analysis of bowel sounds. This paper critically reviews existing literature on acoustic signal processing for bowel sound analysis, outlining fundamental approaches and applicable machine learning principles. It chronicles historical progress in signal processing techniques that have facilitated the extraction of valuable information from bowel sounds, emphasizing advancements in noise reduction, segmentation, signal enhancement, feature extraction, sound localization, and machine learning techniques. This underscores the evolution in bowel sound analysis. 
The integration of advanced acoustic signal processing with innovative machine learning methods and artificial intelligence emerges as a promising avenue for enhancing the interpretation of acoustic information emanating from the bowel. This study begins by introducing ML and then outlines its developments within five key acoustics research domains: speech processing, ocean acoustics, bioacoustics, environmental acoustics, and bowel sound analysis in everyday scenes.

Index Terms:
Acoustic Data, Machine Learning, Deep Learning, Signal Processing, Data-Driven Approach, Speech Processing, Reverberation, Electronic Stethoscopes, Bowel Sound Analysis, Bioacoustics, Environmental Acoustics, Noise Reduction, Segmentation, Feature Extraction, Artificial Intelligence

I Introduction

Acoustic data play a pivotal role in various scientific domains, including the interpretation of human speech and animal vocalizations, ocean source localization, and imaging geophysical structures in the ocean. Despite the broad applications, challenges such as data corruption, missing measurements, reverberation, and large data volumes complicate the analysis. Machine learning (ML) techniques have emerged as a powerful solution to address these challenges, offering automated data processing and pattern recognition capabilities. ML in acoustics is a rapidly evolving field, with significant potential to overcome intricate acoustics challenges.

ML, a family of techniques for detecting and utilizing patterns in data, proves beneficial in predicting future data or making decisions from uncertain measurements. It can be categorized into supervised and unsupervised learning, each serving distinct purposes. The historical focus in acoustics on high-level physical models is juxtaposed with the success of data-driven approaches facilitated by ML, indicating a shift towards hybrid models combining advanced acoustic models with ML.

In this dynamic landscape, ML in acoustics has witnessed remarkable progress, offering superior performance compared to traditional signal processing methods. However, challenges such as the need for large datasets and the limited interpretability of ML models persist. Despite these challenges, ML holds considerable potential for advancing acoustics research, as the applications reviewed in this paper demonstrate.

Stethoscopes have a long history in medical practice, particularly for listening to heart, lung, and bowel sounds. Scientific analysis of bowel sounds dates back to the early 1900s, with observations and recordings dating even further. The sounds produced by the gastrointestinal tract offer valuable insights into the anatomy and physiology of the human gut, potentially revealing activities of the microbiome.

Figure 1: Time domain acoustic signal recorded from the gut

The study further discusses the intersection of big data analytics and artificial intelligence in diverse applications, including bowel sound analysis. Artificial intelligence models, driven by advancements in computer processing power, have found utility in areas such as disease diagnosis and civil engineering. The application of these technologies to identify and analyze bowel sounds represents a notable advancement, offering a deeper understanding of gut functions and potential applications in healthcare; see, e.g., [1],[2],[3],[4],[5],[6].

The discussion concludes by highlighting improvements in acoustic signal processing methods, particularly in noise reduction and signal enhancement. Pioneering work in the 1970s used computers to analyze bowel sounds, marking the beginning of a journey that incorporated signal processing techniques such as the Fourier transform and the short-time Fourier transform. These advancements culminated in the automatic detection of bowel sounds, showcasing the evolution of acoustic signal processing techniques in bowel sound applications.

II Literature Survey

II-A Acoustic Signal Processing and Machine Learning Fundamentals

Machine Learning (ML) operates on a data-driven paradigm, capable of uncovering intricate relationships between features that conventional methods may overlook. While classic signal processing techniques rely on provable performance guarantees and simplifying assumptions, ML, particularly Deep Learning (DL), has demonstrated enhanced performance in various tasks. However, the increased flexibility of ML models introduces complexities that affect both performance guarantees and model interpretability. ML models often need substantial training data, although "vast" quantities are not strictly required to benefit from ML techniques. Despite these challenges, ML's benefits may outweigh the issues, especially when high performance is essential for a specific task; see, e.g., [7],[8],[9],[10],[11].

Inputs and Outputs: In ML, the goal is often to train a model to produce a desired output (y) given inputs (x). The supervised learning framework, represented by the equation

$$y = f(x) + \varepsilon$$

involves predicting outputs based on labeled input and output pairs. Here, $x$ represents N features, $y$ represents P desired outputs, $f(x)$ is the predicted output, and $\varepsilon$ is the error. Training an ML model requires numerous examples, with X representing the inputs and Y representing the corresponding outputs. Supervised learning focuses on predicting specific outputs, while unsupervised learning aims to discover patterns in data without explicit output specifications. Unsupervised learning often involves learning a model that approximates the features themselves; see, e.g., [12],[13],[14],[15],[16].

Figure 2: Model generalization

Figure 3: Model generalization

II-B Signal Identification and Enhancement

Sounds result from mechanical deformation, which generates energy waves that are detected by the ear or by a transducer. Acoustic signal processing and ML techniques contribute to understanding these phenomena; see, e.g., [17],[18],[19],[20],[21].

II-B1 Time Domain Signal

The raw data, a time domain signal, is crucial for acoustic analysis. Features like SNR, duration, and event count are extracted, aiding in signal quality assessment. Filtering methods, including adaptive filtering, enhance signals by removing unwanted components.
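The time-domain features named above can be sketched as follows. This is a minimal illustration, not a method taken from the reviewed literature: it assumes that the first `noise_samples` samples contain background noise only, that an "event" is a contiguous run of frames whose power exceeds a multiple of the noise power, and the function name and parameters are hypothetical.

```python
import numpy as np

def time_domain_features(signal, fs, noise_samples, frame_len=50, threshold_factor=3.0):
    """Estimate SNR (dB), active duration (s), and event count.

    Assumptions (not from the source): the leading `noise_samples` samples are
    noise only, and a frame is "active" when its RMS exceeds
    threshold_factor times the noise RMS.
    """
    signal = np.asarray(signal, dtype=float)
    noise_power = np.mean(signal[:noise_samples] ** 2)

    # Split into non-overlapping frames and measure the power of each one.
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    frame_power = np.mean(frames ** 2, axis=1)
    active = frame_power > (threshold_factor ** 2) * noise_power

    # Duration is the total length of active frames; an event starts at
    # every inactive-to-active transition.
    duration = float(np.sum(active)) * frame_len / fs
    events = int(np.sum(np.diff(active.astype(int)) == 1)) + int(active[0])

    if active.any():
        snr_db = 10.0 * np.log10(np.mean(frame_power[active]) / noise_power)
    else:
        snr_db = float("-inf")
    return snr_db, duration, events
```

Frame-based power is used instead of a per-sample threshold so that the zero crossings of an oscillating burst are not counted as separate events.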

II-B2 Frequency Domain Signal

Transforming signals into the frequency domain through Fourier analysis reveals information unobservable in the time domain. The FFT technique provides features like centroid frequency and spectral bandwidth, but may lose some time domain information.
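As a hedged illustration of the two FFT features mentioned above, the sketch below computes a spectral centroid and spectral bandwidth with NumPy. Definitions vary between authors; here the magnitude spectrum is used as the weighting, which is an assumption rather than a choice stated in the source.

```python
import numpy as np

def spectral_features(signal, fs):
    """Return (centroid_hz, bandwidth_hz) of a real-valued signal.

    The centroid is the magnitude-weighted mean frequency; the bandwidth is
    the magnitude-weighted standard deviation around that centroid.
    """
    signal = np.asarray(signal, dtype=float)
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    weights = magnitude / np.sum(magnitude)
    centroid = np.sum(freqs * weights)
    bandwidth = np.sqrt(np.sum(weights * (freqs - centroid) ** 2))
    return centroid, bandwidth
```

For a signal made of two equal-amplitude tones at 100 Hz and 300 Hz, this weighting places the centroid midway at 200 Hz with a bandwidth of 100 Hz. Note that neither feature says when in the recording each tone occurred, which is the loss of time-domain information referred to above.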

II-B3 Time-Frequency Domain Signal

Simultaneous time and frequency information is obtained using Short-Time Fourier Transform (STFT) or Wavelet Transform (WT). Spectrograms from STFT enable speech recognition and noise suppression. WT, known for noise suppression, offers varied time and frequency domain information.
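A minimal STFT sketch follows, assuming a Hann window with 50% overlap; the frame length, hop size, and function name are illustrative choices, not values taken from the reviewed studies.

```python
import numpy as np

def stft_magnitude(signal, fs, frame_len=256, hop=128):
    """Return (times, freqs, spectrogram) for a real-valued signal.

    spectrogram[k, m] is the magnitude of frequency bin m in frame k.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop

    # Slice the signal into overlapping frames and taper each with the window.
    frames = np.stack(
        [signal[k * hop : k * hop + frame_len] * window for k in range(n_frames)]
    )
    spectrogram = np.abs(np.fft.rfft(frames, axis=1))

    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs  # frame centres
    return times, freqs, spectrogram
```

Applied to a recording whose dominant frequency changes partway through, the peak bin of the early frames and of the late frames differ, which is the time-localized information that a single FFT over the whole recording cannot provide.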

III Advanced Signal Processing

III-A Supervised Learning and Linear Regression in the Context of Acoustic Signal Processing

Supervised learning, a fundamental aspect of machine learning (ML), aims to establish a mapping from a set of inputs to desired outputs through labeled input-output pairs. In this discussion, we focus on real-valued features and labels, although in general the N features in $x$ can be real, complex, or categorical. The corresponding supervised learning tasks are divided into two subcategories: regression and classification. Regression addresses scenarios where $y$ is real or complex valued, while classification pertains to cases with categorical $y$.

The central focus in ML methods lies in finding the function $f$, particularly using probability tools when practical. The supervised ML task can be articulated as maximizing the conditional distribution $p(y|x)$, with the Maximum A Posteriori (MAP) estimator providing a point estimate for $y$, denoted as $y^{b} = f(x)$.

Linear regression serves as an illustrative example of supervised ML. In the context of Direction of Arrival (DOA) estimation in beamforming for seismic and acoustic applications, we represent the relationship between the observed Fourier-transformed measurements $x$ and the DOA azimuth angle $y$ using a linear measurement model. The optimization problem seeks values of weights $w$ that minimize the difference between the observed and predicted measurements, effectively solving the linear regression problem.

The ensuing Bayesian treatment involves formulating the posterior of the model using Bayes’ rule, leading to a MAP estimate for the weights. Depending on the choice of the probability density function for the weights, solutions may vary. A popular choice, the Gaussian distribution, results in the classic $L^{2}$-regularized least squares estimate, incorporating a regularization parameter for stability.
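The $L^{2}$-regularized estimate can be illustrated with a short sketch. It assumes a real-valued linear model with Gaussian noise and a zero-mean Gaussian prior on the weights, for which the MAP estimate has the closed form $w = (X^{T}X + \lambda I)^{-1} X^{T} y$; the function name and the use of real rather than complex Fourier-domain data are simplifications made here, not details from the source.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """MAP weights for y = X w + noise under a zero-mean Gaussian prior on w.

    lam is the regularization parameter (noise variance divided by prior
    variance in the Bayesian reading); lam = 0 gives ordinary least squares.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Solve (X^T X + lam I) w = X^T y instead of forming an explicit inverse.
    gram = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)
```

Increasing `lam` shrinks the weight vector toward zero, which stabilizes the solution when `X.T @ X` is ill-conditioned at the cost of some bias.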

This exposition highlights the foundational principles of supervised learning and their application to linear regression within the specific domain of acoustics, illustrating how theoretical ML concepts connect with practical signal processing challenges; see, e.g., [22],[23],[24],[25].

III-A1 Advanced Signal Processing in Bowel Sound Analysis

Acoustic signal processing in the context of bowel sound analysis involves a multi-step sequence encompassing data acquisition, preprocessing, and subsequent analysis. The reviewed literature reveals a diverse array of approaches and methodologies, with certain commonalities in the overall processing flow.

III-A2 Data Acquisition

To record abdominal sounds, specialized transducers, such as electret condenser microphones or piezoelectric transducers, are designed to convert acoustic energy into electrical signals. Electronic stethoscopes, including designs like the JABES digital stethoscope and 3M Littmann 3200, demonstrate the versatility of these transducers. Additionally, innovative approaches, such as 3D-printed stethoscope heads with built-in electronics, reflect evolving design paradigms.

III-A3 Preprocessing and Analysis

The preprocessing stage involves denoising, filtering, and segmentation of acoustic signals, often employing techniques like adaptive filtering and enveloping. The choice of window function, such as rectangular, Hamming, or Hann, plays a crucial role in the slicing of acoustic recordings into small samples.
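As one possible reading of the adaptive filtering step, the sketch below implements a basic least-mean-squares (LMS) noise canceller. It assumes a two-sensor setup in which a reference channel records ambient noise only; this setup, the function name, and the parameter values are assumptions made for illustration and are not described in the reviewed studies.

```python
import numpy as np

def lms_noise_canceller(primary, reference, n_taps=4, mu=0.01):
    """Remove the part of `primary` that is linearly predictable from `reference`.

    primary   : sensor signal = wanted sound + filtered ambient noise
    reference : ambient noise measured separately (assumed available)
    Returns (cleaned_signal, final_filter_weights).
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(n_taps)
    cleaned = np.zeros_like(primary)

    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1 : n + 1][::-1]  # newest sample first
        noise_estimate = weights @ x
        error = primary[n] - noise_estimate          # error = cleaned sample
        weights += 2.0 * mu * error * x              # LMS weight update
        cleaned[n] = error
    return cleaned, weights
```

The step size `mu` trades convergence speed against residual error and must be small relative to the reference signal power for the update to remain stable.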

III-A4 Bowel Sound Analysis

Since the early 2000s, wavelet transforms (WTs) have enabled advanced feature extraction, coinciding with the integration of machine learning methods. Researchers, exemplified by the group led by Hadjileontiadis, have made substantial progress in noise reduction and signal enhancement for bowel sounds. Various machine learning methods, including decision trees, dimension reduction, and artificial neural networks, have been applied to characterize bowel sounds; see, e.g., [26],[27],[28].

In acoustics, the Fourier Transform is often used to analyze the frequency components of a signal. The Fourier Transform of a function $f(t)$ is defined as:

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt \tag{1}$$

where $F(\omega)$ is the Fourier Transform of $f(t)$, and $\omega$ is the angular frequency.

Consider a sound signal $f(t)$ given by:

$$f(t) = A \sin(2\pi f_{0} t) \tag{2}$$

where $A$ is the amplitude and $f_{0}$ is the frequency of the sound.

The Fourier Transform of $f(t)$ is then calculated as:

$$F(\omega) = \int_{-\infty}^{\infty} A \sin(2\pi f_{0} t)\, e^{-i\omega t}\, dt \tag{3}$$

Evaluating this integral gives the expression for $F(\omega)$, which is concentrated at the angular frequencies $\omega = \pm 2\pi f_{0}$.
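A short worked evaluation of Eq. (3) under the convention of Eq. (1) is sketched below. The integral does not converge in the ordinary sense for a pure sinusoid, so it is interpreted in the distributional sense using the Dirac delta $\delta$; the shorthand $\omega_{0} = 2\pi f_{0}$ is introduced here for convenience and does not appear in the original text.

```latex
\begin{aligned}
\sin(\omega_0 t) &= \frac{e^{i\omega_0 t} - e^{-i\omega_0 t}}{2i},
  \qquad \omega_0 = 2\pi f_0, \\[4pt]
\int_{-\infty}^{\infty} e^{i\omega_0 t}\, e^{-i\omega t}\, dt
  &= 2\pi\, \delta(\omega - \omega_0), \\[4pt]
F(\omega) &= \frac{A}{2i}\,\bigl[\, 2\pi\,\delta(\omega - \omega_0)
  - 2\pi\,\delta(\omega + \omega_0) \,\bigr] \\[4pt]
  &= -\,i\pi A\,\bigl[\, \delta(\omega - \omega_0) - \delta(\omega + \omega_0) \,\bigr].
\end{aligned}
```

The spectrum of the pure tone therefore consists of two impulses, one at $+\omega_{0}$ and one at $-\omega_{0}$, each with magnitude $\pi A$.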

The literature review underscores the dynamic landscape of acoustic signal processing in bowel sound analysis, with researchers adopting diverse approaches across the processing stages. From innovative data acquisition methods to sophisticated preprocessing techniques and the application of machine learning, the field demonstrates a blend of traditional signal processing principles and contemporary methodologies. The convergence of theoretical insights and practical implementations serves as a foundation for continued advancements in acoustic signal processing for bowel sound analysis; see, e.g., [29],[30],[31],[32].

III-B Parallelization of All-Pairs Algorithm (OpenMP)

The provided algorithm outlines an approach to acoustic signal processing with parallelization using OpenMP.

III-B1 Main Function: acousticSignalProcessing()

  • This function serves as the entry point for the acoustic signal processing algorithm.

  • It is marked for parallelization using the #pragma omp parallel for directive, which instructs the compiler to parallelize the loop that iterates over the model collection. For each model in the collection, the function calls processModel(i, signal).

III-B2 Processing Each Model: processModel(i: model, signal)

  • This function is also marked for parallelization using the #pragma omp parallel for reduction (+ : result[i].amplitude) directive.

  • It contains a nested loop that iterates over the signal collection for each model. For each pair of models (i, j), where j is not equal to i, it calculates the similarity between the models using the calculateSimilarity(i, j) function.

  • The amplitude of the result for the current model (result[i].amplitude) is adjusted based on the calculated similarity using the adjustAmplitude(i, j, similarity) function.

III-B3 Calculating Similarity: calculateSimilarity(i, j)

  • The specific details of how the similarity is calculated are not provided in the algorithm and should be implemented according to the requirements of the acoustic signal processing application.

  • This function is a placeholder for calculating the similarity between two models, i and j.

III-B4 Adjusting Amplitude: adjustAmplitude(i, j, similarity)

  • This function is a placeholder for adjusting the amplitude of a model based on the calculated similarity.

  • Again, the exact method of adjusting the amplitude is not specified and needs to be implemented based on the application’s requirements.

Function acousticSignalProcessing() is
      #pragma omp parallel for
      foreach i: model do
            processModel(i, signal)

Function processModel(i: model, signal) is
      #pragma omp parallel for reduction (+ : result[i].amplitude)
      foreach j in signal do
            if j ≠ i then
                  similarity = calculateSimilarity(i, j)
                  result[i].amplitude += adjustAmplitude(i, j, similarity)

Function calculateSimilarity(i, j) is
      // Calculation of the similarity between models i and j
      return similarity

Function adjustAmplitude(i, j, similarity) is
      // Adjustment of the amplitude based on the similarity
      return adjustedAmplitude

Algorithm 1 Acoustic Signal Processing Algorithm (OpenMP)
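The all-pairs structure of Algorithm 1 can be sketched in Python as follows. Because the source leaves calculateSimilarity() and adjustAmplitude() unspecified, the cosine similarity and the similarity-weighted amplitude used here are hypothetical placeholders, and a thread pool stands in for the OpenMP parallel loop over models.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def calculate_similarity(a, b):
    """Hypothetical placeholder: cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def adjust_amplitude(similarity, amplitude_j):
    """Hypothetical placeholder: contribution of model j weighted by similarity."""
    return similarity * amplitude_j

def process_model(i, features, amplitudes):
    """Inner loop of Algorithm 1: accumulate contributions of all j != i."""
    total = 0.0
    for j in range(len(features)):
        if j != i:
            similarity = calculate_similarity(features[i], features[j])
            total += adjust_amplitude(similarity, amplitudes[j])
    return total

def acoustic_signal_processing(features, amplitudes, workers=4):
    """Outer loop of Algorithm 1, distributed over a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda i: process_model(i, features, amplitudes),
                     range(len(features)))
        )
```

Each outer iteration writes only its own result, so no reduction across workers is needed in this formulation. In CPython, threads mainly illustrate the structure; a process pool or a vectorized implementation would be needed for an actual speed-up on CPU-bound work.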

III-C Parallelization of All-Pairs Algorithm (CUDA)

Sequential Barnes-Hut Algorithm with Acoustic Signal Processing

III-C1 Main Function: acousticBarnesHut()

  • This function represents the entry point for the integrated algorithm, combining the Sequential Barnes-Hut structure with acoustic signal processing.

  • It orchestrates the sequential execution of three main steps: building the tree (build_tree()), computing mass distribution (compute_mass_distribution()), and calculating forces (compute_force()).

III-C2 Building the Tree: build_tree()

  • The function initializes the tree structure, preparing it for the insertion of acoustic models.

  • It iterates over each acoustic model in the dataset and inserts it into the root node using the insert_to_node() function.

III-C3 Inserting Models into Nodes: insert_to_node(new_model)

  • This function is responsible for placing a new acoustic model into the appropriate quadrant of the Barnes-Hut tree.

  • It checks the number of existing models in a node. If there is more than one model, it recursively traverses the tree to find the appropriate quadrant for the new model. If there’s only one model, it divides the node into quadrants, placing the existing and new models accordingly.

  • If no models exist in the node, it directly assigns the new model as the existing model.

III-C4 Computing Mass Distribution: compute_mass_distribution()

  • This function calculates the mass distribution within each quadrant of the Barnes-Hut tree.

  • If there is only one model in a quadrant, the center of mass and mass are directly assigned from that model. Otherwise, it recursively calculates the mass distribution for child quadrants, aggregating the mass and weighted center of mass.

III-C5 Calculating Forces: calculate_force(target)

  • This function computes the acoustic forces acting on a target model.

  • If there’s only one model in the quadrant, the force is calculated using the acoustic_force() function between the target and the model. If the ratio of the quadrant size l to its distance D from the target is below a threshold (l/D < θ), the quadrant is treated as a single aggregated model and the force is computed using the acoustic force model.

  • If the quadrant is larger, the algorithm recursively calculates forces for child nodes and aggregates them.

III-C6 Computing Forces for all Models: compute_force()

  • This function iterates over all acoustic models in the dataset and computes the forces acting on each model using the root_node.calculate_force(model) function.

Function acousticBarnesHut() is
      build_tree()
      compute_mass_distribution()
      compute_force()

Function build_tree() is
      Reset Tree
      foreach i: model do
            root_node→insert_to_node(i)

Function insert_to_node(new_model) is
      if num_models > 1 then
            quad = get_quadrant(new_model)
            if subnode(quad) does not exist then
                  create subnode(quad)
            subnode(quad)→insert_to_node(new_model)
      else if num_models == 1 then
            quad = get_quadrant(existing_model)
            if subnode(quad) does not exist then
                  create subnode(quad)
            subnode(quad)→insert_to_node(existing_model)
            quad = get_quadrant(new_model)
            if subnode(quad) does not exist then
                  create subnode(quad)
            subnode(quad)→insert_to_node(new_model)
      else
            existing_model ← new_model
      num_models++
Algorithm 2 Algorithm Part 1
Function compute_mass_distribution() is
      if num_models == 1 then
            center_of_mass = model.position
            mass = model.mass
      else
            forall child quadrants with models do
                  quadrant.compute_mass_distribution()
                  mass += quadrant.mass
                  center_of_mass += quadrant.mass × quadrant.center_of_mass
            center_of_mass /= mass
Algorithm 3 Algorithm Part 2
Function calculate_force(target) is
      Initialize force ← 0
      if num_models == 1 then
            force = acoustic_force(target, model)
      else
            if l/D < θ then
                  // node treated as a single model at its center of mass
                  force = acoustic_force(target, node)
            else
                  forall node : child nodes do
                        force += node.calculate_force(target)
      return force

Function compute_force() is
      forall models do
            force = root_node.calculate_force(model)
Algorithm 4 Algorithm Part 3

III-D Sequential Barnes-Hut Algorithm

This section summarizes the integrated algorithm, which combines the sequential Barnes-Hut structure with acoustic signal processing. The algorithm includes functions for building the tree, inserting models into nodes, computing mass distribution, calculating forces, and coordinating the acoustic signal processing with the Barnes-Hut algorithm.

Figure 4: Barnes-Hut tree structure

Figure 5: Barnes-Hut domain decomposition

The integrated algorithm merges the Sequential Barnes-Hut structure, designed for efficient gravitational force calculations, with acoustic signal processing. The Barnes-Hut tree structure optimizes the computation of forces between acoustic models, enhancing the algorithm’s scalability and efficiency in handling large datasets. The acoustic signal processing steps involve building the tree, distributing mass, and calculating forces, offering a comprehensive solution for analyzing and simulating acoustic interactions within a given system.
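The tree construction, mass aggregation, and force evaluation described above can be sketched compactly in Python. Since the source does not define acoustic_force(), an inverse-square attraction between two weighted points is used as a stand-in; this choice, the tuple layout (x, y, mass), and all class and function names are assumptions made for illustration. Models are assumed to have distinct positions, and at least one model is assumed to be present.

```python
import math

def pair_force(target, source):
    """Stand-in for acoustic_force(): inverse-square pull of source on target."""
    dx, dy = source[0] - target[0], source[1] - target[1]
    r2 = dx * dx + dy * dy
    if r2 == 0.0:                      # a model exerts no force on itself
        return 0.0, 0.0
    magnitude = target[2] * source[2] / r2
    r = math.sqrt(r2)
    return magnitude * dx / r, magnitude * dy / r

class Node:
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # square centre, half-width
        self.children = {}                          # quadrant index -> Node
        self.body = None                            # set while exactly one model
        self.count = 0
        self.mass = 0.0
        self.com = (0.0, 0.0)

    def _child_for(self, body):
        q = (1 if body[0] >= self.cx else 0) + (2 if body[1] >= self.cy else 0)
        if q not in self.children:
            h = self.half / 2
            self.children[q] = Node(self.cx + (h if q & 1 else -h),
                                    self.cy + (h if q & 2 else -h), h)
        return self.children[q]

    def insert(self, body):
        if self.count == 1:            # leaf becomes internal: push resident down
            old, self.body = self.body, None
            self._child_for(old).insert(old)
        if self.count == 0:
            self.body = body
        else:
            self._child_for(body).insert(body)
        self.count += 1

    def compute_mass_distribution(self):
        if self.count == 1:
            self.mass, self.com = self.body[2], (self.body[0], self.body[1])
            return
        self.mass, sx, sy = 0.0, 0.0, 0.0
        for child in self.children.values():
            child.compute_mass_distribution()
            self.mass += child.mass
            sx += child.mass * child.com[0]
            sy += child.mass * child.com[1]
        self.com = (sx / self.mass, sy / self.mass)

    def calculate_force(self, target, theta):
        dist = math.hypot(self.com[0] - target[0], self.com[1] - target[1])
        # Use the aggregate when the node is a leaf or looks small from the target.
        if self.count == 1 or (dist > 0 and 2 * self.half / dist < theta):
            return pair_force(target, (self.com[0], self.com[1], self.mass))
        fx = fy = 0.0
        for child in self.children.values():
            cfx, cfy = child.calculate_force(target, theta)
            fx, fy = fx + cfx, fy + cfy
        return fx, fy

def barnes_hut_forces(bodies, theta=0.5):
    """Build the tree, aggregate masses, and return the force on every model."""
    xs, ys = [b[0] for b in bodies], [b[1] for b in bodies]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = Node((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for body in bodies:
        root.insert(body)
    root.compute_mass_distribution()
    return [root.calculate_force(body, theta) for body in bodies]
```

With `theta=0` no node is ever approximated, so the result equals the exact all-pairs sum; larger values of `theta` trade accuracy for fewer pairwise evaluations.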

IV CONCLUSIONS

In this review, we have presented an overview of Machine Learning (ML) theory, with a particular focus on deep learning (DL), and explored its diverse applications across various acoustics research domains. While our coverage is not exhaustive, it is evident that ML has been a catalyst for numerous recent advancements in acoustics. This article aims to inspire future ML research in acoustics, emphasizing the pivotal role of large, publicly available datasets in fostering innovation across the acoustics field. The transformative potential of ML in acoustics is substantial, with its benefits amplified through open data practices; see, e.g., [33],[34],[35],[36],[37].

Despite the acknowledged limitations of ML-based methods, their performance surpasses that of conventional processing methods in many scenarios. However, it is crucial to recognize that ML models, being data-driven, demand substantial representative training data for optimal performance. This is viewed as a trade-off for accurately modeling complex phenomena, given the often high capacity of ML models. In contrast, standard processing methods, with lower capacity, rely on training-free statistical and mathematical models; see, e.g., [38],[39],[40],[41].

This review suggests a paradigm shift in acoustic processing from hand-engineered, intuition-driven models to a data-driven ML approach. While harnessing the full potential of ML, it is essential to build upon indispensable physical intuition and theoretical developments within established sub-fields like array processing. The development of ML theory in acoustics should be undertaken while preserving the foundational physical principles that describe our environments. By blending ML advancements with established principles, transformative progress can be achieved across various acoustics fields; see, e.g., [42],[43].

Regarding bowel sound analysis, several conclusions emerge. The choice of sensors for data acquisition, including electret condenser microphones and piezoelectric transducers, depends on research constraints. Advanced signal processing techniques, such as wavelet transforms (WTs) since the early 2000s, have enabled complex feature extraction. Machine learning methods have found application in bowel sound analysis, with varying approaches such as decision trees, dimension reduction, and clustering algorithms; see, e.g., [44],[45],[46],[47],[48],[49],[50].

CONFLICT of INTEREST

The authors declare that they have no conflict of interest and no funding to report.

References

  • [1] S. Hamsa, I. Shahin, Y. Iraqi, E. Damiani, A. B. Nassif, and N. Werghi, “Speaker identification from emotional and noisy speech using learned voice segregation and speech vgg,” Expert Systems with Applications, vol. 224, 2023.
  • [2] J. H. Hansen, “Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition,” Speech Communication, vol. 20, 1996.
  • [3] J. H. Hansen and T. Hasan, “Speaker recognition by machines and humans: A tutorial review,” 2015.
  • [4] G. Hickok and D. Poeppel, “The cortical organization of speech processing,” 2007.
  • [5] D. Hollfelder, L. Prein, T. Jürgens, A. Leichtle, and K. L. Bruchhage, “Influence of directional microphones on listening effort in middle ear implant users,” HNO, vol. 71, 2023.
  • [6] Y. Huang, Y. Ma, J. Xiao, W. Liu, and G. Zhang, “Identification of depression state based on multi-scale acoustic features in interrogation environment,” IET Signal Processing, vol. 17, 2023.
  • [7] K. L. Johnson, T. G. Nicol, and N. Kraus, “Brain stem response to speech: A biological marker of auditory processing,” 2005.
  • [8] Y. H. Jung, S. K. Hong, H. S. Wang, J. H. Han, T. X. Pham, H. Park, J. Kim, S. Kang, C. D. Yoo, and K. J. Lee, “Flexible piezoelectric acoustic sensors and machine learning for speech processing,” 2020.
  • [9] K. Khoria, A. T. Patil, and H. A. Patil, “On significance of constant-q transform for pop noise detection,” Computer Speech and Language, vol. 77, 2023.
  • [10] F. Kong, H. Zhou, Y. Mo, M. Shi, Q. Meng, and N. Zheng, “Comparable encoding, comparable perceptual pattern: Acoustic and electric hearing,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, 2023.
  • [11] J. C. Krause and L. D. Braida, “Acoustic properties of naturally produced clear speech at normal speaking rates,” The Journal of the Acoustical Society of America, vol. 115, 2004.
  • [12] B. S. Krishna and M. N. Semple, “Auditory temporal processing: Responses to sinusoidally amplitude-modulated tones in the inferior colliculus,” Journal of Neurophysiology, vol. 84, 2000.
  • [13] G. Langner, “Periodicity coding in the auditory system,” 1992.
  • [14] C. M. Lee and S. S. Narayanan, “Toward detecting emotions in spoken dialogs,” IEEE Transactions on Speech and Audio Processing, vol. 13, 2005.
  • [15] C. Lenk, P. Hövel, K. Ved, S. Durstewitz, T. Meurer, T. Fritsch, A. Männchen, J. Küller, D. Beer, T. Ivanov, and M. Ziegler, “Neuromorphic acoustic sensing using an adaptive microelectromechanical cochlea with integrated feedback,” Nature Electronics, vol. 6, 2023.
  • [16] M. A. Little, P. E. McSharry, S. J. Roberts, D. A. Costello, and I. M. Moroz, “Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection,” BioMedical Engineering Online, vol. 6, 2007.
  • [17] W. Liu and D. S. Vicario, “Dynamic encoding of phonetic categories in zebra finch auditory forebrain,” Scientific Reports, vol. 13, 2023.
  • [18] S. Luthra, “Why are listeners hindered by talker variability?” 2023.
  • [19] J. S. Magnuson and H. C. Nusbaum, “Acoustic differences, listener expectations, and the perceptual accommodation of talker variability,” Journal of Experimental Psychology: Human Perception and Performance, vol. 33, 2007.
  • [20] S. Markovich, S. Gannot, and I. Cohen, “Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals,” IEEE Transactions on Audio, Speech and Language Processing, vol. 17, 2009.
  • [21] B. A. Martin and A. Boothroyd, “Cortical, auditory, event-related potentials in response to periodic and aperiodic stimuli with the same spectral envelope,” Ear and Hearing, vol. 20, 1999.
  • [22] N. D. Merchant, K. M. Fristrup, M. P. Johnson, P. L. Tyack, M. J. Witt, P. Blondel, and S. E. Parks, “Measuring acoustic habitats,” Methods in Ecology and Evolution, vol. 6, 2015.
  • [23] N. Mesgarani, C. Cheung, K. Johnson, and E. F. Chang, “Phonetic feature encoding in human superior temporal gyrus,” Science, vol. 343, 2014.
  • [24] L. Meyer, “The neural oscillations of speech processing and language comprehension: state of the art and emerging mechanisms,” 2018.
  • [25] G. Minelli, G. E. Puglisi, A. Astolfi, C. Hauth, and A. Warzybok, “Objective assessment of binaural benefit from acoustical treatment in real primary school classrooms,” International Journal of Environmental Research and Public Health, vol. 20, 2023.
  • [26] D. Nagarajan, S. Broumi, and F. Smarandache, “Neutrosophic speech recognition algorithm for speech under stress by machine learning,” Neutrosophic Sets and Systems, vol. 55, 2023.
  • [27] J. E. Peelle and A. Wingfield, “The neural consequences of age-related hearing loss,” 2016.
  • [28] J. E. Peelle, “Listening effort: How the cognitive consequences of acoustic challenge are reflected in brain and behavior,” Ear and Hearing, vol. 39, 2018.
  • [29] D. Poeppel, “Pure word deafness and the bilateral processing of the speech code,” Cognitive Science, vol. 25, 2001.
  • [30] V. Poluboina, A. Pulikala, and A. N. P. Muthu, “An improved noise reduction technique for enhancing the intelligibility of sinewave vocoded speech: Implication in cochlear implants,” IEEE Access, vol. 11, 2023.
  • [31] R. B. Randall, “A history of cepstrum analysis and its application to mechanical problems,” Mechanical Systems and Signal Processing, vol. 97, 2017.
  • [32] M. Ravanelli, P. Brakel, M. Omologo, and Y. Bengio, “Light gated recurrent units for speech recognition,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, 2018.
  • [33] T. N. Sainath, R. J. Weiss, K. W. Wilson, B. Li, A. Narayanan, E. Variani, M. Bacchiani, I. Shafran, A. Senior, K. Chin, A. Misra, and C. Kim, “Multichannel signal processing with deep neural networks for automatic speech recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, 2017.
  • [34] M. Schönwiesner, R. Rübsamen, and D. Y. von Cramon, “Hemispheric asymmetry for spectral and temporal processing in the human antero-lateral auditory belt cortex,” European Journal of Neuroscience, vol. 22, 2005.
  • [35] M. Souden, J. Benesty, and S. Affes, “On optimal frequency-domain multichannel linear filtering for noise reduction,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, 2010.
  • [36] E. P. Stephen, Y. Li, S. Metzger, Y. Oganian, and E. F. Chang, “Latent neural dynamics encode temporal context in speech,” 2023.
  • [37] K. N. Stevens, “Toward a model for lexical access based on acoustic landmarks and distinctive features,” The Journal of the Acoustical Society of America, vol. 111, 2002.
  • [38] D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, “Detection and classification of acoustic scenes and events,” IEEE Transactions on Multimedia, vol. 17, 2015.
  • [39] N. Tandon and A. Choudhury, “Review of vibration and acoustic measurement methods for the detection of defects in rolling element bearings,” Tribology International, vol. 32, 1999.
  • [40] S. Telkemeyer, S. Rossi, S. P. Koch, T. Nierhaus, J. Steinbrink, D. Poeppel, H. Obrig, and I. Wartenburger, “Sensitivity of newborn auditory cortex to the temporal structure of sounds,” Journal of Neuroscience, vol. 29, 2009.
  • [41] F. Tezcan, H. Weissbart, and A. E. Martin, “A tradeoff between acoustic and linguistic feature encoding in spoken language comprehension,” eLife, vol. 12, 2023.
  • [42] C. Ufer and H. Blank, “Multivariate analysis of brain activity patterns as a tool to understand predictive processes in speech perception,” Language, Cognition and Neuroscience, 2023.
  • [43] F. Viola and W. F. Walker, “A spline-based algorithm for continuous time-delay estimation using sampled data,” IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 52, 2005.
  • [44] M. Voola, A. T. Nguyen, A. Wedekind, W. Marinovic, G. Rajan, and D. Tavora-Vieira, “A study of event-related potentials during monaural and bilateral hearing in single-sided deaf cochlear implant users,” Ear and Hearing, vol. 44, 2023.
  • [45] H. Wakita, “Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms,” IEEE Transactions on Audio and Electroacoustics, vol. 21, 1973.
  • [46] M. Wu, D. L. Wang, and G. J. Brown, “A multipitch tracking algorithm for noisy speech,” IEEE Transactions on Speech and Audio Processing, vol. 11, 2003.
  • [47] L. Xu, Y. Tsai, and B. E. Pfingst, “Features of stimulation affecting tonal-speech perception: Implications for cochlear prostheses,” The Journal of the Acoustical Society of America, vol. 112, 2002.
  • [48] R. Xu, J. Sun, Y. Wang, S. Zhang, W. Zhong, and Z. Wang, “Speech enhancement based on array-processing-assisted distributed fiber acoustic sensing,” IEEE Sensors Journal, vol. 23, 2023.
  • [49] X. Yang, K. Wang, and S. A. Shamma, “Auditory representations of acoustic signals,” IEEE Transactions on Information Theory, vol. 38, 1992.
  • [50] K. Žmolíková, M. Delcroix, T. Ochiai, K. Kinoshita, J. Černocký, and D. Yu, “Neural target speech extraction: An overview,” IEEE Signal Processing Magazine, vol. 40, 2023.