Showing 1–25 of 25 results for author: Niesler, T

  1. Fine-Tuned Self-Supervised Speech Representations for Language Diarization in Multilingual Code-Switched Speech

    Authors: Geoffrey Frost, Emily Morris, Joshua Jansen van Vüren, Thomas Niesler

    Abstract: Annotating a multilingual code-switched corpus is a painstaking process requiring specialist linguistic expertise. This is partly due to the large number of language combinations that may appear within and across utterances, which might require several annotators with different linguistic expertise to consider an utterance sequentially. This is time-consuming and costly. It would be useful if the…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Presented at SACAIR 2022

  2. arXiv:2209.00934  [pdf, other]

    eess.AS cs.LG cs.SD

    TB or not TB? Acoustic cough analysis for tuberculosis classification

    Authors: Geoffrey Frost, Grant Theron, Thomas Niesler

    Abstract: In this work, we explore recurrent neural network architectures for tuberculosis (TB) cough classification. In contrast to previous unsuccessful attempts to implement deep architectures in this domain, we show that a basic bidirectional long short-term memory network (BiLSTM) can achieve improved performance. In addition, we show that by performing greedy feature selection in conjunction with a ne…

    Submitted 2 September, 2022; originally announced September 2022.

    Comments: Accepted for publication at Interspeech 2022

  3. arXiv:2205.05480  [pdf, other]

    cs.LG cs.SD eess.AS q-bio.QM

    Automatic Tuberculosis and COVID-19 cough classification using deep learning

    Authors: Madhurananda Pahar, Marisa Klopper, Byron Reeve, Rob Warren, Grant Theron, Andreas Diacon, Thomas Niesler

    Abstract: We present a deep learning based automatic cough classifier which can discriminate tuberculosis (TB) coughs from COVID-19 coughs and healthy coughs. Both TB and COVID-19 are respiratory diseases, contagious, have cough as a predominant symptom and claim thousands of lives each year. The cough audio recordings were collected at both indoor and outdoor settings and also uploaded using smartphones fr…

    Submitted 10 September, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: This paper has been published in 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET)

    Journal ref: 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET), 2022, pp. 1-9

  4. Accelerometer-based Bed Occupancy Detection for Automatic, Non-invasive Long-term Cough Monitoring

    Authors: Madhurananda Pahar, Igor Miranda, Andreas Diacon, Thomas Niesler

    Abstract: We present a new machine learning based bed-occupancy detection system that uses the accelerometer signal captured by a bed-attached consumer smartphone. Automatic bed-occupancy detection is necessary for automatic long-term cough monitoring, since the time which the monitored patient occupies the bed is required to accurately calculate a cough rate. Accelerometer measurements are more cost effect…

    Submitted 13 March, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Journal ref: IEEE Access, vol. 11, pp. 30739-30752, 2023

  5. arXiv:2202.01639  [pdf, ps, other]

    cs.HC

    Mathematical Content Browsing for Print-Disabled Readers Based on Virtual-World Exploration and Audio-Visual Sensory Substitution

    Authors: Rynhardt Kruger, Febe de Wet, Thomas Niesler

    Abstract: Documents containing mathematical content remain largely inaccessible to blind and visually impaired readers because they are predominantly published as untagged PDF which does not include the semantic data necessary for effective accessibility. We present a browsing approach for print-disabled readers specifically aimed at such mathematical content. This approach draws on the navigational mechani…

    Submitted 3 February, 2022; originally announced February 2022.

  6. arXiv:2110.03771  [pdf, other]

    cs.SD cs.LG eess.AS

    Wake-Cough: cough spotting and cougher identification for personalised long-term cough monitoring

    Authors: Madhurananda Pahar, Marisa Klopper, Byron Reeve, Rob Warren, Grant Theron, Andreas Diacon, Thomas Niesler

    Abstract: We present `wake-cough', an application of wake-word spotting to coughs using a Resnet50 and the identification of coughers using i-vectors, for the purpose of a long-term, personalised cough monitoring system. Coughs, recorded in a quiet (73$\pm$5 dB) and noisy (34$\pm$17 dB) environment, were used to extract i-vectors, x-vectors and d-vectors, used as features to the classifiers. The system achi…

    Submitted 10 September, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

  7. arXiv:2109.00103  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Automatic non-invasive Cough Detection based on Accelerometer and Audio Signals

    Authors: Madhurananda Pahar, Igor Miranda, Andreas Diacon, Thomas Niesler

    Abstract: We present an automatic non-invasive way of detecting cough events based on both accelerometer and audio signals. The acceleration signals are captured by a smartphone firmly attached to the patient's bed, using its integrated accelerometer. The audio signals are captured simultaneously by the same smartphone using an external microphone. We have compiled a manually-annotated dataset contain…

    Submitted 31 August, 2021; originally announced September 2021.

    Comments: arXiv admin note: text overlap with arXiv:2102.04997

    Journal ref: Journal of Signal Processing Systems, 2022

  8. COVID-19 Detection in Cough, Breath and Speech using Deep Transfer Learning and Bottleneck Features

    Authors: Madhurananda Pahar, Marisa Klopper, Robin Warren, Thomas Niesler

    Abstract: We present an experimental investigation into the effectiveness of transfer learning and bottleneck feature extraction in detecting COVID-19 from audio recordings of cough, breath and speech. This type of screening is non-contact, does not require specialist medical expertise or laboratory facilities and can be deployed on inexpensive consumer hardware. We use datasets that contain recordings…

    Submitted 17 August, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

    Journal ref: Computers in Biology and Medicine, 2022

  9. arXiv:2103.13300  [pdf, other]

    cs.SD cs.LG eess.AS

    Automatic Cough Classification for Tuberculosis Screening in a Real-World Environment

    Authors: Madhurananda Pahar, Marisa Klopper, Byron Reeve, Grant Theron, Rob Warren, Thomas Niesler

    Abstract: Objective: The automatic discrimination between the coughing sounds produced by patients with tuberculosis (TB) and those produced by patients with other lung ailments. Approach: We present experiments based on a dataset of 1358 forced cough recordings obtained in a developing-world clinic from 16 patients with confirmed active pulmonary TB and 35 patients suffering from respiratory conditions s…

    Submitted 17 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: This paper has been accepted in Physiological Measurement (2021)

    Journal ref: Physiological Measurement, 2021

  10. Deep Neural Network based Cough Detection using Bed-mounted Accelerometer Measurements

    Authors: Madhurananda Pahar, Igor Miranda, Andreas Diacon, Thomas Niesler

    Abstract: We have performed cough detection based on measurements from an accelerometer attached to the patient's bed. This form of monitoring is less intrusive than body-attached accelerometer sensors, and sidesteps privacy concerns encountered when using audio for cough detection. For our experiments, we have compiled a manually-annotated dataset containing the acceleration signals of approximately 6000 c…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: It has been accepted in ICASSP, 2021. Copyright information is shown at the very first page

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

  11. COVID-19 Cough Classification using Machine Learning and Global Smartphone Recordings

    Authors: Madhurananda Pahar, Marisa Klopper, Robin Warren, Thomas Niesler

    Abstract: We present a machine learning based COVID-19 cough classifier which can discriminate COVID-19 positive coughs from both COVID-19 negative and healthy coughs recorded on a smartphone. This type of screening is non-contact, easy to apply, and can reduce the workload in testing centres as well as limit transmission by recommending early self-isolation to those who have a cough suggestive of COVID-19.…

    Submitted 14 June, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: This paper has been accepted in "Computers in Medicine and Biology" and currently under production

    Journal ref: Computers in Biology and Medicine, 2021

  12. arXiv:2011.03118  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multilingual Bottleneck Features for Improving ASR Performance of Code-Switched Speech in Under-Resourced Languages

    Authors: Trideba Padhi, Astik Biswas, Febe De Wet, Ewald van der Westhuizen, Thomas Niesler

    Abstract: In this work, we explore the benefits of using multilingual bottleneck features (mBNF) in acoustic modelling for the automatic speech recognition of code-switched (CS) speech in African languages. The unavailability of annotated corpora in the languages of interest has always been a primary challenge when developing speech recognition systems for this severely under-resourced type of speech. Hence…

    Submitted 31 October, 2020; originally announced November 2020.

    Comments: In Proceedings of The First Workshop on Speech Technologies for Code-Switching in Multilingual Communities

    Journal ref: http://festvox.org/cedar/WSTCSMC2020.pdf

  13. arXiv:2004.06480  [pdf, other]

    eess.AS cs.LG cs.SD

    Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech

    Authors: N. Wilkinson, A. Biswas, E. Yılmaz, F. de Wet, E. van der Westhuizen, T. R. Niesler

    Abstract: This paper considers the impact of automatic segmentation on the fully-automatic, semi-supervised training of automatic speech recognition (ASR) systems for five-lingual code-switched (CS) speech. Four automatic segmentation techniques were evaluated in terms of the recognition performance of an ASR system trained on the resulting segments in a semi-supervised manner. The system's output was compa…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: SLTU 2020. arXiv admin note: text overlap with arXiv:2003.03135

  14. arXiv:2004.04054  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Semi-supervised acoustic and language model training for English-isiZulu code-switched speech recognition

    Authors: A. Biswas, F. de Wet, E. van der Westhuizen, T. R. Niesler

    Abstract: We present an analysis of semi-supervised acoustic and language model training for English-isiZulu code-switched ASR using soap opera speech. Approximately 11 hours of untranscribed multilingual speech was transcribed automatically using four bilingual code-switching transcription systems operating in English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. These transcriptions wer…

    Submitted 5 April, 2020; originally announced April 2020.

    Comments: 4th Code-Switch workshop, France

  15. arXiv:2003.03135  [pdf, other]

    eess.AS cs.LG cs.SD

    Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages

    Authors: Astik Biswas, Emre Yılmaz, Febe de Wet, Ewald van der Westhuizen, Thomas Niesler

    Abstract: This paper reports on the semi-supervised development of acoustic and language models for under-resourced, code-switched speech in five South African languages. Two approaches are considered. The first constructs four separate bilingual automatic speech recognisers (ASRs) corresponding to four different language pairs between which speakers switch frequently. The second uses a single, unified, fiv…

    Submitted 6 March, 2020; originally announced March 2020.

    Comments: Conference

    ACM Class: F.2.2; I.2.7

  16. arXiv:1907.03064  [pdf, other]

    cs.CL cs.LG eess.AS

    Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training

    Authors: Astik Biswas, Raghav Menon, Ewald van der Westhuizen, Thomas Niesler

    Abstract: We present improvements in automatic speech recognition (ASR) for Somali, a currently extremely under-resourced language. This forms part of a continuing United Nations (UN) effort to employ ASR-based keyword spotting systems to support humanitarian relief programmes in rural Africa. Using just 1.57 hours of annotated speech data as a seed corpus, we increase the pool of training data by applying…

    Submitted 5 July, 2019; originally announced July 2019.

    Comments: 5 pages, 6 Tables, 3 figures, 22 references (Accepted at Interspeech 2019)

  17. arXiv:1906.08647  [pdf, other]

    cs.CL cs.SD eess.AS

    Semi-supervised acoustic model training for five-lingual code-switched ASR

    Authors: Astik Biswas, Emre Yılmaz, Febe de Wet, Ewald van der Westhuizen, Thomas Niesler

    Abstract: This paper presents recent progress in the acoustic modelling of under-resourced code-switched (CS) speech in multiple South African languages. We consider two approaches. The first constructs separate bilingual acoustic models corresponding to language pairs (English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho). The second constructs a single unified five-lingual acoustic mode…

    Submitted 15 October, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: Accepted for publication at Interspeech 2019

  18. arXiv:1811.08284  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders

    Authors: Raghav Menon, Herman Kamper, Ewald van der Westhuizen, John Quinn, Thomas Niesler

    Abstract: We compare features for dynamic time warping (DTW) when used to bootstrap keyword spotting (KWS) in an almost zero-resource setting. Such quickly-deployable systems aim to support United Nations (UN) humanitarian relief efforts in parts of Africa with severely under-resourced languages. Our objective is to identify acoustic features that provide acceptable KWS performance in such environments. As…

    Submitted 12 July, 2019; v1 submitted 14 November, 2018; originally announced November 2018.

    Comments: 5 pages, 2 figures, 2 tables, 38 references, Accepted at Interspeech 2019

  19. arXiv:1811.06756  [pdf, other]

    cs.SD eess.AS

    Direction of Arrival Estimation of Wide-band Signals with Planar Microphone Arrays

    Authors: Rudolf Byker, Thomas Niesler

    Abstract: An approach to the estimation of the Direction of Arrival (DOA) of wide-band signals with a planar microphone array is presented. Our algorithm estimates an unambiguous DOA using a single planar array in which the microphones are placed fairly close together and the sound source is expected to be in the far field. The algorithm uses the ambiguous DOA estimates obtained from microphone pairs in the…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: 10 pages

  20. arXiv:1810.12744  [pdf, other]

    cs.LG stat.ML

    Cluster Size Management in Multi-Stage Agglomerative Hierarchical Clustering of Acoustic Speech Segments

    Authors: Lerato Lerato, Thomas Niesler

    Abstract: Agglomerative hierarchical clustering (AHC) requires only the similarity between objects to be known. This is attractive when clustering signals of varying length, such as speech, which are not readily represented in fixed-dimensional vector space. However, AHC is characterised by $O(N^2)$ space and time complexity, making it infeasible for partitioning large datasets. This has recently been addre…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: 12 pages

  21. arXiv:1810.12722  [pdf, ps, other]

    cs.SD cs.LG eess.AS stat.ML

    Feature Trajectory Dynamic Time Warping for Clustering of Speech Segments

    Authors: Lerato Lerato, Thomas Niesler

    Abstract: Dynamic time warping (DTW) can be used to compute the similarity between two sequences of generally differing length. We propose a modification to DTW that performs individual and independent pairwise alignment of feature trajectories. The modified technique, termed feature trajectory dynamic time warping (FTDTW), is applied as a similarity measure in the agglomerative hierarchical clustering of s…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: 10 pages

  22. arXiv:1807.10949  [pdf, ps, other]

    cs.CL

    Building a Unified Code-Switching ASR System for South African Languages

    Authors: Emre Yılmaz, Astik Biswas, Ewald van der Westhuizen, Febe de Wet, Thomas Niesler

    Abstract: We present our first efforts towards building a single multilingual automatic speech recognition (ASR) system that can process code-switching (CS) speech in five languages spoken within the same population. This contrasts with related prior work which focuses on the recognition of CS speech in bilingual scenarios. Recently, we have compiled a small five-language corpus of South African soap opera…

    Submitted 28 July, 2018; originally announced July 2018.

    Comments: Accepted for publication at Interspeech 2018

  23. arXiv:1807.08669  [pdf, other]

    cs.CL stat.ML

    Automatic Speech Recognition for Humanitarian Applications in Somali

    Authors: Raghav Menon, Astik Biswas, Armin Saeb, John Quinn, Thomas Niesler

    Abstract: We present our first efforts in building an automatic speech recognition system for Somali, an under-resourced language, using 1.57 hrs of annotated speech for acoustic model training. The system is part of an ongoing effort by the United Nations (UN) to implement keyword spotting systems supporting humanitarian relief programmes in parts of Africa where languages are severely under-resourced. We…

    Submitted 23 July, 2018; originally announced July 2018.

    Comments: 5 pages, 3 figures, 5 tables accepted at SLTU 2018

  24. arXiv:1807.08666  [pdf, other]

    cs.CL stat.ML

    ASR-free CNN-DTW keyword spotting using multilingual bottleneck features for almost zero-resource languages

    Authors: Raghav Menon, Herman Kamper, Emre Yilmaz, John Quinn, Thomas Niesler

    Abstract: We consider multilingual bottleneck features (BNFs) for nearly zero-resource keyword spotting. This forms part of a United Nations effort using keyword spotting to support humanitarian relief programmes in parts of Africa where languages are severely under-resourced. We use 1920 isolated keywords (40 types, 34 minutes) as exemplars for dynamic time warping (DTW) template matching, which is perform…

    Submitted 23 July, 2018; originally announced July 2018.

    Comments: 5 pages, 3 figures, 3 tables, 1 equation accepted at SLTU 2018

  25. arXiv:1806.09374  [pdf, other]

    cs.CL

    Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring

    Authors: Raghav Menon, Herman Kamper, John Quinn, Thomas Niesler

    Abstract: We use dynamic time warping (DTW) as supervision for training a convolutional neural network (CNN) based keyword spotting system using a small set of spoken isolated keywords. The aim is to allow rapid deployment of a keyword spotting system in a new language to support urgent United Nations (UN) relief programmes in parts of Africa where languages are extremely under-resourced and the development…

    Submitted 25 June, 2018; originally announced June 2018.

    Comments: 5 pages, 4 figures, 3 tables, accepted at Interspeech 2018