
Showing 1–24 of 24 results for author: Barker, J

Searching in archive eess.
  1. arXiv:2406.16214  [pdf, other]

    eess.IV

    Reducing the Sampling Burden of Fourier Sensing with a Non-rectangular Field-of-View

    Authors: Nicholas Dwork, Erin K. Englund, Alex J. Barker

    Abstract: With Fourier sensing, it is commonly the case that the field-of-view (FOV), the area of space to be imaged, is known prior to reconstruction. To date, reconstruction algorithms have focused on FOVs with simple geometries: a rectangle or a hexagon. This yields sampling patterns that are more burdensome than necessary. Due to the reduced area of imaging possible with an arbitrary (e.g., non-rectangu…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2405.05980  [pdf]

    eess.IV cs.LG

    Overcoming challenges of translating deep-learning models for glioblastoma: the ZGBM consortium

    Authors: Haris Shuaib, Gareth J Barker, Peter Sasieni, Enrico De Vita, Alysha Chelliah, Roman Andrei, Keyoumars Ashkan, Erica Beaumont, Lucy Brazil, Chris Rowland-Hill, Yue Hui Lau, Aysha Luis, James Powell, Angela Swampillai, Sean Tenant, Stefanie C Thust, Stephen Wastling, Tom Young, Thomas C Booth

    Abstract: Objective: To report imaging protocol and scheduling variance in routine care of glioblastoma patients in order to demonstrate challenges of integrating deep-learning models in glioblastoma care pathways. Additionally, to understand the most common imaging studies and image contrasts to inform the development of potentially robust deep-learning models. Methods: MR imaging data were analysed from a…

    Submitted 7 May, 2024; originally announced May 2024.

  3. arXiv:2402.01413  [pdf, other]

    cs.SD cs.LG eess.AS

    Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge

    Authors: Simon Leglaive, Matthieu Fraticelli, Hend ElGhazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker

    Abstract: Supervised models for speech enhancement are trained using artificially generated mixtures of clean speech and noise signals. However, the synthetic training conditions may not accurately reflect real-world conditions encountered during testing. This discrepancy can result in poor performance when the test domain significantly differs from the synthetic training domain. To tackle this issue, the U…

    Submitted 2 February, 2024; originally announced February 2024.

  4. arXiv:2401.13611  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models

    Authors: Rhiannon Mogridge, George Close, Robert Sutherland, Thomas Hain, Jon Barker, Stefan Goetze, Anton Ragni

    Abstract: Neural networks have been successfully used for non-intrusive speech intelligibility prediction. Recently, the use of feature representations sourced from intermediate layers of pre-trained self-supervised and weakly-supervised models has been found to be particularly useful for this task. This work combines the use of Whisper ASR decoder layer representations as neural network input features with…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024

  5. arXiv:2311.14490  [pdf, other]

    cs.SD eess.AS

    Overview of the 2023 ICASSP SP Clarity Challenge: Speech Enhancement for Hearing Aids

    Authors: Trevor J. Cox, Jon Barker, Will Bailey, Simone Graetzer, Michael A. Akeroyd, John F. Culling, Graham Naylor

    Abstract: This paper reports on the design and outcomes of the ICASSP SP Clarity Challenge: Speech Enhancement for Hearing Aids. The scenario was a listener attending to a target speaker in a noisy, domestic environment. There were multiple interferers and head rotation by the listener. The challenge extended the second Clarity Enhancement Challenge (CEC2) by fixing the amplification stage of the hearing ai…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: ICASSP 2023

  6. arXiv:2310.19817  [pdf, other]

    eess.AS cs.SD

    Intelligibility prediction with a pretrained noise-robust automatic speech recognition model

    Authors: Zehai Tu, Ning Ma, Jon Barker

    Abstract: This paper describes two intelligibility prediction systems derived from a pretrained noise-robust automatic speech recognition (ASR) model for the second Clarity Prediction Challenge (CPC2). One system is intrusive and leverages the hidden representations of the ASR model. The other system is non-intrusive and makes predictions with derived ASR uncertainty. The ASR model is only pretrained with a…

    Submitted 20 October, 2023; originally announced October 2023.

  7. arXiv:2310.05799  [pdf, other]

    eess.AS cs.LG eess.SP

    The First Cadenza Signal Processing Challenge: Improving Music for Those With a Hearing Loss

    Authors: Gerardo Roa Dabike, Scott Bannister, Jennifer Firth, Simone Graetzer, Rebecca Vos, Michael A. Akeroyd, Jon Barker, Trevor J. Cox, Bruno Fazenda, Alinka Greasley, William Whitmer

    Abstract: The Cadenza project aims to improve the audio quality of music for those who have a hearing loss. This is being done through a series of signal processing challenges, to foster better and more inclusive technologies. In the first round, two common listening scenarios are considered: listening to music over headphones, and with a hearing aid in a car. The first scenario is cast as a demixing-remixi…

    Submitted 9 October, 2023; originally announced October 2023.

  8. arXiv:2310.03480  [pdf, other]

    eess.AS cs.LG eess.SP

    The ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids

    Authors: Gerardo Roa Dabike, Michael A. Akeroyd, Scott Bannister, Jon Barker, Trevor J. Cox, Bruno Fazenda, Jennifer Firth, Simone Graetzer, Alinka Greasley, Rebecca R. Vos, William M. Whitmer

    Abstract: This paper reports on the design and results of the 2024 ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids. The Cadenza project is working to enhance the audio quality of music for those with a hearing loss. The scenario for the challenge was listening to stereo reproduction over loudspeakers via hearing aids. The task was to: decompose pop/rock music into vocal, drums, bass an…

    Submitted 29 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: 2-page paper for ICASSP 2024 SP Grand Challenge

  9. On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training

    Authors: Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

    Abstract: In this paper, we explore an improved framework to train a monoaural neural enhancement model for robust speech recognition. The designed training framework extends the existing mixture invariant training criterion to exploit both unpaired clean speech and real noisy data. It is found that the unpaired clean speech is crucial to improve quality of separated speech from real noisy speech. The propo…

    Submitted 20 September, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

    Comments: Accepted to INTERSPEECH 2022

  10. arXiv:2204.04288  [pdf, other]

    eess.AS cs.SD

    Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction

    Authors: Zehai Tu, Ning Ma, Jon Barker

    Abstract: Non-intrusive intelligibility prediction is important for its application in realistic scenarios, where a clean reference signal is difficult to access. The construction of many non-intrusive predictors requires either ground truth intelligibility labels or clean reference signals for supervised learning. In this work, we leverage an unsupervised uncertainty estimation method for predicting speech…

    Submitted 6 July, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted to INTERSPEECH 2022

  11. arXiv:2204.04287  [pdf, other]

    eess.AS cs.SD q-bio.QM

    Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners

    Authors: Zehai Tu, Ning Ma, Jon Barker

    Abstract: An accurate objective speech intelligibility prediction algorithm is of great interest for many applications such as speech enhancement for hearing aids. Most algorithms measure the signal-to-noise ratios or correlations between the acoustic features of clean reference signals and degraded signals. However, these hand-picked acoustic features are usually not explicitly correlated with recognitio…

    Submitted 6 July, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted to INTERSPEECH 2022

  12. arXiv:2204.04284  [pdf, other]

    eess.AS cs.SD

    Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition

    Authors: Zehai Tu, Jack Deadman, Ning Ma, Jon Barker

    Abstract: End-to-end models have achieved significant improvement on automatic speech recognition. One common method to improve performance of these models is expanding the data-space through data augmentation. Meanwhile, human auditory inspired front-ends have also demonstrated improvement for automatic speech recognisers. In this work, a well-verified auditory-based model, which can simulate various heari…

    Submitted 8 April, 2022; originally announced April 2022.

  13. arXiv:2202.00011  [pdf, other]

    eess.IV cs.CV cs.LG

    Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement

    Authors: Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava

    Abstract: Video compression is a central feature of the modern internet powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings, quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth constrained or otherwise unstable connections. In this…

    Submitted 30 October, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: WACV 2024

  14. arXiv:2112.00556  [pdf, other]

    cs.CV eess.IV

    Semi-Supervised Surface Anomaly Detection of Composite Wind Turbine Blades From Drone Imagery

    Authors: Jack W. Barker, Neelanjan Bhowmik, Toby P. Breckon

    Abstract: Within commercial wind energy generation, the monitoring and predictive maintenance of wind turbine blades in-situ is a crucial task, for which remote monitoring via aerial survey from an Unmanned Aerial Vehicle (UAV) is commonplace. Turbine blades are susceptible to both operational and weather-based damage over time, reducing the energy efficiency output of turbines. In this study, we address au…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: In-proceedings at 2022 17th International Conference on Computer Vision Theory and Applications (VISAPP)

  15. arXiv:2106.07843  [pdf, other]

    cs.SD cs.CL eess.AS

    Teacher-Student MixIT for Unsupervised and Semi-supervised Speech Separation

    Authors: Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

    Abstract: In this paper, we introduce a novel semi-supervised learning framework for end-to-end speech separation. The proposed method first uses mixtures of unseparated sources and the mixture invariant training (MixIT) criterion to train a teacher model. The teacher model then estimates separated sources that are used to train a student model with standard permutation invariant training (PIT). The student…

    Submitted 9 September, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021

  16. arXiv:2106.04639  [pdf, other]

    cs.SD eess.AS

    Optimising Hearing Aid Fittings for Speech in Noise with a Differentiable Hearing Loss Model

    Authors: Zehai Tu, Ning Ma, Jon Barker

    Abstract: Current hearing aids normally provide amplification based on a general prescriptive fitting, and the benefits provided by the hearing aids vary among different listening environments despite the inclusion of a noise suppression feature. Motivated by this fact, this paper proposes a data-driven machine learning technique to develop hearing aid fittings that are customised to speech in different noisy…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021

  17. arXiv:2102.10376  [pdf, other]

    eess.AS cs.AI eess.SP

    The Use of Voice Source Features for Sung Speech Recognition

    Authors: Gerardo Roa Dabike, Jon Barker

    Abstract: In this paper, we ask whether vocal source features (pitch, shimmer, jitter, etc) can improve the performance of automatic sung speech recognition, arguing that conclusions previously drawn from spoken speech studies may not be valid in the sung speech domain. We first use a parallel singing/speaking corpus (NUS-48E) to illustrate differences in sung vs spoken voicing characteristics including pit…

    Submitted 23 February, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

    Comments: Accepted to ICASSP 2021

  18. Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

    Authors: Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

    Abstract: In this paper, we present a novel multi-channel speech extraction system to simultaneously extract multiple clean individual sources from a mixture in noisy and reverberant environments. The proposed method is built on an improved multi-channel time-domain speech separation network which employs speaker embeddings to identify and extract multiple targets without label permutation ambiguity. To eff…

    Submitted 7 February, 2021; originally announced February 2021.

    Comments: Accepted for ICASSP 2021

    MSC Class: 68T10

  19. On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments

    Authors: Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

    Abstract: This paper introduces a new method for multi-channel time domain speech separation in reverberant environments. A fully-convolutional neural network structure has been used to directly separate speech from multiple microphone recordings, with no need of conventional spatial feature extraction. To reduce the influence of reverberation on spatial feature extraction, a dereverberation pre-processing…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: Presented at IEEE ICASSP 2020

    MSC Class: 68T10

    Journal ref: Proc. ICASSP (2020) 6389-6393

  20. arXiv:2006.11140  [pdf, other]

    eess.AS

    Clarity: Machine Learning Challenges to Revolutionise Hearing Device Processing

    Authors: Simone Graetzer, Michael Akeroyd, Jon P. Barker, Trevor J. Cox, John F. Culling, Graham Naylor, Eszter Porter, Rhoddy Viveros Muñoz

    Abstract: In the Clarity project, we will run a series of machine learning challenges to revolutionise speech processing for hearing devices. Over five years, there will be three paired challenges. Each pair will consist of a competition focussed on hearing-device processing ("enhancement") and another focussed on speech perception modelling ("prediction"). The enhancement challenges will deliver new and im…

    Submitted 17 August, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: 3 pages, 2 figures

  21. arXiv:2004.09249  [pdf, other]

    cs.SD cs.CL eess.AS

    CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

    Authors: Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

    Abstract: Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. Speech material is the same as the previous C…

    Submitted 2 May, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

  22. arXiv:1911.08216  [pdf, other]

    cs.CV cs.LG eess.IV

    On the Impact of Object and Sub-component Level Segmentation Strategies for Supervised Anomaly Detection within X-ray Security Imagery

    Authors: Neelanjan Bhowmik, Yona Falinie A. Gaus, Samet Akcay, Jack W. Barker, Toby P. Breckon

    Abstract: X-ray security screening is in widespread use to maintain transportation security against a wide range of potential threat profiles. Of particular interest is the recent focus on the use of automated screening approaches, including the potential anomaly detection as a methodology for concealment detection within complex electronic items. Here we address this problem considering varying segmentatio…

    Submitted 19 November, 2019; originally announced November 2019.

  23. arXiv:1808.00060  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    DNN driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation

    Authors: Mandar Gogate, Ahsan Adeel, Ricard Marxer, Jon Barker, Amir Hussain

    Abstract: Human auditory cortex excels at selectively suppressing background noise to focus on a target speaker. The process of selective attention in the brain is known to contextually exploit the available audio and visual cues to better focus on target speaker while filtering out other noises. In this study, we propose a novel deep neural network (DNN) based audiovisual (AV) mask estimation model. The pr…

    Submitted 31 July, 2018; originally announced August 2018.

    Comments: Accepted for Interspeech 2018, 5 pages, 4 figures

    ACM Class: I.5; I.4; I.2

  24. arXiv:1803.10609  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines

    Authors: Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal

    Abstract: The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multi-microphone conversational ASR in real home environments. Speech material was elicited using a dinne…

    Submitted 28 March, 2018; originally announced March 2018.