VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge

Authors: Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman

Abstract: This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022. The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems can diarise and recognise speakers from speech obtained "in the wild". The challenge consisted of: (i) the provision of publicly available speaker recognition and diarisation data from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a public challenge and hybrid workshop held at INTERSPEECH 2022. We describe the four tracks of our challenge along with the baselines, methods, and results. We conclude with a discussion on the new domain-transfer focus of VoxSRC-22, and on the progression of the challenge from the previous three editions.

Submitted 6 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

arXiv:2210.14682 [pdf, other]

In search of strong embedding extractors for speaker diarisation

Authors: Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung

Abstract: Speaker embedding extractors (EEs), which map input audio to a speaker discriminant latent space, are of paramount importance in speaker diarisation. However, there are several challenges when adopting EEs for diarisation, from which we tackle two key problems. First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation. We show that better performance on widely adopted speaker verification evaluation protocols does not lead to better diarisation performance. Second, embedding extractors have not seen utterances in which multiple speakers exist. These inputs are inevitably present in speaker diarisation because of overlapped speech and speaker changes; they degrade the performance. To mitigate the first problem, we generate speaker verification evaluation protocols that mimic the diarisation scenario better. We propose two data augmentation techniques to alleviate the second problem, making embedding extractors aware of overlapped speech or speaker change input. One technique generates overlapped speech segments, and the other generates segments where two speakers utter sequentially. Extensive experimental results using three state-of-the-art speaker embedding extractors demonstrate that both proposed approaches are effective.

Submitted 26 October, 2022; originally announced October 2022.

Comments: 5 pages, 1 figure, 2 tables, submitted to ICASSP

arXiv:2201.04583 [pdf, other]

VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge

Authors: Andrew Brown, Jaesung Huh, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman

Abstract: The third instalment of the VoxCeleb Speaker Recognition Challenge was held in conjunction with Interspeech 2021. The aim of this challenge was to assess how well current speaker recognition technology is able to diarise and recognise speakers in unconstrained or 'in the wild' data. The challenge consisted of: (i) the provision of publicly available speaker recognition and diarisation data from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a virtual public challenge and workshop held at Interspeech 2021. This paper outlines the challenge, and describes the baselines, methods and results. We conclude with a discussion on the new multi-lingual focus of VoxSRC 2021, and on the progression of the challenge since the previous two editions.

Submitted 16 November, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2012.06867

arXiv:2012.06867 [pdf, other]

VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge

Authors: Arsha Nagrani, Joon Son Chung, Jaesung Huh, Andrew Brown, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman

Abstract: We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020. The goal of this challenge was to assess how well current speaker recognition technology is able to diarise and recognize speakers in unconstrained or 'in the wild' data. It consisted of: (i) a publicly available speaker recognition and diarisation dataset from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a virtual public challenge and workshop held at Interspeech 2020. This paper outlines the challenge, and describes the baselines, methods used, and results. We conclude with a discussion of the progress over the first installment of the challenge.

Submitted 12 December, 2020; originally announced December 2020.

arXiv:2010.15716 [pdf, other]

Playing a Part: Speaker Verification at the Movies

Authors: Andrew Brown, Jaesung Huh, Arsha Nagrani, Joon Son Chung, Andrew Zisserman

Abstract: The goal of this work is to investigate the performance of popular speaker recognition models on speech segments from movies, where often actors intentionally disguise their voice to play a character. We make the following three contributions: (i) We collect a novel, challenging speaker recognition dataset called VoxMovies, with speech for 856 identities from almost 4000 movie clips. VoxMovies contains utterances with varying emotion, accents and background noise, and therefore comprises an entirely different domain to the interview-style, emotionally calm utterances in current speaker recognition datasets such as VoxCeleb; (ii) We provide a number of domain adaptation evaluation sets, and benchmark the performance of state-of-the-art speaker recognition models on these evaluation pairs. We demonstrate that both speaker verification and identification performance drops steeply on this new data, showing the challenge in transferring models across domains; and finally (iii) We show that simple domain adaptation paradigms improve performance, but there is still large room for improvement.

Submitted 11 February, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

Comments: The first three authors contributed equally to this work

arXiv:2005.11712 [pdf, ps, other]

A beamforming approach to the self-calibration of phased arrays

Authors: Quentin Gueuning, Antony Brown, Christophe Craeye, Eloy de Lera Acedo

Abstract: In this paper, we propose a beamforming method for the calibration of the direction-independent gain of the analog chains of aperture arrays. The gain estimates are obtained by cross-correlating the output voltage of each antenna with a voltage beamformed using the other antennas of the array. When the beamforming weights are equal to the average cross-correlated power, a relation is drawn with the StEFCal algorithm. An example illustrates this approach for few point sources and a 256-element array.

Submitted 24 May, 2020; originally announced May 2020.

arXiv:1911.07372 [pdf, other]

Deep Learning for the Digital Pathologic Diagnosis of Cholangiocarcinoma and Hepatocellular Carcinoma: Evaluating the Impact of a Web-based Diagnostic Assistant

Authors: Bora Uyumazturk, Amirhossein Kiani, Pranav Rajpurkar, Alex Wang, Robyn L. Ball, Rebecca Gao, Yifan Yu, Erik Jones, Curtis P. Langlotz, Brock Martin, Gerald J. Berry, Michael G. Ozawa, Florette K. Hazard, Ryanne A. Brown, Simon B. Chen, Mona Wood, Libby S. Allard, Lourdes Ylagan, Andrew Y. Ng, Jeanne Shen

Abstract: While artificial intelligence (AI) algorithms continue to rival human performance on a variety of clinical tasks, the question of how best to incorporate these algorithms into clinical workflows remains relatively unexplored. We investigated how AI can affect pathologist performance on the task of differentiating between two subtypes of primary liver cancer, hepatocellular carcinoma (HCC) and cholangiocarcinoma (CC). We developed an AI diagnostic assistant using a deep learning model and evaluated its effect on the diagnostic performance of eleven pathologists with varying levels of expertise. Our deep learning model achieved an accuracy of 0.885 on an internal validation set of 26 slides and an accuracy of 0.842 on an independent test set of 80 slides. Despite having high accuracy on a hold out test set, the diagnostic assistant did not significantly improve performance across pathologists (p-value: 0.184, OR: 1.287 (95% CI 0.886, 1.871)). Model correctness was observed to significantly bias the pathologist decisions. When the model was correct, assistance significantly improved accuracy across all pathologist experience levels and for all case difficulty levels (p-value: < 0.001, OR: 4.289 (95% CI 2.360, 7.794)). When the model was incorrect, assistance significantly decreased accuracy across all 11 pathologists and for all case difficulty levels (p-value < 0.001, OR: 0.253 (95% CI 0.126, 0.507)).
Our results highlight the challenges of translating AI models to the clinical setting, especially for difficult subspecialty tasks such as tumor classification. In particular, they suggest that incorrect model predictions could strongly bias an expert's diagnosis, an important factor to consider when designing medical AI-assistance systems.

Submitted 17 November, 2019; originally announced November 2019.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

arXiv:1911.00625 [pdf]

Automated Inline Analysis of Myocardial Perfusion MRI with Deep Learning

Authors: Hui Xue, Rhodri Davies, Louis AE Brown, Kristopher D Knott, Tushar Kotecha, Marianna Fontana, Sven Plein, James C Moon, Peter Kellman

Abstract: Recent development of quantitative myocardial blood flow (MBF) mapping allows direct evaluation of absolute myocardial perfusion, by computing pixel-wise flow maps. Clinical studies suggest quantitative evaluation would be more desirable for objectivity and efficiency. Objective assessment can be further facilitated by segmenting the myocardium and automatically generating reports following the AHA model. This will free user interaction for analysis and lead to a 'one-click' solution to improve workflow. This paper proposes a deep neural network based computational workflow for inline myocardial perfusion analysis. Adenosine stress and rest perfusion scans were acquired from three hospitals. Training set included N=1,825 perfusion series from 1,034 patients. Independent test set included 200 scans from 105 patients. Data were consecutively acquired at each site. A convolution neural net (CNN) model was trained to provide segmentation for LV cavity, myocardium and right ventricular by processing incoming 2D+T perfusion Gd series. Model outputs were compared to manual ground-truth for accuracy of segmentation and flow measures derived on global and per-sector basis. The trained models were integrated onto MR scanners for effective inference. Segmentation accuracy and myocardial flow measures were compared between CNN models and manual ground-truth. The mean Dice ratio of CNN derived myocardium was 0.93 +/- 0.04. Both global flow and per-sector values showed no significant difference, compared to manual results.
The AHA 16 segment model was automatically generated and reported on the MR scanner. As a result, the fully automated analysis of perfusion flow mapping was achieved. This solution was integrated on the MR scanner, enabling 'one-click' analysis and reporting of myocardial blood flow.

Submitted 29 May, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

Comments: This work has been submitted to Radiology: Artificial Intelligence for possible publication
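
The abstract above reports segmentation accuracy as a mean Dice ratio of 0.93 +/- 0.04. As a reader's aid, here is a minimal sketch of how a Dice ratio between a predicted and a manual binary mask is commonly computed. The function name `dice_coefficient`, the flat 0/1 mask representation, and the empty-mask convention are assumptions made for illustration; they are not taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1.

    Illustrative helper (hypothetical name), not the paper's implementation.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [0, 1, 1, 1, 0, 0]
    manual = [0, 1, 1, 0, 0, 0]
    # 2 shared pixels, 3 + 2 foreground pixels in total.
    print(f"{dice_coefficient(predicted, manual):.2f}")  # prints 0.80
```

A value of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.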

arXiv:1910.07119 [pdf]

doi 10.1002/mrm.27954

Automatic In-line Quantitative Myocardial Perfusion Mapping: processing algorithm and implementation

Authors: Hui Xue, Louise A. E. Brown, Sonia Nielles-Vallespin, Sven Plein, Peter Kellman

Abstract: Quantitative myocardial perfusion mapping has advantages over qualitative assessment, including the ability to detect global flow reduction. However, it is not clinically available and remains as a research tool. Building upon the previously described imaging sequence, this paper presents algorithm and implementation of an automated solution for inline perfusion flow mapping with step by step performance characterization. An inline perfusion flow mapping workflow is proposed and demonstrated on normal volunteers. Initial evaluation demonstrates the fully automated proposed solution for the respiratory motion correction, AIF LV mask detection and pixel-wise mapping, from free-breathing myocardial perfusion imaging.

Submitted 15 October, 2019; originally announced October 2019.

Journal ref: Magnetic Resonance in Medicine. 2019

arXiv:1809.00037 [pdf, ps, other]

Estimation for Quadrotors

Authors: Stefanie Tellex, Andy Brown, Sergei Lupashin

Abstract: This document describes standard approaches for filtering and estimation for quadrotors, created for the Udacity Flying Cars course. We assume previous knowledge of probability and some knowledge of linear algebra. We do not assume previous knowledge of Kalman filters or Bayes filters. This document derives an EKF for various models of drones in 1D, 2D, and 3D. We use the EKF and notation as defined in Thrun et al. [13]. We also give pseudocode for the Bayes filter, the EKF, and the Unscented Kalman filter [14]. The motivation behind this document is the lack of a step-by-step EKF tutorial that provides the derivations for a quadrotor helicopter. The goal of estimation is to infer the drone's state (pose, velocity, acceleration, and biases) from its sensor values and control inputs. This problem is challenging because sensors are noisy. Additionally, because of weight and cost issues, many drones have limited on-board computation so we want to estimate these values as quickly as possible. The standard method for performing this method is the Extended Kalman filter, a nonlinear extension of the Kalman filter which linearizes a nonlinear transition and measurement model around the current state. However the Unscented Kalman filter is better in almost every respect: simpler to implement, more accurate to estimate, and comparable runtimes.

Submitted 31 August, 2018; originally announced September 2018.

arXiv:1808.07992 [pdf]

Undersampling and Bagging of Decision Trees in the Analysis of Cardiorespiratory Behavior for the Prediction of Extubation Readiness in Extremely Preterm Infants

Authors: Lara J. Kanbar, Charles C. Onu, Wissam Shalish, Karen A. Brown, Guilherme M. Sant'Anna, Robert E. Kearney, Doina Precup

Abstract: Extremely preterm infants often require endotracheal intubation and mechanical ventilation during the first days of life. Due to the detrimental effects of prolonged invasive mechanical ventilation (IMV), clinicians aim to extubate infants as soon as they deem them ready. Unfortunately, existing strategies for prediction of extubation readiness vary across clinicians and institutions, and lead to high reintubation rates. We present an approach using Random Forest classifiers for the analysis of cardiorespiratory variability to predict extubation readiness. We address the issue of data imbalance by employing random undersampling of examples from the majority class before training each Decision Tree in a bag. By incorporating clinical domain knowledge, we further demonstrate that our classifier could have identified 71% of infants who failed extubation, while maintaining a success detection rate of 78%.

Submitted 23 August, 2018; originally announced August 2018.

Comments: Published in: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
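
The abstract above describes random undersampling of the majority class before training each decision tree in a bag. The sketch below illustrates only the sampling step, under stated assumptions: every class is reduced to the size of the smallest class (a 1:1 ratio the abstract does not specify), and the function name `balanced_bags` is hypothetical. Tree training itself is left out.

```python
import random


def balanced_bags(labels, n_bags, seed=0):
    """Return one list of example indices per tree, with classes balanced.

    Illustrative sketch: each bag keeps all minority-class examples and a
    fresh random subset of the same size from every larger class.
    """
    rng = random.Random(seed)
    # Group example indices by class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    minority_size = min(len(indices) for indices in by_class.values())

    bags = []
    for _ in range(n_bags):
        bag = []
        for indices in by_class.values():
            # Sampling without replacement; for the smallest class this
            # selects every example.
            bag.extend(rng.sample(indices, minority_size))
        rng.shuffle(bag)
        bags.append(bag)
    return bags


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 8 of class 0, 2 of class 1.
    labels = [0] * 8 + [1] * 2
    for bag in balanced_bags(labels, n_bags=3):
        print(sorted(bag))
```

Each index list would then be used to train one tree, so that across the bag different trees see different subsets of the majority class.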

arXiv:1808.07991 [pdf, other]

Predicting Extubation Readiness in Extreme Preterm Infants based on Patterns of Breathing

Authors: Charles C. Onu, Lara J. Kanbar, Wissam Shalish, Karen A. Brown, Guilherme M. Sant'Anna, Robert E. Kearney, Doina Precup

Abstract: Extremely preterm infants commonly require intubation and invasive mechanical ventilation after birth. While the duration of mechanical ventilation should be minimized in order to avoid complications, extubation failure is associated with increases in morbidities and mortality. As part of a prospective observational study aimed at developing an accurate predictor of extubation readiness, Markov and semi-Markov chain models were applied to gain insight into the respiratory patterns of these infants, with more robust time-series modeling using semi-Markov models. This model revealed interesting similarities and differences between newborns who succeeded extubation and those who failed. The parameters of the model were further applied to predict extubation readiness via generative (joint likelihood) and discriminative (support vector machine) approaches. Results showed that up to 84% of infants who failed extubation could have been accurately identified prior to extubation.

Submitted 23 August, 2018; originally announced August 2018.

Comments: Published in: 2017 IEEE Symposium Series on Computational Intelligence (SSCI)

arXiv:1808.07989 [pdf, ps, other]

A Semi-Markov Chain Approach to Modeling Respiratory Patterns Prior to Extubation in Preterm Infants

Authors: Charles C. Onu, Lara J. Kanbar, Wissam Shalish, Karen A. Brown, Guilherme M. Sant'Anna, Robert E. Kearney, Doina Precup

Abstract: After birth, extremely preterm infants often require specialized respiratory management in the form of invasive mechanical ventilation (IMV). Protracted IMV is associated with detrimental outcomes and morbidities. Premature extubation, on the other hand, would necessitate reintubation which is risky, technically challenging and could further lead to lung injury or disease. We present an approach to modeling respiratory patterns of infants who succeeded extubation and those who required reintubation which relies on Markov models. We compare the use of traditional Markov chains to semi-Markov models which emphasize cross-pattern transitions and timing information, and to multi-chain Markov models which can concisely represent non-stationarity in respiratory behavior over time. The models we developed expose specific, unique similarities as well as vital differences between the two populations.

Submitted 23 August, 2018; originally announced August 2018.

Comments: Published in: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

arXiv:1802.00535 [pdf, other]

Scalable Preprocessing of High Volume Bird Acoustic Data

Authors: Alexander Brown, Saurabh Garg, James Montgomery

Abstract: In this work, we examine the problem of efficiently preprocessing high volume bird acoustic data. We combine several existing preprocessing steps including noise reduction approaches into a single efficient pipeline by examining each process individually. We then utilise a distributed computing architecture to improve execution time. Using a master-slave model with data parallelisation, we developed a near-linear automated scalable system, capable of preprocessing bird acoustic recordings 21.76 times faster with 32 cores over 8 virtual machines, compared to a serial process. This work contributes to the research area of bioacoustic analysis, which is currently very active because of its potential to monitor animals quickly at low cost. Overcoming noise interference is a significant challenge in many bioacoustic studies, and the volume of data in these studies is increasing. Our work makes large scale bird acoustic analyses more feasible by parallelising important bird acoustic processing tasks to significantly reduce execution times.

Submitted 1 February, 2018; originally announced February 2018.

Comments: 28 pages, 20 figures
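
The abstract above quotes a speedup of 21.76 times on 32 cores. One common way to read such a figure is as parallel efficiency, meaning speedup divided by core count. The short sketch below works that arithmetic; the function name `parallel_efficiency` is illustrative, and the abstract itself reports only the speedup, not an efficiency value.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup achieved (1.0 would be perfectly linear)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


if __name__ == "__main__":
    # Figures quoted in the abstract: 21.76x faster with 32 cores.
    print(f"{parallel_efficiency(21.76, 32):.2f}")  # prints 0.68
```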

Showing 1–14 of 14 results for author: Brown, A