Showing 1–19 of 19 results for author: Wilson, K

Searching in archive eess.
  1. arXiv:2305.11151  [pdf, other]

    cs.SD eess.AS

    Unsupervised Multi-channel Separation and Adaptation

    Authors: Cong Han, Kevin Wilson, Scott Wisdom, John R. Hershey

    Abstract: A key challenge in machine learning is to generalize from training data to an application domain of interest. This work generalizes the recently-proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. Th…

    Submitted 22 March, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

  2. arXiv:2304.04155  [pdf, other]

    eess.IV cs.CV

    Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging

    Authors: Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W. Remedios, Shunxing Bao, Bennett A. Landman, Lee E. Wheless, Lori A. Coburn, Keith T. Wilson, Yaohong Wang, Shilin Zhao, Agnes B. Fogo, Haichun Yang, Yucheng Tang, Yuankai Huo

    Abstract: The segment anything model (SAM) was released as a foundation model for image segmentation. The promptable segmentation model was trained by over 1 billion masks on 11M licensed and privacy-respecting images. The model supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). It makes the SAM attractive for medical image analysis, especially for digital…

    Submitted 9 April, 2023; originally announced April 2023.

  3. arXiv:2304.00216  [pdf, other]

    eess.IV cs.CV cs.LG

    Cross-scale Multi-instance Learning for Pathological Image Diagnosis

    Authors: Ruining Deng, Can Cui, Lucas W. Remedios, Shunxing Bao, R. Michael Womick, Sophie Chiron, Jia Li, Joseph T. Roland, Ken S. Lau, Qi Liu, Keith T. Wilson, Yaohong Wang, Lori A. Coburn, Bennett A. Landman, Yuankai Huo

    Abstract: Analyzing high resolution whole slide images (WSIs) with regard to information across multiple scales poses a significant challenge in digital pathology. Multi-instance learning (MIL) is a common solution for working with high resolution images by classifying bags of objects (i.e. sets of smaller image patches). However, such processing is typically performed at a single scale (e.g., 20x magnifica…

    Submitted 16 February, 2024; v1 submitted 31 March, 2023; originally announced April 2023.

  4. arXiv:2212.04549  [pdf, other]

    cs.RO eess.SY

    Optimizing Real-Time Performances for Timed-Loop Racing under F1TENTH

    Authors: Nitish Gupta, Kurt Wilson, Zhishan Guo

    Abstract: Motion planning and control in autonomous car racing are one of the most challenging and safety-critical tasks due to high speed and dynamism. The lower-level control nodes are expected to be highly optimized due to resource constraints of onboard embedded processing units, although there are strict latency requirements. Some of these guarantees can be provided at the application level, such as us…

    Submitted 8 December, 2022; originally announced December 2022.

    Journal ref: Proceedings of the 43rd IEEE Real-Time Systems Symposium (RTSS), Industry Challenge, Houston, US, Dec. 2022

  5. arXiv:2207.00562  [pdf, other]

    cs.SD eess.AS

    Distance-Based Sound Separation

    Authors: Katharine Patterson, Kevin Wilson, Scott Wisdom, John R. Hershey

    Abstract: We propose the novel task of distance-based sound separation, where sounds are separated based only on their distance from a single microphone. In the context of assisted listening devices, proximity provides a simple criterion for sound selection in noisy environments that would allow the user to focus on sounds relevant to a local conversation. We demonstrate the feasibility of this approach by…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted for publication at Interspeech 2022

  6. arXiv:2109.09004  [pdf, other]

    eess.IV cs.CV

    Random Multi-Channel Image Synthesis for Multiplexed Immunofluorescence Imaging

    Authors: Shunxing Bao, Yucheng Tang, Ho Hin Lee, Riqiang Gao, Sophie Chiron, Ilwoo Lyu, Lori A. Coburn, Keith T. Wilson, Joseph T. Roland, Bennett A. Landman, Yuankai Huo

    Abstract: Multiplex immunofluorescence (MxIF) is an emerging imaging technique that produces the high sensitivity and specificity of single-cell mapping. With a tenet of 'seeing is believing', MxIF enables iterative staining and imaging extensive antibodies, which provides comprehensive biomarkers to segment and group different cells on a single tissue section. However, considerable depletion of the scarce…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: Accepted at the third MICCAI workshop on Computational Pathology (COMPAY 2021)

  7. arXiv:2106.14212  [pdf, other]

    eess.SP

    A Joint Technique for Nonlinearity Compensation in CO-OFDM Superchannel Systems

    Authors: O. S. Sunish Kumar, A. Amari, O. A. Dobre, R. Venkatesan, S. K. Wilson

    Abstract: We propose a technique combining the single-channel digital-back-propagation (SC-DBP) with phase-conjugated-twin-wave (PCTW) to compensate nonlinearities in CO-OFDM superchannel systems. This exhibits a similar performance as multi-channel DBP while providing increased transmission reach compared to SC-DBP, PCTW, and linear dispersion compensation (LDC).

    Submitted 27 June, 2021; originally announced June 2021.

  8. arXiv:2106.14205  [pdf, other]

    eess.SP

    A Spectrally Efficient Linear Polarization Coding Scheme for Fiber Nonlinearity Compensation in CO-OFDM Systems

    Authors: O. S. Sunish Kumar, O. A. Dobre, R. Venkatesan, S. K. Wilson, O. Omomukuyo, A. Amari, D. Chang

    Abstract: In this paper, we propose a linear polarization coding scheme (LPC) combined with the phase conjugated twin signals (PCTS) technique, referred to as LPC-PCTS, for fiber nonlinearity mitigation in coherent optical orthogonal frequency division multiplexing (CO-OFDM) systems. The LPC linearly combines the data symbols on the adjacent subcarriers of the OFDM symbol, one at full amplitude and the othe…

    Submitted 27 June, 2021; originally announced June 2021.

  9. arXiv:2105.02096  [pdf, other]

    cs.SD cs.LG eess.AS

    End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

    Authors: Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey

    Abstract: We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers,…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 5 pages, 2 figures, ICASSP 2021

    Journal ref: ICASSP 2021, SPE-54.1

  10. arXiv:2009.04323  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP stat.ML

    VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

    Authors: Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein

    Abstract: We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system. Delivering such a model presents numerous challenges: It should improve the performance when the input signal consists of overlapped speech, and must not hurt the speech recognition performance unde…

    Submitted 9 September, 2020; originally announced September 2020.

  11. arXiv:2006.12701  [pdf, other]

    eess.AS cs.LG cs.SD

    Unsupervised Sound Separation Using Mixture Invariant Training

    Authors: Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, Kevin Wilson, John R. Hershey

    Abstract: In recent years, rapid progress has been made on the problem of single-channel sound separation using supervised training of deep neural networks. In such supervised approaches, a model is trained to predict the component sources from synthetic mixtures created by adding up isolated ground-truth sources. Reliance on this synthetic training data is problematic because good performance depends upon…

    Submitted 23 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted for spotlight presentation at NeurIPS 2020

  12. arXiv:1911.07953  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

    Authors: Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

    Abstract: This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation. Our neural networks for separation use an advanced convolutional architecture trained with a novel stabilized signal-to-noise ratio loss function. For beamforming, we explore multiple ways of computing time-varying covariance matrices, incl…

    Submitted 3 November, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: 7 pages, 7 figures, IEEE SLT 2021 (slt2020.org)

  13. arXiv:1908.01901  [pdf, other]

    cs.LG eess.IV stat.ML

    Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary Information

    Authors: Charles B. Delahunt, Mayoore S. Jaiswal, Matthew P. Horning, Samantha Janko, Clay M. Thompson, Sourabh Kulhare, Liming Hu, Travis Ostbye, Grace Yun, Roman Gebrehiwot, Benjamin K. Wilson, Earl Long, Stephane Proux, Dionicia Gamboa, Peter Chiodini, Jane Carter, Mehul Dhorda, David Isaboke, Bernhards Ogutu, Wellington Oyibo, Elizabeth Villasis, Kyaw Myo Tun, Christine Bachman, David Bell, Courosh Mehanian

    Abstract: Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumb…

    Submitted 11 September, 2022; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: 16 pages, 13 figures

    MSC Class: 68T10; ACM Class: I.5.0

  14. arXiv:1905.03330  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Universal Sound Separation

    Authors: Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, John R. Hershey

    Abstract: Recent deep learning approaches have achieved impressive performance on speech enhancement and separation tasks. However, these approaches have not been investigated for separating mixtures of arbitrary sounds of different types, a task we refer to as universal sound separation, and it is unknown how performance on speech tasks carries over to non-speech tasks. To study this question, we develop a…

    Submitted 2 August, 2019; v1 submitted 8 May, 2019; originally announced May 2019.

    Comments: 5 pages, accepted to WASPAA 2019

  15. arXiv:1811.08521  [pdf, other]

    cs.SD eess.AS

    Differentiable Consistency Constraints for Improved Deep Speech Enhancement

    Authors: Scott Wisdom, John R. Hershey, Kevin Wilson, Jeremy Thorpe, Michael Chinen, Brian Patton, Rif A. Saurous

    Abstract: In recent years, deep networks have led to dramatic improvements in speech enhancement by framing it as a data-driven pattern recognition problem. In many modern enhancement systems, large amounts of data are used to train a deep network to estimate masks for complex-valued short-time Fourier transforms (STFTs) to suppress noise and preserve speech. However, current masking approaches often neglec…

    Submitted 20 November, 2018; originally announced November 2018.

  16. arXiv:1811.07030  [pdf, other]

    cs.SD eess.AS

    Exploring Tradeoffs in Models for Low-latency Speech Enhancement

    Authors: Kevin Wilson, Michael Chinen, Jeremy Thorpe, Brian Patton, John Hershey, Rif A. Saurous, Jan Skoglund, Richard F. Lyon

    Abstract: We explore a variety of neural networks configurations for one- and two-channel spectrogram-mask-based speech enhancement. Our best model improves on previous state-of-the-art performance on the CHiME2 speech enhancement task by 0.4 decibels in signal-to-distortion ratio (SDR). We examine trade-offs such as non-causal look-ahead, computation, and parameter count versus enhancement performance and…

    Submitted 16 November, 2018; originally announced November 2018.

  17. arXiv:1810.04826  [pdf, other]

    eess.AS cs.LG eess.SP stat.ML

    VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

    Authors: Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno

    Abstract: In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker. We achieve this by training two separate neural networks: (1) A speaker recognition network that produces speaker-discriminative embeddings; (2) A spectrogram masking network that takes both noisy spectrogram and speaker embe…

    Submitted 19 June, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

    Comments: To appear in Interspeech 2019

  18. arXiv:1808.00606  [pdf, other]

    cs.SD eess.AS

    AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

    Authors: Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

    Abstract: Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization. Both audio- and vision-based approaches have been used for this task in various settings, often tailored toward end applications. However, much of the prior work reports results in synthetic settings, on task-specific datasets, or…

    Submitted 23 August, 2018; v1 submitted 1 August, 2018; originally announced August 2018.

    Comments: Interspeech, 2018

  19. arXiv:1804.03619  [pdf, other]

    cs.SD cs.CV eess.AS

    Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation

    Authors: Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, Michael Rubinstein

    Abstract: We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and aud…

    Submitted 9 August, 2018; v1 submitted 10 April, 2018; originally announced April 2018.

    Comments: Accepted to SIGGRAPH 2018. Project webpage: https://looking-to-listen.github.io

    Journal ref: ACM Trans. Graph. 37(4): 112:1-112:11 (2018)