Showing 1–35 of 35 results for author: Keshet, J

  1. arXiv:2406.18928  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network

    Authors: Yehoshua Dissen, Shiry Yonash, Israel Cohen, Joseph Keshet

    Abstract: In the realm of automatic speech recognition (ASR), robustness in noisy environments remains a significant challenge. Recent ASR models, such as Whisper, have shown promise, but their efficacy in noisy conditions can be further enhanced. This study is focused on recovering from packet loss to improve the word error rate (WER) of ASR models. We propose using a front-end adaptation network connected…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted for publication at INTERSPEECH 2024

  2. arXiv:2406.02649  [pdf, other]

    eess.AS cs.LG cs.SD

    Keyword-Guided Adaptation of Automatic Speech Recognition

    Authors: Aviv Shamsian, Aviv Navon, Neta Glazer, Gill Hetz, Joseph Keshet

    Abstract: Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, some challenges remain, especially in noisy environments and with specialized jargon. In this paper, we propose a novel approach for improved jargon word recognition by contextually biasing Whisper-based models. We employ a keyword spotting model t…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to InterSpeech 2024

  3. arXiv:2310.19708  [pdf, other]

    cs.CL cs.LG

    Combining Language Models For Specialized Domains: A Colorful Approach

    Authors: Daniel Eitan, Menachem Pirchi, Neta Glazer, Shai Meital, Gil Ayach, Gidon Krendel, Aviv Shamsian, Aviv Navon, Gil Hetz, Joseph Keshet

    Abstract: General purpose language models (LMs) encounter difficulties when processing domain-specific jargon and terminology, which are frequently utilized in specialized fields such as medicine or industrial settings. Moreover, they often find it challenging to interpret mixed speech that blends general language with specialized jargon. This poses a challenge for automatic speech recognition systems opera…

    Submitted 1 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Under Review

  4. arXiv:2310.01381  [pdf, other]

    cs.SD cs.CL eess.AS

    DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation

    Authors: Roi Benita, Michael Elad, Joseph Keshet

    Abstract: Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a waveform (i.e., a vocoder). This work proposes a diffusion probabilistic end-to-end model for generating a raw speech waveform. The proposed model is autoregressive, g…

    Submitted 10 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

  5. arXiv:2309.08561  [pdf, other]

    eess.AS cs.LG cs.SD

    Open-vocabulary Keyword-spotting with Adaptive Instance Normalization

    Authors: Aviv Navon, Aviv Shamsian, Neta Glazer, Gill Hetz, Joseph Keshet

    Abstract: Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain some affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encod…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Under Review

  6. arXiv:2207.05418  [pdf, other]

    cs.CV cs.LG

    A Baseline for Detecting Out-of-Distribution Examples in Image Captioning

    Authors: Gabi Shalev, Gal-Lev Shalev, Joseph Keshet

    Abstract: Image captioning research achieved breakthroughs in recent years by developing neural models that can generate diverse and high-quality descriptions for images drawn from the same distribution as training images. However, when facing out-of-distribution (OOD) images, such as corrupted images, or images containing unknown objects, the models fail to generate relevant captions. In this paper, we…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Accepted to ACM Multimedia (MM) 2022

  7. arXiv:2206.14639  [pdf, other]

    eess.AS cs.LG cs.SD

    DDKtor: Automatic Diadochokinetic Speech Analysis

    Authors: Yael Segal, Kasia Hitczenko, Matthew Goldrick, Adam Buchwald, Angela Roberts, Joseph Keshet

    Abstract: Diadochokinetic speech tasks (DDK), in which participants repeatedly produce syllables, are commonly used as part of the assessment of speech motor impairments. These studies rely on manual analyses that are time-intensive, subjective, and provide only a coarse-grained picture of speech. This paper presents two deep neural network models that automatically segment consonants and vowels from unanno…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted to Interspeech 2022

  8. arXiv:2206.11632  [pdf, other]

    cs.SD eess.AS

    Formant Estimation and Tracking using Probabilistic Heat-Maps

    Authors: Yosi Shrem, Felix Kreuk, Joseph Keshet

    Abstract: Formants are the spectral maxima that result from acoustic resonances of the human vocal tract, and their accurate estimation is among the most fundamental speech processing problems. Recent work has shown that those frequencies can accurately be estimated using deep learning techniques. However, when presented with speech from a domain different from the one on which they have been trained…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Interspeech 2022

  9. arXiv:2205.04864  [pdf, other]

    cs.LG

    THOR: Threshold-Based Ranking Loss for Ordinal Regression

    Authors: Tzeviya Sylvia Fuchs, Joseph Keshet

    Abstract: In this work, we present a regression-based ordinal regression algorithm for supervised classification of instances into ordinal categories. In contrast to previous methods, in this work the decision boundaries between categories are predefined, and the algorithm learns to project the input examples onto their appropriate scores according to these predefined boundaries. This is achieved by adding…

    Submitted 10 May, 2022; originally announced May 2022.

  10. arXiv:2204.13094  [pdf, other]

    cs.SD eess.AS

    Unsupervised Word Segmentation using K Nearest Neighbors

    Authors: Tzeviya Sylvia Fuchs, Yedid Hoshen, Joseph Keshet

    Abstract: In this paper, we propose an unsupervised kNN-based approach for word segmentation in speech utterances. Our method relies on self-supervised pre-trained speech representations, and compares each audio segment of a given utterance to its K nearest neighbors within the training set. Our main assumption is that a segment containing more than one word would occur less often than a segment containing…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  11. arXiv:2204.04166  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-supervised Speaker Diarization

    Authors: Yehoshua Dissen, Felix Kreuk, Joseph Keshet

    Abstract: Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker representations. These, however, are heavily dependent on large amounts of annotated data and can be sensitive to new domains. This study proposes an entirely unsupervised d…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  12. arXiv:2204.03379  [pdf, ps, other]

    eess.AS cs.LG

    Correcting Mispronunciations in Speech using Spectrogram Inpainting

    Authors: Talia Ben-Simon, Felix Kreuk, Faten Awwad, Jacob T. Cohen, Joseph Keshet

    Abstract: Learning a new language involves constantly comparing speech productions with reference productions from the environment. Early in speech acquisition, children make articulatory adjustments to match their caregivers' speech. Grownup learners of a language tweak their speech to match the tutor reference. This paper proposes a method to synthetically generate correct pronunciation feedback given inc…

    Submitted 30 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted for publication at Interspeech 2022

  13. arXiv:2203.17019  [pdf, other]

    eess.AS cs.LG cs.SD

    DeepFry: Identifying Vocal Fry Using Deep Neural Networks

    Authors: Bronya R. Chernyak, Talia Ben Simon, Yael Segal, Jeremy Steffman, Eleanor Chodroff, Jennifer S. Cole, Joseph Keshet

    Abstract: Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch. It occurs in diverse languages and is prevalent in American English, where it is used not only to mark phrase finality, but also sociolinguistic factors and affect. Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems, particularly f…

    Submitted 26 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted to Interspeech 2022

  14. arXiv:2103.08265  [pdf, other]

    cs.LG

    Constant Random Perturbations Provide Adversarial Robustness with Minimal Effect on Accuracy

    Authors: Bronya Roni Chernyak, Bhiksha Raj, Tamir Hazan, Joseph Keshet

    Abstract: This paper proposes an attack-independent (non-adversarial training) technique for improving adversarial robustness of neural network models, with minimal loss of standard accuracy. We suggest creating a neighborhood around each training example, such that the label is kept constant for all inputs within that neighborhood. Unlike previous work that follows a similar principle, we apply this idea b…

    Submitted 27 July, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Journal ref: Workshop of RobustML ICLR 2021

  15. arXiv:2103.05468  [pdf, other]

    eess.AS cs.LG cs.SD

    CNN-based Spoken Term Detection and Localization without Dynamic Programming

    Authors: Tzeviya Sylvia Fuchs, Yael Segal, Joseph Keshet

    Abstract: In this paper, we propose a spoken term detection algorithm for simultaneous prediction and localization of in-vocabulary and out-of-vocabulary terms within an audio segment. The proposed algorithm infers whether a term was uttered within a given speech signal or not by predicting the word embeddings of various parts of the speech signal and comparing them to the word embedding of the desired term…

    Submitted 7 March, 2021; originally announced March 2021.

    Journal ref: ICASSP 2021

  16. arXiv:2011.08704  [pdf, other]

    cs.LG

    Redesigning the classification layer by randomizing the class representation vectors

    Authors: Gabi Shalev, Gal-Lev Shalev, Joseph Keshet

    Abstract: Neural image classification models typically consist of two components. The first is an image encoder, which is responsible for encoding a given raw image into a representative vector. The second is the classification component, which is often implemented by projecting the representative vector onto target class vectors. The target class vectors, along with the rest of the model parameters, are es…

    Submitted 29 November, 2020; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: 11 pages, 9 tables, 5 figures

  17. arXiv:2009.01534  [pdf, other]

    cs.AI cs.CR cs.LG stat.ML

    Fairness in the Eyes of the Data: Certifying Machine-Learning Models

    Authors: Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet

    Abstract: We present a framework for certifying the degree of fairness of a model based on an interactive and privacy-preserving test. The framework verifies any trained model, regardless of its training process and architecture. Thus, it allows us to evaluate any deep learning model on multiple fairness definitions empirically. We tackle two scenarios, where either the test data is privately available…

    Submitted 25 June, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

    Comments: Accepted to AIES-2021

  18. arXiv:2007.13465  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

    Authors: Felix Kreuk, Joseph Keshet, Yossi Adi

    Abstract: We propose a self-supervised representation learning model for the task of unsupervised phoneme boundary detection. The model is a convolutional neural network that operates directly on the raw waveform. It is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle. At test time, a peak detection algorithm is applied over the model outputs to produce t…

    Submitted 6 August, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Comments: Interspeech 2020 paper

  19. arXiv:2002.04992  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Phoneme Boundary Detection using Learnable Segmental Features

    Authors: Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi

    Abstract: Phoneme boundary detection is an essential first step for a variety of speech processing applications such as speaker diarization, speech science, keyword spotting, etc. In this work, we propose a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection. First, we evaluated our model when the spoken p…

    Submitted 16 February, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

  20. arXiv:1910.13255  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Dr.VOT : Measuring Positive and Negative Voice Onset Time in the Wild

    Authors: Yosi Shrem, Matthew Goldrick, Joseph Keshet

    Abstract: Voice Onset Time (VOT), a key measurement of speech for basic research and applied medical studies, is the time between the onset of a stop burst and the onset of voicing. When the voicing onset precedes burst onset the VOT is negative; if voicing onset follows the burst, it is positive. In this work, we present a deep-learning model for accurate and reliable measurement of VOT in naturalistic spe…

    Submitted 27 October, 2019; originally announced October 2019.

    Comments: Interspeech 2019

    Journal ref: Interspeech 2019

  21. arXiv:1904.07704  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    SpeechYOLO: Detection and Localization of Speech Objects

    Authors: Yael Segal, Tzeviya Sylvia Fuchs, Joseph Keshet

    Abstract: In this paper, we propose to apply object detection methods from the vision domain to the speech recognition domain, by treating audio fragments as objects. More specifically, we present SpeechYOLO, which is inspired by the YOLO algorithm for object detection in images. The goal of SpeechYOLO is to localize boundaries of utterances within the input signal, and to correctly classify them. Our syste…

    Submitted 30 June, 2019; v1 submitted 14 April, 2019; originally announced April 2019.

    Journal ref: Interspeech 2019, pp. 4210-4214

  22. arXiv:1902.03083  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS stat.ML

    Hide and Speak: Towards Deep Neural Networks for Speech Steganography

    Authors: Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet

    Abstract: Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as Carrier. Traditionally, digital signal processing techniques, such as least significant bit encoding, were used for hiding messages. In this paper, we explore the use of deep neural networks as steganographic functions for speech data. We showed that steganography models proposed for…

    Submitted 27 July, 2020; v1 submitted 7 February, 2019; originally announced February 2019.

  23. arXiv:1808.06664  [pdf, ps, other]

    stat.ML cs.LG

    Out-of-Distribution Detection using Multiple Semantic Label Representations

    Authors: Gabi Shalev, Yossi Adi, Joseph Keshet

    Abstract: Deep Neural Networks are powerful models that attained remarkable results on a variety of tasks. These models are shown to be extremely efficient when training and test data are drawn from the same distribution. However, it is not clear how a network will act when it is fed with an out-of-distribution example. In this work, we consider the problem of out-of-distribution detection in neural network…

    Submitted 10 January, 2019; v1 submitted 20 August, 2018; originally announced August 2018.

  24. arXiv:1802.04633  [pdf, ps, other]

    cs.LG

    Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring

    Authors: Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, Joseph Keshet

    Abstract: Deep Neural Networks have recently gained lots of success after enabling several breakthroughs in notoriously challenging problems. Training these networks is computationally expensive and requires vast amounts of training data. Selling such pre-trained models can, therefore, be a lucrative business model. Unfortunately, once the models are sold they can be easily copied and redistributed. To avoi…

    Submitted 11 June, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

  25. arXiv:1802.04528  [pdf, other]

    cs.LG cs.CR

    Deceiving End-to-End Deep Learning Malware Detectors using Adversarial Examples

    Authors: Felix Kreuk, Assi Barak, Shir Aviv-Reuven, Moran Baruch, Benny Pinkas, Joseph Keshet

    Abstract: In recent years, deep learning has shown performance breakthroughs in many applications, such as image detection, image segmentation, pose estimation, and speech recognition. However, this comes with a major concern: deep networks have been found to be vulnerable to adversarial examples. Adversarial examples are slightly modified inputs that are intentionally designed to cause a misclassification…

    Submitted 10 January, 2019; v1 submitted 13 February, 2018; originally announced February 2018.

  26. arXiv:1801.03339  [pdf, other]

    cs.LG cs.CL

    Fooling End-to-end Speaker Verification by Adversarial Examples

    Authors: Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet

    Abstract: Automatic speaker verification systems are increasingly used as the primary means to authenticate customers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a peculiar noise to original speaker examples, in suc…

    Submitted 16 February, 2018; v1 submitted 10 January, 2018; originally announced January 2018.

  27. arXiv:1707.05373  [pdf, other]

    stat.ML cs.AI cs.CR cs.CV cs.LG

    Houdini: Fooling Deep Structured Prediction Models

    Authors: Moustapha Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet

    Abstract: Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance meas…

    Submitted 17 July, 2017; originally announced July 2017.

    Comments: 12 pages, 8 figures, under review

  28. arXiv:1704.01653  [pdf, other]

    cs.CL

    Automatic Measurement of Pre-aspiration

    Authors: Yaniv Sheena, Míša Hejná, Yossi Adi, Joseph Keshet

    Abstract: Pre-aspiration is defined as the period of glottal friction occurring in sequences of vocalic/consonantal sonorants and phonetically voiceless obstruents. We propose two machine learning methods for automatic measurement of pre-aspiration duration: a feedforward neural network, which works at the frame level; and a structured prediction model, which relies on manually designed feature functions, a…

    Submitted 15 June, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

  29. arXiv:1703.09817  [pdf, other]

    cs.CL

    Learning Similarity Functions for Pronunciation Variations

    Authors: Einat Naaman, Yossi Adi, Joseph Keshet

    Abstract: A significant source of errors in Automatic Speech Recognition (ASR) systems is due to pronunciation variations which occur in spontaneous and conversational speech. Usually ASR systems use a finite lexicon that provides one or more pronunciations for each word. In this paper, we focus on learning a similarity function between two pronunciations. The pronunciations can be the canonical and the sur…

    Submitted 18 June, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

  30. arXiv:1611.01783  [pdf, other]

    cs.CL cs.SD

    Domain Adaptation For Formant Estimation Using Deep Learning

    Authors: Yehoshua Dissen, Joseph Keshet, Jacob Goldberger, Cynthia Clopper

    Abstract: In this paper we present a domain adaptation technique for formant estimation using a deep network. We first train a deep learning network on a small read speech dataset. We then freeze the parameters of the trained network and use several different datasets to train an adaptation layer that makes the obtained network universal in the sense that it works well for a variety of speakers and speech d…

    Submitted 6 November, 2016; originally announced November 2016.

  31. arXiv:1610.08166  [pdf, other]

    stat.ML cs.LG cs.SD

    Automatic measurement of vowel duration via structured prediction

    Authors: Yossi Adi, Joseph Keshet, Emily Cibelli, Erin Gustafson, Cynthia Clopper, Matthew Goldrick

    Abstract: A key barrier to making phonetic studies scalable and replicable is the need to rely on subjective, manual annotation. To help meet this challenge, a machine learning algorithm was developed for automatic measurement of a widely used phonetic measure: vowel duration. Manually-annotated data were used to train a model that takes as input an arbitrary length segment of the acoustic signal containing…

    Submitted 26 October, 2016; originally announced October 2016.

  32. arXiv:1610.07918  [pdf, other]

    cs.CL

    Sequence Segmentation Using Joint RNN and Structured Prediction Models

    Authors: Yossi Adi, Joseph Keshet, Emily Cibelli, Matthew Goldrick

    Abstract: We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks. We propose a neural architecture that is composed of two modules trained jointly: a recurrent neural network (RNN) module and a structured prediction model. The RNN outputs are considered as feature functions to the structured model. The overall model is trained with a structured…

    Submitted 25 October, 2016; originally announced October 2016.

    Comments: under review

  33. arXiv:1512.07851  [pdf, other]

    cs.LG

    Context-Based Prediction of App Usage

    Authors: Joseph Keshet, Adam Kariv, Arnon Dagan, Dvir Volk, Joey Simhon

    Abstract: There are around a hundred installed apps on an average smartphone. The high number of apps and the limited number of app icons that can be displayed on the device's screen require a new paradigm to address their visibility to the user. In this paper we propose a new online algorithm for dynamically predicting a set of apps that the user is likely to use. The algorithm runs on the user's device a…

    Submitted 25 January, 2016; v1 submitted 24 December, 2015; originally announced December 2015.

  34. arXiv:1512.02033  [pdf, ps, other]

    cs.LG

    Risk Minimization in Structured Prediction using Orbit Loss

    Authors: Danny Karmon, Joseph Keshet

    Abstract: We introduce a new surrogate loss function called orbit loss in the structured prediction framework, which has good theoretical and practical advantages. While the orbit loss is not convex, it has a simple analytical gradient and a simple perceptron-like learning rule. We analyze the new loss theoretically and state a PAC-Bayesian generalization bound. We also prove that the new loss is consistent…

    Submitted 9 December, 2015; v1 submitted 7 December, 2015; originally announced December 2015.

  35. arXiv:1109.4603  [pdf, other]

    cs.AI

    Explicit Approximations of the Gaussian Kernel

    Authors: Andrew Cotter, Joseph Keshet, Nathan Srebro

    Abstract: We investigate training and using Gaussian kernel SVMs by approximating the kernel with an explicit finite-dimensional polynomial feature representation based on the Taylor expansion of the exponential. Although not as efficient as the recently-proposed random Fourier features [Rahimi and Recht, 2007] in terms of the number of features, we show how this polynomial representation can provide a bet…

    Submitted 21 September, 2011; originally announced September 2011.

    Comments: 11 pages, 2 tables, 2 figures