Showing 1–5 of 5 results for author: Segal, Y

  1. arXiv:2206.14639  [pdf, other]

    eess.AS cs.LG cs.SD

    DDKtor: Automatic Diadochokinetic Speech Analysis

    Authors: Yael Segal, Kasia Hitczenko, Matthew Goldrick, Adam Buchwald, Angela Roberts, Joseph Keshet

    Abstract: Diadochokinetic speech tasks (DDK), in which participants repeatedly produce syllables, are commonly used as part of the assessment of speech motor impairments. These studies rely on manual analyses that are time-intensive, subjective, and provide only a coarse-grained picture of speech. This paper presents two deep neural network models that automatically segment consonants and vowels from unanno…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted to Interspeech 2022

  2. arXiv:2203.17019  [pdf, other]

    eess.AS cs.LG cs.SD

    DeepFry: Identifying Vocal Fry Using Deep Neural Networks

    Authors: Bronya R. Chernyak, Talia Ben Simon, Yael Segal, Jeremy Steffman, Eleanor Chodroff, Jennifer S. Cole, Joseph Keshet

    Abstract: Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch. It occurs in diverse languages and is prevalent in American English, where it is used not only to mark phrase finality, but also sociolinguistic factors and affect. Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems, particularly f…

    Submitted 26 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted to Interspeech 2022

  3. arXiv:2103.05468  [pdf, other]

    eess.AS cs.LG cs.SD

    CNN-based Spoken Term Detection and Localization without Dynamic Programming

    Authors: Tzeviya Sylvia Fuchs, Yael Segal, Joseph Keshet

    Abstract: In this paper, we propose a spoken term detection algorithm for simultaneous prediction and localization of in-vocabulary and out-of-vocabulary terms within an audio segment. The proposed algorithm infers whether a term was uttered within a given speech signal or not by predicting the word embeddings of various parts of the speech signal and comparing them to the word embedding of the desired term…

    Submitted 7 March, 2021; originally announced March 2021.

    Journal ref: ICASSP 2021

  4. arXiv:2002.10439  [pdf]

    eess.IV

    Improvements of Motion Estimation and Coding using Neural Networks

    Authors: Raz Birman, Yoram Segal, Ofer Hadar, Jenny Benois-Pineau

    Abstract: Inter-Prediction is used effectively in multiple standards, including H.264 and HEVC (also known as H.265). It leverages correlation between blocks of consecutive video frames in order to perform motion compensation and thus predict block pixel values and reduce transmission bandwidth. In order to reduce the magnitude of the transmitted Motion Vector (MV) and thus reduce bandwidth, the encoder uti…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: 11 pages, 9 figures, Submitted to IEEE Transactions on Circuits and Systems for Video Technology

  5. arXiv:1904.07704  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    SpeechYOLO: Detection and Localization of Speech Objects

    Authors: Yael Segal, Tzeviya Sylvia Fuchs, Joseph Keshet

    Abstract: In this paper, we propose to apply object detection methods from the vision domain on the speech recognition domain, by treating audio fragments as objects. More specifically, we present SpeechYOLO, which is inspired by the YOLO algorithm for object detection in images. The goal of SpeechYOLO is to localize boundaries of utterances within the input signal, and to correctly classify them. Our syste…

    Submitted 30 June, 2019; v1 submitted 14 April, 2019; originally announced April 2019.

    Journal ref: Interspeech 2019, pp. 4210-4214