Showing 1–5 of 5 results for author: Rivlin, E

Searching in archive eess.
  1. arXiv:2405.16475

    cs.LG cs.AI cs.CV eess.IV

    Looks Too Good To Be True: An Information-Theoretic Analysis of Hallucinations in Generative Restoration Models

    Authors: Regev Cohen, Idan Kligvasser, Ehud Rivlin, Daniel Freedman

    Abstract: The pursuit of high perceptual quality in image restoration has driven the development of revolutionary generative models, capable of producing results often visually indistinguishable from real data. However, as their perceptual quality continues to improve, these models also exhibit a growing tendency to generate hallucinations - realistic-looking details that do not exist in the ground truth im…

    Submitted 4 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  2. arXiv:2403.09920

    eess.IV cs.AI cs.CV cs.CY

    Predicting Generalization of AI Colonoscopy Models to Unseen Data

    Authors: Joel Shor, Carson McNeil, Yotam Intrator, Joseph R Ledsam, Hiro-o Yamano, Daisuke Tsurumaru, Hiroki Kayama, Atsushi Hamabe, Koji Ando, Mitsuhiko Ota, Haruei Ogino, Hiroshi Nakase, Kaho Kobayashi, Masaaki Miyo, Eiji Oki, Ichiro Takemasa, Ehud Rivlin, Roman Goldenberg

    Abstract: Background: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. Methods…

    Submitted 22 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  3. arXiv:2402.12423

    cs.SD cs.CL cs.LG eess.AS

    On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models

    Authors: Miri Varshavsky-Hassid, Roy Hirsch, Regev Cohen, Tomer Golany, Daniel Freedman, Ehud Rivlin

    Abstract: The incorporation of Denoising Diffusion Models (DDMs) in the Text-to-Speech (TTS) domain is rising, providing great value in synthesizing high quality speech. Although they exhibit impressive audio quality, the extent of their semantic capabilities is unknown, and controlling their synthesized speech's vocal properties remains a challenge. Inspired by recent advances in image synthesis, we explor…

    Submitted 4 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  4. arXiv:2305.15255

    cs.CL cs.LG cs.SD eess.AS

    Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM

    Authors: Eliya Nachmani, Alon Levkovitch, Roy Hirsch, Julian Salazar, Chulayuth Asawaroengchai, Soroosh Mariooryad, Ehud Rivlin, RJ Skerry-Ryan, Michelle Tadmor Ramanovich

    Abstract: We present Spectron, a novel approach to adapting pre-trained large language models (LLMs) to perform spoken question answering (QA) and speech continuation. By endowing the LLM with a pre-trained speech encoder, our model becomes able to take speech inputs and generate speech outputs. The entire system is trained end-to-end and operates directly on spectrograms, simplifying our architecture. Key…

    Submitted 30 May, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: ICLR 2024 camera-ready

  5. arXiv:2303.05737

    eess.AS cs.CL cs.LG cs.SD

    Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings

    Authors: Joel Shor, Ruyue Agnes Bi, Subhashini Venugopalan, Steven Ibara, Roman Goldenberg, Ehud Rivlin

    Abstract: Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower to adopt this technology, in part due to the importance of avoiding medically-relevant transcription mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penal…

    Submitted 28 April, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Journal ref: Clinical NLP Workshop, ACL 2023