
Showing 1–24 of 24 results for author: Remez, T

Searching in archive cs.
  1. arXiv:2404.00725

    cs.SE cs.AI cs.CL cs.LG

    The Larger the Better? Improved LLM Code-Generation via Budget Reallocation

    Authors: Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi

    Abstract: It is a common belief that large language models (LLMs) are better than smaller-sized ones. However, larger models also require significantly more time and compute during inference. This begs the question: what happens when both models operate under the same budget? (e.g., compute, run-time). To address this question, we analyze code generation LLMs of various sizes and make comparisons such as ru…

    Submitted 31 March, 2024; originally announced April 2024.

  2. arXiv:2401.04577

    cs.SD cs.AI cs.LG eess.AS

    Masked Audio Generation using a Single Non-Autoregressive Transformer

    Authors: Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

    Abstract: We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens. Unlike prior work, MAGNeT is comprised of a single-stage, non-autoregressive transformer. During training, we predict spans of masked tokens obtained from a masking scheduler, while during inference we gradually construct the output sequence using several decoding steps. T…

    Submitted 5 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  3. arXiv:2308.12950

    cs.CL

    Code Llama: Open Foundation Models for Code

    Authors: Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, et al. (1 additional author not shown)

    Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama…

    Submitted 31 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  4. arXiv:2308.05725

    cs.CL cs.LG cs.SD eess.AS

    EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

    Authors: Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux

    Abstract: Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization). The adoption of these methods is still limited by the fact that most speech synthes…

    Submitted 10 August, 2023; originally announced August 2023.

  5. arXiv:2306.05284

    cs.SD cs.AI cs.LG eess.AS

    Simple and Controllable Music Generation

    Authors: Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez

    Abstract: We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchicall…

    Submitted 29 January, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023

  6. arXiv:2305.13009

    cs.CL cs.LG cs.SD eess.AS

    Textually Pretrained Speech Language Models

    Authors: Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi

    Abstract: Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language model. We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board. We empirically analyze the effect of different model de…

    Submitted 30 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  7. arXiv:2212.11377

    eess.AS cs.CV cs.LG cs.SD

    ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

    Authors: Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi

    Abstract: Prior works on improving speech quality with visual input typically study each type of auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present tailored algorithms. This paper proposes to unify these subjects and study Generalized Speech Enhancement, where the goal is not to reconstruct the exact reference clean signal, but to focus on improving certain aspects of…

    Submitted 21 December, 2022; originally announced December 2022.

  8. arXiv:2207.10141

    cs.SD cs.CV eess.AS

    AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation

    Authors: Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey

    Abstract: We introduce AudioScopeV2, a state-of-the-art universal audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify several limitations of previous work on audio-visual on-screen sound separation, including the coarse resolution of spatio-temporal attention, poor convergence o…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  9. arXiv:2111.10139

    cs.CV cs.CL

    More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech

    Authors: Michael Hassid, Michelle Tadmor Ramanovich, Brendan Shillingford, Miaosen Wang, Ye Jia, Tal Remez

    Abstract: In this paper we present VDTTS, a Visually-Driven Text-to-Speech model. Motivated by dubbing, VDTTS takes advantage of video frames as an additional input alongside text, and generates speech that matches the video signal. We demonstrate how this allows VDTTS to, unlike plain TTS models, generate speech that not only has prosodic variations like natural pauses and pitch, but is also synchronized t…

    Submitted 23 March, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

  10. arXiv:2107.08661

    cs.CL cs.LG cs.SD eess.AS

    Translatotron 2: High-quality direct speech-to-speech translation with voice preservation

    Authors: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz

    Abstract: We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. Translatotron 2 consists of a speech encoder, a linguistic decoder, an acoustic synthesizer, and a single attention module that connects them together. Experimental results on three datasets consistently show that Translatotron 2 outperforms the original Translatotron by a large margin on…

    Submitted 17 May, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: ICML 2022

  11. arXiv:2106.09669

    cs.SD cs.CV cs.LG

    Improving On-Screen Sound Separation for Open-Domain Videos with Audio-Visual Self-Attention

    Authors: Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey

    Abstract: We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. We identify limitations of previous work on audio-visual on-screen sound separation, including the simplicity and coarse resolution of spatio-temporal attention, and poor convergence of the audio s…

    Submitted 14 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  12. arXiv:2011.01143

    cs.SD cs.CV eess.AS

    Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

    Authors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

    Abstract: Recent progress in deep learning has enabled many advances in sound separation and visual scene understanding. However, extracting sound sources which are apparent in natural videos remains an open problem. In this work, we present AudioScope, a novel audio-visual sound separation framework that can be trained without supervision to isolate on-screen sound sources from real in-the-wild videos. Pri…

    Submitted 29 May, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: ICLR 2021, 27 pages

  13. Class-Aware Fully-Convolutional Gaussian and Poisson Denoising

    Authors: Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein

    Abstract: We propose a fully-convolutional neural-network architecture for image denoising which is simple yet powerful. Its structure allows it to exploit the gradual nature of the denoising process, in which shallow layers handle local noise statistics, while deeper layers recover edges and enhance textures. Our method advances the state-of-the-art when trained for different noise levels and distributions (b…

    Submitted 20 August, 2018; originally announced August 2018.

  14. arXiv:1803.06414

    cs.CV

    Learning to Segment via Cut-and-Paste

    Authors: Tal Remez, Jonathan Huang, Matthew Brown

    Abstract: This paper presents a weakly-supervised approach to object instance segmentation. Starting with known or predicted object bounding boxes, we learn object masks by playing a game of cut-and-paste in an adversarial learning setup. A mask generator takes a detection box and Faster R-CNN features, and constructs a segmentation mask that is used to cut-and-paste the object into a new image location. Th…

    Submitted 16 March, 2018; originally announced March 2018.

  15. arXiv:1707.08991

    cs.CV

    Efficient Deformable Shape Correspondence via Kernel Matching

    Authors: Zorah Lähner, Matthias Vestner, Amit Boyarski, Or Litany, Ron Slossberg, Tal Remez, Emanuele Rodolà, Alex Bronstein, Michael Bronstein, Ron Kimmel, Daniel Cremers

    Abstract: We present a method to match three dimensional shapes under non-isometric deformations, topology changes and partiality. We formulate the problem as matching between a set of pair-wise and point-wise descriptors, imposing a continuity prior on the mapping, and propose a projected descent optimization procedure inspired by difference of convex functions (DC) programming. Surprisingly, in spite of t…

    Submitted 15 September, 2017; v1 submitted 24 July, 2017; originally announced July 2017.

    Comments: Accepted for oral presentation at 3DV 2017, including supplementary material

  16. arXiv:1704.08686

    cs.CV

    Deep Functional Maps: Structured Prediction for Dense Shape Correspondence

    Authors: Or Litany, Tal Remez, Emanuele Rodolà, Alex M. Bronstein, Michael M. Bronstein

    Abstract: We introduce a new framework for learning dense correspondence between deformable 3D shapes. Existing learning based approaches model shape correspondence as a labelling problem, where each point of a query shape receives a label identifying a point on some reference domain; the correspondence is then constructed a posteriori by composing the label predictions of two input shapes. We propose a par…

    Submitted 30 July, 2017; v1 submitted 27 April, 2017; originally announced April 2017.

    Comments: Accepted for publication at ICCV 2017

  17. arXiv:1701.01698

    cs.CV

    Deep Class Aware Denoising

    Authors: Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein

    Abstract: The increasing demand for high image quality in mobile devices brings forth the need for better computational enhancement techniques, and image denoising in particular. At the same time, the images captured by these devices can be categorized into a small set of semantic classes. However simple, this observation has not been exploited in image denoising until now. In this paper, we demonstrate how…

    Submitted 27 February, 2017; v1 submitted 6 January, 2017; originally announced January 2017.

  18. arXiv:1701.01687

    cs.CV

    Deep Convolutional Denoising of Low-Light Images

    Authors: Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein

    Abstract: The Poisson distribution is used for modeling noise in photon-limited imaging. While canonical examples include relatively exotic types of sensing like spectral imaging or astronomy, the problem is relevant to regular photography now more than ever due to the booming market for mobile cameras. Restricted form factor limits the amount of absorbed light, thus computational post-processing is called for. …

    Submitted 6 January, 2017; originally announced January 2017.

  19. arXiv:1612.04956

    cs.CV cs.GR

    Cloud Dictionary: Sparse Coding and Modeling for Point Clouds

    Authors: Or Litany, Tal Remez, Alex Bronstein

    Abstract: With the development of range sensors such as LIDAR and time-of-flight cameras, 3D point cloud scans have become ubiquitous in computer vision applications, the most prominent ones being gesture recognition and autonomous driving. Parsimony-based algorithms have shown great success on images and videos where data points are sampled on a regular Cartesian grid. We propose an adaptation of these tec…

    Submitted 20 March, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

    Comments: Signal Processing with Adaptive Sparse Structured Representations (SPARS), 2017

  20. arXiv:1608.01074

    cs.CV

    FPGA system for real-time computational extended depth of field imaging using phase aperture coding

    Authors: Tal Remez, Or Litany, Shachar Yoseff, Harel Haim, Alex Bronstein

    Abstract: We present a proof-of-concept end-to-end system for computational extended depth of field (EDOF) imaging. The acquisition is performed through a phase-coded aperture implemented by placing a thin wavelength-dependent optical mask inside the pupil of a conventional camera lens, as a result of which, each color channel is focused at a different depth. The reconstruction process receives the raw Baye…

    Submitted 3 August, 2016; originally announced August 2016.

  21. arXiv:1512.01774

    cs.CV

    Image reconstruction from dense binary pixels

    Authors: Or Litany, Tal Remez, Alex Bronstein

    Abstract: Recently, the dense binary pixel Gigavision camera has been introduced, emulating a digital version of the photographic film. While it seems to be a promising solution for HDR imaging, its output is not directly usable and requires an image reconstruction process. In this work, we formulate this problem as the minimization of a convex objective combining a maximum-likelihood term with a sparse synthe…

    Submitted 6 December, 2015; originally announced December 2015.

    Comments: Signal Processing with Adaptive Sparse Structured Representations (SPARS 2015)

  22. arXiv:1512.01515

    cs.CV

    ASIST: Automatic Semantically Invariant Scene Transformation

    Authors: Or Litany, Tal Remez, Daniel Freedman, Lior Shapira, Alex Bronstein, Ran Gal

    Abstract: We present ASIST, a technique for transforming point clouds by replacing objects with their semantically equivalent counterparts. Transformations of this kind have applications in virtual reality, repair of fused scans, and robotics. ASIST is based on a unified formulation of semantic labeling and object replacement; both result from minimizing a single objective. We present numerical tools for th…

    Submitted 4 December, 2015; originally announced December 2015.

  23. arXiv:1511.02911

    cs.CV

    Spatially Coherent Random Forests

    Authors: Tal Remez, Shai Avidan

    Abstract: Spatially Coherent Random Forest (SCRF) extends Random Forest to create spatially coherent labeling. Each split function in SCRF is evaluated based on a traditional information gain measure that is regularized by a spatial coherency term. This way, SCRF is encouraged to choose split functions that cluster pixels both in appearance space and in image space. In particular, we use SCRF to detect cont…

    Submitted 5 December, 2015; v1 submitted 9 November, 2015; originally announced November 2015.

  24. arXiv:1510.04601

    cs.CV

    A Picture is Worth a Billion Bits: Real-Time Image Reconstruction from Dense Binary Pixels

    Authors: Tal Remez, Or Litany, Alex Bronstein

    Abstract: The pursuit of smaller pixel sizes at ever increasing resolution in digital image sensors is mainly driven by the stringent price and form-factor requirements of sensors and optics in the cellular phone market. Recently, Eric Fossum proposed a novel concept of an image sensor with dense sub-diffraction limit one-bit pixels (jots), which can be considered a digital emulation of silver halide photogra…

    Submitted 5 December, 2015; v1 submitted 15 October, 2015; originally announced October 2015.