
Showing 1–32 of 32 results for author: Nachmani, E

Searching in archive cs.
  1. arXiv:2406.02133

    eess.AS cs.CL cs.LG cs.SD

    SimulTron: On-Device Simultaneous Speech to Speech Translation

    Authors: Alex Agranovich, Eliya Nachmani, Oleg Rybakov, Yifan Ding, Ye Jia, Nadav Bar, Heiga Zen, Michelle Tadmor Ramanovich

    Abstract: Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages. However, achieving accurate, real-time translation through mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. SimulTron is a lightweight direct S2ST model that uses the st…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2306.05167

    cs.LG

    Decision S4: Efficient Sequence-Based RL via State Spaces Layers

    Authors: Shmuel Bar-David, Itamar Zimerman, Eliya Nachmani, Lior Wolf

    Abstract: Recently, sequence learning methods have been applied to the problem of off-policy Reinforcement Learning, including the seminal work on Decision Transformers, which employs transformers for this task. Since transformers are parameter-heavy, cannot benefit from history longer than a fixed window size, and are not computed using recurrence, we set out to investigate the suitability of the S4 family…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 21 pages, 13 figures

    MSC Class: 14J60 ACM Class: F.2.2; I.2.7
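    The abstract contrasts transformers, which only see a fixed window and are not computed by recurrence, with the S4 family of state-space layers. The recurrence such a layer can run step by step is sketched here for a diagonal, already-discretized linear state-space model. The coefficients are illustrative assumptions, `diagonal_ssm_scan` is a hypothetical helper, and S4's actual structured parameterization and discretization are not reproduced.

```python
from typing import List, Sequence


def diagonal_ssm_scan(a: Sequence[float], b: Sequence[float], c: Sequence[float],
                      inputs: Sequence[float]) -> List[float]:
    """Run x_k = a * x_{k-1} + b * u_k and y_k = sum(c * x_k) over a scalar input sequence.

    a, b, c are per-state coefficients of a diagonal, already-discretized SSM
    (illustrative values, not trained S4 parameters).
    """
    state = [0.0] * len(a)
    outputs = []
    for u in inputs:
        # Recurrent update: the state summarises the whole history, with no fixed window.
        state = [ai * xi + bi * u for ai, xi, bi in zip(a, state, b)]
        # Readout: project the state to a scalar output.
        outputs.append(sum(ci * xi for ci, xi in zip(c, state)))
    return outputs


# Impulse response of a two-state system: y_k = sum_i c_i * a_i**k * b_i
y = diagonal_ssm_scan([0.5, 0.9], [1.0, 1.0], [1.0, 2.0], [1.0, 0.0, 0.0])
```

    For this unit impulse the outputs are approximately [3.0, 2.3, 1.87]; each step costs time proportional to the state size, regardless of how long the history is.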

  3. arXiv:2305.17547

    cs.CL cs.LG cs.SD eess.AS

    Translatotron 3: Speech to Speech Translation with Monolingual Data

    Authors: Eliya Nachmani, Alon Levkovitch, Yifan Ding, Chulayuth Asawaroengchai, Heiga Zen, Michelle Tadmor Ramanovich

    Abstract: This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-speech translation from monolingual speech-text datasets by combining masked autoencoder, unsupervised embedding mapping, and back-translation. Experimental results in speech-to-speech translation tasks between Spanish and English show that Translatotron 3 outperforms a baseline cascade system, reporting…

    Submitted 16 January, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: To appear in ICASSP 2024

  4. arXiv:2305.15255

    cs.CL cs.LG cs.SD eess.AS

    Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM

    Authors: Eliya Nachmani, Alon Levkovitch, Roy Hirsch, Julian Salazar, Chulayuth Asawaroengchai, Soroosh Mariooryad, Ehud Rivlin, RJ Skerry-Ryan, Michelle Tadmor Ramanovich

    Abstract: We present Spectron, a novel approach to adapting pre-trained large language models (LLMs) to perform spoken question answering (QA) and speech continuation. By endowing the LLM with a pre-trained speech encoder, our model becomes able to take speech inputs and generate speech outputs. The entire system is trained end-to-end and operates directly on spectrograms, simplifying our architecture. Key…

    Submitted 30 May, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: ICLR 2024 camera-ready

  5. arXiv:2301.10752

    eess.AS cs.AI

    Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation

    Authors: Shahar Lutati, Eliya Nachmani, Lior Wolf

    Abstract: The problem of speech separation, also known as the cocktail party problem, refers to the task of isolating a single speech signal from a mixture of speech signals. Previous work on source separation derived an upper bound for the source separation task in the domain of human speech. This bound is derived for deterministic models. Recent advancements in generative models challenge this bound. We s…

    Submitted 24 June, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

  6. arXiv:2206.02246

    cs.SD cs.AI cs.LG eess.AS eess.SP

    Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models

    Authors: Alon Levkovitch, Eliya Nachmani, Lior Wolf

    Abstract: We present a novel way of conditioning a pretrained denoising diffusion speech model to produce speech in the voice of a novel person unseen during training. The method requires a short (~3 seconds) sample from the target person, and generation is steered at inference time, without any training steps. At the heart of the method lies a sampling process that combines the estimation of the denoising…

    Submitted 22 June, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted to Interspeech 2022

  7. arXiv:2206.00786

    cs.IT cs.AI cs.LG eess.SP

    Neural Decoding with Optimization of Node Activations

    Authors: Eliya Nachmani, Yair Be'ery

    Abstract: The problem of maximum likelihood decoding with a neural decoder for error-correcting code is considered. It is shown that the neural decoder can be improved with two novel loss terms on the node's activations. The first loss term imposes a sparse constraint on the node's activations. The second loss term tries to mimic the node's activations from a teacher decoder which has better perfor…

    Submitted 11 August, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: IEEE Communications Letters

  8. arXiv:2205.11801

    eess.AS cs.LG cs.SD stat.ML

    SepIt: Approaching a Single Channel Speech Separation Bound

    Authors: Shahar Lutati, Eliya Nachmani, Lior Wolf

    Abstract: We present an upper bound for the Single Channel Speech Separation task, which is based on an assumption regarding the nature of short segments of speech. Using the bound, we are able to show that while the recent methods have made significant progress for a few speakers, there is room for improvement for five and ten speakers. We then introduce a Deep neural network, SepIt, that iteratively impro…

    Submitted 21 May, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Accepted to INTERSPEECH 2022

  9. arXiv:2204.02849

    cs.CV cs.AI cs.CL cs.GR cs.LG

    KNN-Diffusion: Image Generation via Large-Scale Retrieval

    Authors: Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman

    Abstract: Recent text-to-image models have achieved impressive results. However, since they require large-scale datasets of text-image pairs, it is impractical to train them on new domains where data is scarce or not labeled. In this work, we propose using large-scale retrieval methods, in particular, efficient k-Nearest-Neighbors (kNN), which offers novel capabilities: (1) training a substantially small an…

    Submitted 2 October, 2022; v1 submitted 6 April, 2022; originally announced April 2022.
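    The retrieval step the abstract builds on can be illustrated with an exact k-nearest-neighbour search by cosine similarity. This is a brute-force sketch under stated assumptions: the toy 2-D vectors stand in for embeddings that would come from a pretrained encoder, `cosine` and `knn` are hypothetical helper names, and at the large scale the abstract targets an approximate nearest-neighbour index would normally replace the linear scan. It does not reproduce the paper's retrieval pipeline.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn(query: Sequence[float], index: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (position, similarity) for the k index vectors most similar to the query."""
    scored = ((i, cosine(query, vec)) for i, vec in enumerate(index))
    # nlargest keeps only k candidates in a heap while scanning the whole index.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy embedding index (hypothetical values).
index = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
nearest = knn([1.0, 0.1], index, k=2)
```

    For the query [1.0, 0.1], positions 0 and 2 are returned, in decreasing order of similarity.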

  10. arXiv:2112.00390

    cs.CV cs.AI cs.LG

    SegDiff: Image Segmentation with Diffusion Probabilistic Models

    Authors: Tomer Amit, Tal Shaharbany, Eliya Nachmani, Lior Wolf

    Abstract: Diffusion Probabilistic Methods are employed for state-of-the-art image generation. In this work, we present a method for extending such models for performing image segmentation. The method learns end-to-end, without relying on a pre-trained backbone. The information in the input image and in the current estimation of the segmentation map is merged by summing the output of two encoders. Additional…

    Submitted 7 September, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

  11. arXiv:2111.12986

    cs.SD cs.AI cs.LG eess.AS

    A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody

    Authors: Or Goren, Eliya Nachmani, Lior Wolf

    Abstract: We present a method for the generation of Midi files of piano music. The method models the right and left hands using two networks, where the left hand is conditioned on the right hand. This way, the melody is generated before the harmony. The Midi is represented in a way that is invariant to the musical scale, and the melody is represented, for the purpose of conditioning the harmony, by the cont…

    Submitted 25 November, 2021; originally announced November 2021.

    Comments: Accepted for publication at MMM 2022

  12. arXiv:2111.01471

    cs.CL cs.LG

    Zero-Shot Translation using Diffusion Models

    Authors: Eliya Nachmani, Shaked Dovrat

    Abstract: In this work, we show a novel method for neural machine translation (NMT), using a denoising diffusion probabilistic model (DDPM), adjusted for textual data, following recent advances in the field. We show that it's possible to translate sentences non-autoregressively using a diffusion model conditioned on the source sentence. We also show that our model is able to translate between pairs of langu…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: preprint

  13. arXiv:2110.05948

    eess.SP cs.AI cs.CV cs.GR cs.LG cs.SD eess.AS eess.IV

    Denoising Diffusion Gamma Models

    Authors: Eliya Nachmani, Robin San Roman, Lior Wolf

    Abstract: Generative diffusion processes are an emerging and effective tool for image and speech generation. In the existing methods, the underlying noise distribution of the diffusion process is Gaussian noise. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion…

    Submitted 10 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2106.07582

  14. arXiv:2106.07582

    cs.LG cs.CV cs.SD eess.AS

    Non Gaussian Denoising Diffusion Models

    Authors: Eliya Nachmani, Robin San Roman, Lior Wolf

    Abstract: Generative diffusion processes are an emerging and effective tool for image and speech generation. In the existing methods, the underlying noise distribution of the diffusion process is Gaussian noise. However, fitting distributions with more degrees of freedom could help the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion pro…

    Submitted 14 June, 2021; originally announced June 2021.
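    Both abstracts above replace the Gaussian noise of the diffusion forward process with other noise distributions. A minimal sketch of that swap, assuming the standard closed-form forward step x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps and a Gamma draw shifted and scaled to zero mean and unit variance so it can stand in for Gaussian eps. How the papers tie the Gamma parameters to the noise schedule is not stated in these truncated abstracts, so the fixed shape `shape_k` and the helper names are hypothetical.

```python
import math
import random
from typing import List


def centered_gamma_noise(rng: random.Random, shape_k: float) -> float:
    """Zero-mean, unit-variance noise: a Gamma(k, 1/sqrt(k)) draw minus its mean."""
    scale = 1.0 / math.sqrt(shape_k)  # variance k * scale**2 == 1
    return rng.gammavariate(shape_k, scale) - shape_k * scale


def forward_diffuse(x0: List[float], alpha_bar: float, rng: random.Random,
                    noise: str = "gaussian", shape_k: float = 2.0) -> List[float]:
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps elementwise."""
    if noise not in ("gaussian", "gamma"):
        raise ValueError("noise must be 'gaussian' or 'gamma'")

    def eps() -> float:
        # Both options have mean 0 and variance 1; only the shape of the distribution differs.
        if noise == "gaussian":
            return rng.gauss(0.0, 1.0)
        return centered_gamma_noise(rng, shape_k)

    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * eps() for x in x0]


rng = random.Random(0)
x_t = forward_diffuse([0.5, -1.0, 0.25], alpha_bar=0.7, rng=rng, noise="gamma")
```

    Matching the first two moments keeps the usual noise schedule meaningful; the Gamma noise differs from the Gaussian only in its skewed shape, which is the extra degree of freedom the abstracts refer to.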

  15. arXiv:2106.04876

    cs.CR cs.IT cs.LG

    Recovering AES Keys with a Deep Cold Boot Attack

    Authors: Itamar Zimerman, Eliya Nachmani, Lior Wolf

    Abstract: Cold boot attacks inspect the corrupted random access memory soon after the power has been shut down. While most of the bits have been corrupted, many bits, at random locations, have not. Since the keys in many encryption schemes are being expanded in memory into longer keys with fixed redundancies, the keys can often be restored. In this work, we combine a novel cryptographic variant of a deep er…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted to ICML 2021

  16. arXiv:2104.08955

    cs.SD cs.AI cs.LG eess.AS

    Many-Speakers Single Channel Speech Separation with Optimal Permutation Training

    Authors: Shaked Dovrat, Eliya Nachmani, Lior Wolf

    Abstract: Single channel speech separation has experienced great progress in the last few years. However, training neural speech separation for a large number of speakers (e.g., more than 10 speakers) is out of reach for the current methods, which rely on the Permutation Invariant Loss (PIT). In this work, we present a permutation invariant training that employs the Hungarian algorithm in order to train wit…

    Submitted 7 November, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: Accepted to Interspeech 2021, Data creation link added
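    Permutation invariant training needs, for each training example, the assignment of estimated sources to reference speakers that minimises the total loss. Checking all n! permutations is infeasible for many speakers, whereas the Hungarian algorithm solves the same assignment problem in O(n^3). A minimal pure-Python sketch, assuming mean squared error as the per-pair cost (the abstract does not state the paper's cost; a scale-invariant SNR is a common alternative) and hypothetical helper names `hungarian` and `pit_loss`:

```python
from typing import List, Sequence, Tuple


def hungarian(cost: List[List[float]]) -> List[int]:
    """Minimum-cost assignment for a square cost matrix (O(n^3) potentials method).

    Returns assignment[i] = column matched to row i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-based)
    v = [0.0] * (n + 1)  # column potentials (1-based)
    p = [0] * (n + 1)    # p[j] = row matched to column j, 0 = unmatched
    way = [0] * (n + 1)  # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new zero reduced-cost edge appears.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Augment: flip the matching along the alternating path ending at column j0.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates: List[Sequence[float]],
             targets: List[Sequence[float]]) -> Tuple[float, List[int]]:
    """Loss under the best estimate-to-target permutation, found with the Hungarian algorithm."""
    cost = [[mse(e, t) for t in targets] for e in estimates]
    perm = hungarian(cost)
    loss = sum(cost[i][perm[i]] for i in range(len(perm))) / len(perm)
    return loss, perm


loss, perm = pit_loss([[0.0, 0.1, 0.9], [0.9, 0.0, 0.1], [0.1, 0.9, 0.0]],
                      [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

    Here estimate 0 best matches target 2, so `perm` is [2, 0, 1] and the loss is the mean of the three matched errors (about 0.0067). In training, the gradient would flow only through the matched pairs.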

  17. arXiv:2104.02600

    cs.LG cs.CV

    Noise Estimation for Generative Diffusion Models

    Authors: Robin San-Roman, Eliya Nachmani, Lior Wolf

    Abstract: Generative diffusion models have emerged as leading models in speech and image generation. However, in order to perform well with a small number of denoising steps, a costly tuning of the set of noise parameters is needed. In this work, we present a simple and versatile learning scheme that can step-by-step adjust those noise parameters, for any given number of steps, while the previous work needs…

    Submitted 12 September, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

  18. arXiv:2103.11780

    cs.IT cs.LG

    Autoregressive Belief Propagation for Decoding Block Codes

    Authors: Eliya Nachmani, Lior Wolf

    Abstract: We revisit recent methods that employ graph neural networks for decoding error correcting codes and employ messages that are computed in an autoregressive manner. The outgoing messages of the variable nodes are conditioned not only on the incoming messages, but also on an estimation of the SNR and on the inferred codeword and on two downstream computations: (i) an extended vector of parity check o…

    Submitted 23 January, 2021; originally announced March 2021.

  19. arXiv:2011.02329

    cs.SD cs.LG eess.AS

    Single channel voice separation for unknown number of speakers under reverberant and noisy settings

    Authors: Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi

    Abstract: We present a unified network for voice separation of an unknown number of speakers. The proposed approach is composed of several separation heads optimized together with a speaker classification branch. The separation is carried out in the time domain, together with parameter sharing between all separation heads. The classification branch estimates the number of speakers while each head is special…

    Submitted 4 November, 2020; originally announced November 2020.

  20. SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

    Authors: Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi

    Abstract: Most existing deep learning based binaural speaker separation systems focus on producing a monaural estimate for each of the target speakers, and thus do not preserve the interaural cues, which are crucial for human listeners to perform sound localization and lateralization. In this study, we address talker-independent binaural speaker separation with interaural cues preserved in the estimated bin…

    Submitted 14 November, 2020; v1 submitted 2 September, 2020; originally announced September 2020.

    Comments: 5 pages, accepted by IEEE Signal Processing Letters

  21. arXiv:2003.01531

    eess.AS cs.LG cs.SD stat.ML

    Voice Separation with an Unknown Number of Multiple Speakers

    Authors: Eliya Nachmani, Yossi Adi, Lior Wolf

    Abstract: We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speake…

    Submitted 1 September, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: Accepted to ICML 2020. For associated audio samples, see http://enk100.github.io/speaker_separation

  22. arXiv:2002.00240

    cs.LG stat.ML

    Molecule Property Prediction and Classification with Graph Hypernetworks

    Authors: Eliya Nachmani, Lior Wolf

    Abstract: Graph neural networks are currently leading the performance charts in learning-based molecule property prediction and classification. Computational chemistry has, therefore, become a prominent testbed for generic graph neural networks, as well as for specialized message passing methods. In this work, we demonstrate that the replacement of the underlying networks with hypernetworks leads to a b…

    Submitted 1 February, 2020; originally announced February 2020.

  23. arXiv:1911.03229

    cs.IT cs.LG

    A Gated Hypernet Decoder for Polar Codes

    Authors: Eliya Nachmani, Lior Wolf

    Abstract: Hypernetworks were recently shown to improve the performance of message passing algorithms for decoding error correcting codes. In this work, we demonstrate how hypernetworks can be applied to decode polar codes by employing a new formalization of the polar belief propagation decoding scheme. We demonstrate that our method improves the previous results of neural polar decoders and achieves, for la…

    Submitted 10 February, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: Accepted to ICASSP 2020

  24. arXiv:1909.09036

    cs.IT cs.LG stat.ML

    Hyper-Graph-Network Decoders for Block Codes

    Authors: Eliya Nachmani, Lior Wolf

    Abstract: Neural decoders were shown to outperform classical message passing techniques for short BCH codes. In this work, we extend these results to much larger families of algebraic block codes, by performing message passing with graph neural networks. The parameters of the sub-network at each variable-node in the Tanner graph are obtained from a hypernetwork that receives the absolute values of the curre…

    Submitted 25 October, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: Accepted to NeurIPS 2019. Camera Ready

  25. arXiv:1904.06590

    cs.LG cs.SD eess.AS stat.ML

    Unsupervised Singing Voice Conversion

    Authors: Eliya Nachmani, Lior Wolf

    Abstract: We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any form of supervision: no lyrics or any kind of phonetic features, no notes, and no matching samples between singers. The proposed network employs a single CNN e…

    Submitted 25 September, 2019; v1 submitted 13 April, 2019; originally announced April 2019.

    Comments: Accepted to Interspeech 2019

  26. arXiv:1902.02263

    cs.LG cs.CL cs.SD eess.AS stat.ML

    Unsupervised Polyglot Text To Speech

    Authors: Eliya Nachmani, Lior Wolf

    Abstract: We present a TTS neural network that is able to produce speech in multiple languages. The proposed network is able to transfer a voice, which was presented as a sample in a source language, into one of several target languages. Training is done without using matching or parallel data, i.e., without samples of the same speaker in multiple languages, making the method much more applicable. The conve…

    Submitted 6 February, 2019; originally announced February 2019.

    Comments: The paper will be presented at ICASSP 2019

  27. arXiv:1802.06984

    cs.LG cs.SD eess.AS

    Fitting New Speakers Based on a Short Untranscribed Sample

    Authors: Eliya Nachmani, Adam Polyak, Yaniv Taigman, Lior Wolf

    Abstract: Learning-based Text To Speech systems have the potential to generalize from one speaker to the next and thus require a relatively short sample of any new voice. However, this promise is currently largely unrealized. We present a method that is designed to capture a new speaker from a short untranscribed audio sample. This is done by employing an additional network that given an audio sample, place…

    Submitted 20 February, 2018; originally announced February 2018.

  28. arXiv:1801.02726

    cs.IT cs.NE

    Near Maximum Likelihood Decoding with Deep Learning

    Authors: Eliya Nachmani, Yaron Bachar, Elad Marciano, David Burshtein, Yair Be'ery

    Abstract: A novel and efficient neural decoder algorithm is proposed. The proposed decoder is based on the neural Belief Propagation algorithm and the Automorphism Group. By combining neural belief propagation with permutations from the Automorphism Group we achieve near maximum likelihood performance for High Density Parity Check codes. Moreover, the proposed decoder significantly improves the decoding com…

    Submitted 8 January, 2018; originally announced January 2018.

    Comments: The paper will be presented at IZS 2018

  29. arXiv:1707.06588

    cs.LG cs.CL cs.SD

    VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop

    Authors: Yaniv Taigman, Lior Wolf, Adam Polyak, Eliya Nachmani

    Abstract: We present a new neural text to speech (TTS) method that is able to transform text to speech in voices that are sampled in the wild. Unlike other systems, our solution is able to deal with unconstrained voice samples and without requiring aligned phonemes or linguistic features. The network architecture is simpler than those in the existing literature and is based on a novel shifting buffer workin…

    Submitted 1 February, 2018; v1 submitted 20 July, 2017; originally announced July 2017.

  30. Deep Learning Methods for Improved Decoding of Linear Codes

    Authors: Eliya Nachmani, Elad Marciano, Loren Lugosch, Warren J. Gross, David Burshtein, Yair Beery

    Abstract: The problem of low complexity, close to optimal, channel decoding of linear codes with short to moderate block length is considered. It is shown that deep learning methods can be used to improve a standard belief propagation decoder, despite the large example space. Similar improvements are obtained for the min-sum algorithm. It is also shown that tying the parameters of the decoders across iterat…

    Submitted 1 January, 2018; v1 submitted 21 June, 2017; originally announced June 2017.

    Comments: Accepted to IEEE Journal of Selected Topics in Signal Processing

  31. arXiv:1702.07560

    cs.IT cs.LG cs.NE

    RNN Decoding of Linear Block Codes

    Authors: Eliya Nachmani, Elad Marciano, David Burshtein, Yair Be'ery

    Abstract: Designing a practical, low complexity, close to optimal, channel decoder for powerful algebraic codes with short to moderate block length is an open research problem. Recently it has been shown that a feed-forward neural network architecture can improve on standard belief propagation decoding, despite the large example space. In this paper we introduce a recurrent neural network architecture for d…

    Submitted 24 February, 2017; originally announced February 2017.

  32. arXiv:1607.04793

    cs.IT cs.LG cs.NE

    Learning to Decode Linear Codes Using Deep Learning

    Authors: Eliya Nachmani, Yair Beery, David Burshtein

    Abstract: A novel deep learning method for improving the belief propagation algorithm is proposed. The method generalizes the standard belief propagation algorithm by assigning weights to the edges of the Tanner graph. These edges are then trained using deep learning techniques. A well-known property of the belief propagation algorithm is the independence of the performance on the transmitted codeword. A cr…

    Submitted 30 September, 2016; v1 submitted 16 July, 2016; originally announced July 2016.

    Comments: Presented at the Allerton Conference 2016
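    The idea of assigning weights to the edges of the Tanner graph can be illustrated with a sum-product decoder whose check-to-variable messages are scaled by per-edge weights; with every weight equal to 1 it reduces to standard belief propagation. This is a sketch under stated assumptions: one weight per edge is a hypothetical simplification (the paper's exact parameterization is not given in the truncated abstract), the weights here are fixed rather than trained, and the (7,4) Hamming parity-check matrix and LLR values in the usage example are made up for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # (check index, variable index) of a Tanner-graph edge


def weighted_bp_decode(H: List[List[int]], llr: List[float],
                       edge_weights: Optional[Dict[Edge, float]] = None,
                       iterations: int = 5) -> List[int]:
    """Sum-product decoding with per-edge weights on check-to-variable messages.

    llr: channel log-likelihood ratios (positive favours bit 0).
    edge_weights: hypothetical weight per (check, variable) edge; defaults to all ones.
    """
    m, n = len(H), len(H[0])
    edges = [(c, v) for c in range(m) for v in range(n) if H[c][v]]
    w = edge_weights or {e: 1.0 for e in edges}
    c2v = {e: 0.0 for e in edges}  # check-to-variable messages, initially zero
    bits = [0 if x >= 0 else 1 for x in llr]  # hard decision on the channel alone
    for _ in range(iterations):
        # Variable-to-check: channel LLR plus weighted messages from the *other* checks.
        v2c = {(c, v): llr[v] + sum(w[(c2, v)] * c2v[(c2, v)]
                                    for c2 in range(m) if H[c2][v] and c2 != c)
               for c, v in edges}
        # Check-to-variable: tanh rule over the other variables in the check.
        for c, v in edges:
            prod = 1.0
            for v2 in range(n):
                if H[c][v2] and v2 != v:
                    prod *= math.tanh(v2c[(c, v2)] / 2.0)
            prod = max(min(prod, 0.999999), -0.999999)  # keep atanh finite
            c2v[(c, v)] = 2.0 * math.atanh(prod)
        # Marginal LLR and hard decision.
        total = [llr[v] + sum(w[(c, v)] * c2v[(c, v)] for c in range(m) if H[c][v])
                 for v in range(n)]
        bits = [0 if t >= 0 else 1 for t in total]
        if all(sum(H[c][v] * bits[v] for v in range(n)) % 2 == 0 for c in range(m)):
            break  # zero syndrome: a valid codeword was found
    return bits


# Hypothetical example: all-zero (7,4) Hamming codeword, bit 4 received unreliably.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
decoded = weighted_bp_decode(H, [2.0, 2.0, 2.0, 2.0, -0.5, 2.0, 2.0])
```

    The channel alone would flip bit 4, but the message from its single check outweighs the weak negative LLR, so `decoded` is the all-zero codeword. Training, as the abstract describes, would adjust the edge weights rather than keep them at 1.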