
Showing 1–25 of 25 results for author: Taigman, Y

Searching in archive cs.
  1. arXiv:2403.09334

    cs.CV

    Video Editing via Factorized Diffusion Distillation

    Authors: Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman

    Abstract: We introduce Emu Video Edit (EVE), a model that establishes a new state of the art in video editing without relying on any supervised video editing data. To develop EVE, we separately train an image editing adapter and a video generation adapter, and attach both to the same text-to-image model. Then, to align the adapters towards video editing, we introduce a new unsupervised distillation procedure,…

    Submitted 24 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  2. arXiv:2311.10089

    cs.CV cs.AI cs.LG

    Emu Edit: Precise Image Editing via Recognition and Generation Tasks

    Authors: Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman

    Abstract: Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with accurately executing user instructions. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editin…

    Submitted 16 November, 2023; originally announced November 2023.

  3. arXiv:2309.02591

    cs.LG cs.CL cs.CV

    Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

    Authors: Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, et al. (2 additional authors not shown)

    Abstract: We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted fr…

    Submitted 5 September, 2023; originally announced September 2023.

  4. arXiv:2301.11280

    cs.CV cs.AI cs.LG

    Text-To-4D Dynamic Scene Generation

    Authors: Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman

    Abstract: We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera locat…

    Submitted 26 January, 2023; originally announced January 2023.

  5. SpaText: Spatio-Textual Representation for Controllable Image Generation

    Authors: Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin

    Abstract: Recent text-to-image diffusion models are able to generate convincing results of unprecedented quality. However, it is nearly impossible to control the shapes of different regions/objects or their layout in a fine-grained fashion. Previous attempts to provide such controls were hindered by their reliance on a fixed set of labels. To this end, we present SpaText - a new method for text-to-image gen…

    Submitted 19 March, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Project page available at: https://omriavrahami.com/spatext

  6. arXiv:2211.01223

    cs.SD eess.AS

    Audio Language Modeling using Perceptually-Guided Discrete Representations

    Authors: Felix Kreuk, Yaniv Taigman, Adam Polyak, Jade Copet, Gabriel Synnaeve, Alexandre Défossez, Yossi Adi

    Abstract: In this work, we study the task of Audio Language Modeling, in which we aim at learning probabilistic models for audio that can be used for generation and completion. We use a state-of-the-art perceptually-guided audio compression model, to encode audio to discrete representations. Next, we train a transformer-based causal language model using these representations. At inference time, we perform a…

    Submitted 4 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.

  7. arXiv:2209.15352

    cs.SD cs.CL cs.LG eess.AS

    AudioGen: Textually Guided Audio Generation

    Authors: Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi

    Abstract: We tackle the problem of generating audio samples conditioned on descriptive text captions. In this work, we propose AudioGen, an auto-regressive generative model that generates audio samples conditioned on text inputs. AudioGen operates on a learnt discrete audio representation. The task of text-to-audio generation poses multiple challenges. Due to the way audio travels through a medium, differe…

    Submitted 5 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Accepted to ICLR 2023

  8. arXiv:2209.14792

    cs.CV cs.AI cs.LG

    Make-A-Video: Text-to-Video Generation without Text-Video Data

    Authors: Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman

    Abstract: We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V…

    Submitted 29 September, 2022; originally announced September 2022.

  9. arXiv:2204.02849

    cs.CV cs.AI cs.CL cs.GR cs.LG

    KNN-Diffusion: Image Generation via Large-Scale Retrieval

    Authors: Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman

    Abstract: Recent text-to-image models have achieved impressive results. However, since they require large-scale datasets of text-image pairs, it is impractical to train them on new domains where data is scarce or not labeled. In this work, we propose using large-scale retrieval methods, in particular, efficient k-Nearest-Neighbors (kNN), which offers novel capabilities: (1) training a substantially small an…

    Submitted 2 October, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

  10. arXiv:2203.13131

    cs.CV cs.AI cs.CL cs.GR cs.LG

    Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

    Authors: Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, Yaniv Taigman

    Abstract: Recent text-to-image generation methods provide a simple yet exciting conversion capability between text and image domains. While these methods have incrementally improved the generated image fidelity and text relevancy, several pivotal gaps remain unanswered, limiting applicability and quality. We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mech…

    Submitted 24 March, 2022; originally announced March 2022.

  11. arXiv:2102.00429

    cs.SD cs.LG eess.AS

    High Fidelity Speech Regeneration with Application to Speech Enhancement

    Authors: Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman

    Abstract: Speech enhancement has seen great improvement in recent years mainly through contributions in denoising, speaker separation, and dereverberation methods that mostly deal with environmental effects on vocal audio. To enhance speech beyond the limitations of the original signal, we take a regeneration approach, in which we recreate the speech from its essence, including the semi-recognized speech, p…

    Submitted 31 January, 2021; originally announced February 2021.

  12. arXiv:2008.02830

    eess.AS cs.LG cs.SD

    Unsupervised Cross-Domain Singing Voice Conversion

    Authors: Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman

    Abstract: We present a wav-to-wav generative model for the task of singing voice conversion from any identity. Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator. The proposed generative architecture is invariant to the speaker's identity and can be trained to generate target singers fr…

    Submitted 6 August, 2020; originally announced August 2020.

  13. arXiv:1911.08348

    cs.LG cs.CV cs.GR stat.ML

    Live Face De-Identification in Video

    Authors: Oran Gafni, Lior Wolf, Yaniv Taigman

    Abstract: We propose a method for face de-identification that enables fully automatic video modification at high frame rates. The goal is to maximally decorrelate the identity, while having the perception (pose, illumination and expression) fixed. We achieve this by a novel feed-forward encoder-decoder network architecture that is conditioned on the high-level representation of a person's facial image. The…

    Submitted 19 November, 2019; originally announced November 2019.

    Comments: ICCV 2019

    Journal ref: Proceedings of the IEEE International Conference on Computer Vision (2019) 9378--9387

  14. arXiv:1904.08983

    cs.SD cs.LG stat.ML

    TTS Skins: Speaker Conversion via ASR

    Authors: Adam Polyak, Lior Wolf, Yaniv Taigman

    Abstract: We present a fully convolutional wav-to-wav network for converting between speakers' voices, without relying on text. Our network is based on an encoder-decoder architecture, where the encoder is pre-trained for the task of Automatic Speech Recognition, and a multi-speaker waveform decoder is trained to reconstruct the original signal in an autoregressive manner. We train the network on narrated a…

    Submitted 26 July, 2020; v1 submitted 18 April, 2019; originally announced April 2019.

  15. arXiv:1904.08379

    cs.LG cs.CV cs.GR stat.ML

    Vid2Game: Controllable Characters Extracted from Real-World Videos

    Authors: Oran Gafni, Lior Wolf, Yaniv Taigman

    Abstract: We are given a video of a person performing a certain activity, from which we extract a controllable model. The model generates novel image sequences of that person, according to arbitrary user-defined control signals, typically marking the displacement of the moving body. The generated video can have an arbitrary background, and effectively capture both the dynamics and appearance of the person.…

    Submitted 17 April, 2019; originally announced April 2019.

  16. arXiv:1807.11074

    cs.LG stat.ML

    Visual Analogies between Atari Games for Studying Transfer Learning in RL

    Authors: Doron Sobol, Lior Wolf, Yaniv Taigman

    Abstract: In this work, we ask the following question: Can visual analogies, learned in an unsupervised way, be used in order to transfer knowledge between pairs of games and even play one game using an agent trained for another game? We attempt to answer this research question by creating visual analogies between a pair of games: a source game and a target game. For example, given a video frame in the targ…

    Submitted 29 July, 2018; originally announced July 2018.

  17. arXiv:1805.07848

    cs.SD cs.AI cs.LG stat.ML

    A Universal Music Translation Network

    Authors: Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman

    Abstract: We present a method for translating music across musical instruments, genres, and styles. This method is based on a multi-domain wavenet autoencoder, with a shared encoder and a disentangled latent space that is trained end-to-end on waveforms. Employing a diverse training dataset and large net capacity, the domain-independent encoder allows us to translate even from musical domains that were not…

    Submitted 23 May, 2018; v1 submitted 20 May, 2018; originally announced May 2018.

  18. arXiv:1802.06984

    cs.LG cs.SD eess.AS

    Fitting New Speakers Based on a Short Untranscribed Sample

    Authors: Eliya Nachmani, Adam Polyak, Yaniv Taigman, Lior Wolf

    Abstract: Learning-based Text To Speech systems have the potential to generalize from one speaker to the next and thus require a relatively short sample of any new voice. However, this promise is currently largely unrealized. We present a method that is designed to capture a new speaker from a short untranscribed audio sample. This is done by employing an additional network that given an audio sample, place…

    Submitted 20 February, 2018; originally announced February 2018.

  19. arXiv:1707.06588

    cs.LG cs.CL cs.SD

    VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop

    Authors: Yaniv Taigman, Lior Wolf, Adam Polyak, Eliya Nachmani

    Abstract: We present a new neural text to speech (TTS) method that is able to transform text to speech in voices that are sampled in the wild. Unlike other systems, our solution is able to deal with unconstrained voice samples and without requiring aligned phonemes or linguistic features. The network architecture is simpler than those in the existing literature and is based on a novel shifting buffer workin…

    Submitted 1 February, 2018; v1 submitted 20 July, 2017; originally announced July 2017.

  20. arXiv:1704.05693

    cs.CV cs.LG

    Unsupervised Creation of Parameterized Avatars

    Authors: Lior Wolf, Yaniv Taigman, Adam Polyak

    Abstract: We study the problem of mapping an input image to a tied pair consisting of a vector of parameters and an image that is created using a graphical engine from the vector of parameters. The mapping's objective is to have the output image as similar as possible to the input image. During training, no supervision is given in the form of matching inputs and outputs. This learning problem extends two…

    Submitted 9 July, 2017; v1 submitted 19 April, 2017; originally announced April 2017.

    Comments: v2 -- a change in the references due to a request from authors

  21. arXiv:1611.02200

    cs.CV

    Unsupervised Cross-Domain Image Generation

    Authors: Yaniv Taigman, Adam Polyak, Lior Wolf

    Abstract: We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised…

    Submitted 7 November, 2016; originally announced November 2016.

  22. arXiv:1501.05703

    cs.CV

    Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues

    Authors: Ning Zhang, Manohar Paluri, Yaniv Taigman, Rob Fergus, Lubomir Bourdev

    Abstract: We explore the task of recognizing people's identities in photo albums in an unconstrained setting. To facilitate this, we introduce the new People In Photo Albums (PIPA) dataset, consisting of over 60000 instances of 2000 individuals collected from public Flickr photo albums. With only about half of the person images containing a frontal face, the recognition task is very challenging due to the l…

    Submitted 30 January, 2015; v1 submitted 22 January, 2015; originally announced January 2015.

  23. arXiv:1406.5266

    cs.CV

    Web-Scale Training for Face Identification

    Authors: Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf

    Abstract: Scaling machine learning methods to very large datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web. We study face recognition and show that three distinct properties have surprising effects on the transferability of deep convolutional networks (CNN): (1) The bottleneck of the network serves as an important transfer learni…

    Submitted 18 April, 2015; v1 submitted 19 June, 2014; originally announced June 2014.

  24. arXiv:1312.5853

    cs.LG cs.NE

    Multi-GPU Training of ConvNets

    Authors: Omry Yadan, Keith Adams, Yaniv Taigman, Marc'Aurelio Ranzato

    Abstract: In this work we evaluate different approaches to parallelize computation of convolutional neural networks across several GPUs.

    Submitted 18 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: Machine Learning, Deep Learning, Convolutional Networks, Computer Vision, GPU, CUDA

  25. arXiv:1108.1122

    cs.CV

    Leveraging Billions of Faces to Overcome Performance Barriers in Unconstrained Face Recognition

    Authors: Yaniv Taigman, Lior Wolf

    Abstract: We employ the face recognition technology developed in house at face.com to a well accepted benchmark and show that without any tuning we are able to considerably surpass state of the art results. Much of the improvement is concentrated in the high-valued performance point of zero false positive matches, where the obtained recall rate almost doubles the best reported result to date. We discuss the…

    Submitted 4 August, 2011; originally announced August 2011.

    Comments: 7 pages