Showing 1–5 of 5 results for author: Paiss, R

Searching in archive cs.

  1. arXiv:2401.12945  [pdf, other]

    cs.CV

    Lumiere: A Space-Time Diffusion Model for Video Generation

    Authors: Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri

    Abstract: We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synth…

    Submitted 5 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Webpage: https://lumiere-video.github.io/ | Video: https://www.youtube.com/watch?v=wxLr02Dz2Sc

  2. arXiv:2302.12066  [pdf, other]

    cs.CV

    Teaching CLIP to Count to Ten

    Authors: Roni Paiss, Ariel Ephrat, Omer Tov, Shiran Zada, Inbar Mosseri, Michal Irani, Tali Dekel

    Abstract: Large vision-language models (VLMs), such as CLIP, learn rich joint image-text representations, facilitating advances in numerous downstream tasks, including zero-shot classification and text-to-image generation. Nevertheless, existing VLMs exhibit a prominent well-documented limitation - they fail to encapsulate compositional concepts such as counting. We introduce a simple yet effective method t…

    Submitted 23 February, 2023; originally announced February 2023.

  3. arXiv:2206.04817  [pdf, other]

    cs.LG math.OC

    The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

    Authors: Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, Joshua Susskind

    Abstract: The grokking phenomenon as reported by Power et al. (arXiv:2201.02177) refers to a regime where a long period of overfitting is followed by a seemingly sudden transition to perfect generalization. In this paper, we attempt to reveal the underpinnings of Grokking via a series of empirical studies. Specifically, we uncover an optimization anomaly plaguing adaptive optimizers at extremely late stag…

    Submitted 13 June, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Removed TeX formatting commands in Title and Abstract

  4. arXiv:2204.04908  [pdf, other]

    cs.CV

    No Token Left Behind: Explainability-Aided Image Classification and Generation

    Authors: Roni Paiss, Hila Chefer, Lior Wolf

    Abstract: The application of zero-shot learning in computer vision has been revolutionized by the use of image-text matching models. The most notable example, CLIP, has been widely used for both zero-shot classification and guiding generative models with a text prompt. However, the zero-shot use of CLIP is unstable with respect to the phrasing of the input text, making it necessary to carefully engineer the…

    Submitted 6 August, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

  5. arXiv:2110.12427  [pdf, other]

    cs.CV

    Image-Based CLIP-Guided Essence Transfer

    Authors: Hila Chefer, Sagie Benaim, Roni Paiss, Lior Wolf

    Abstract: We make the distinction between (i) style transfer, in which a source image is manipulated to match the textures and colors of a target image, and (ii) essence transfer, in which one edits the source image to include high-level semantic attributes from the target. Crucially, the semantic attributes that constitute the essence of an image may differ from image to image. Our blending operator combin…

    Submitted 11 October, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

    Comments: To appear in ECCV'22