
Showing 1–50 of 155 results for author: Cohen-Or, D

Searching in archive cs.
  1. arXiv:2406.07706

    cs.CV

    Object-level Scene Deocclusion

    Authors: Zhengzhe Liu, Qing Liu, Chirui Chang, Jianming Zhang, Daniil Pakhomov, Haitian Zheng, Zhe Lin, Daniel Cohen-Or, Chi-Wing Fu

    Abstract: Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which pr…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: SIGGRAPH 2024. A foundation model for category-agnostic object deocclusion

  2. arXiv:2406.06508

    cs.CV cs.AI cs.GR

    Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer

    Authors: Sigal Raab, Inbar Gat, Nathan Sala, Guy Tevet, Rotem Shalev-Arkushin, Ohad Fried, Amit H. Bermano, Daniel Cohen-Or

    Abstract: Given the remarkable results of motion synthesis with diffusion models, a natural question arises: how can we effectively leverage these models for motion editing? Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models, which enables manipulating the latent feature space; hence, they primarily center on handlin…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Video: https://www.youtube.com/watch?v=s5oo3sKV0YU, Project page: https://monkeyseedocg.github.io, Code: https://github.com/MonkeySeeDoCG/MoMo-code

  3. arXiv:2406.05404

    cs.CV cs.GR

    Layered Image Vectorization via Semantic Simplification

    Authors: Zhenyu Wang, Jianxi Huang, Zhida Sun, Daniel Cohen-Or, Min Lu

    Abstract: This work presents a novel progressive image vectorization technique aimed at generating layered vectors that represent the original image from coarse to fine detail levels. Our approach introduces semantic simplification, which combines Score Distillation Sampling and semantic segmentation to iteratively simplify the input image. Subsequently, our method optimizes the vector layers for each of th…

    Submitted 8 June, 2024; originally announced June 2024.

  4. arXiv:2406.05261

    cs.CV cs.GR

    Split-and-Fit: Learning B-Reps via Structure-Aware Voronoi Partitioning

    Authors: Yilin Liu, Jiale Chen, Shanshan Pan, Daniel Cohen-Or, Hao Zhang, Hui Huang

    Abstract: We introduce a novel method for acquiring boundary representations (B-Reps) of 3D CAD models which involves a two-step process: it first applies a spatial partitioning, referred to as the "split", followed by a "fit" operation to derive a single primitive within each partition. Specifically, our partitioning aims to produce the classical Voronoi diagram of the set of ground-truth (GT) B-Rep pr…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ACM Transactions on Graphics (SIGGRAPH 2024); Project page: https://vcc.tech/research/2024/BRepVP; Code: https://github.com/yilinliu77/NVDNet

  5. arXiv:2406.04008

    cs.GR

    A Versatile Collage Visualization Technique

    Authors: Zhenyu Wang, Daniel Cohen-Or, Min Lu

    Abstract: Collage techniques are commonly used in visualization to organize a collection of geometric shapes, facilitating the representation of visual features holistically, as seen in word clouds or circular packing diagrams. Typically, packing methods rely on object-space optimization techniques, which often necessitate customizing the optimization process to suit the complexity of geometric primitives a…

    Submitted 6 June, 2024; originally announced June 2024.

  6. arXiv:2406.01300

    cs.CV

    pOps: Photo-Inspired Diffusion Operators

    Authors: Elad Richardson, Yuval Alaluf, Ali Mahdavi-Amiri, Daniel Cohen-Or

    Abstract: Text-guided image generation enables the creation of visual content from textual descriptions. However, certain visual concepts cannot be effectively conveyed through language alone. This has sparked a renewed interest in utilizing the CLIP image embedding space for more visually-oriented tasks through methods such as IP-Adapter. Interestingly, the CLIP image embedding space has been shown to be s…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: https://popspaper.github.io/pOps/

  7. arXiv:2405.12661

    cs.CV

    EmoEdit: Evoking Emotions through Image Manipulation

    Authors: Jingyuan Yang, Jiawei Feng, Weibin Luo, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. This task is inherently complex due to its twofold objective: significantly evoking the intended emotion, while preserving the original image composition. Existing AIM methods primarily adjust color and style, often failing to elicit precise and profound emotional shifts. Drawing on psych…

    Submitted 21 May, 2024; originally announced May 2024.

  8. arXiv:2404.12382

    cs.CV cs.AI cs.GR

    Lazy Diffusion Transformer for Interactive Image Editing

    Authors: Yotam Nitzan, Zongze Wu, Richard Zhang, Eli Shechtman, Daniel Cohen-Or, Taesung Park, Michaël Gharbi

    Abstract: We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications using binary masks and text prompts. Our generator operates in two phases. First, a context encoder processes the curr…

    Submitted 18 April, 2024; originally announced April 2024.

  9. arXiv:2404.11614

    cs.CV

    Dynamic Typography: Bringing Text to Life via Video Diffusion Prior

    Authors: Zichen Liu, Yihao Meng, Hao Ouyang, Yue Yu, Bolin Zhao, Daniel Cohen-Or, Huamin Qu

    Abstract: Text animation serves as an expressive medium, transforming static communication into dynamic experiences by infusing words with motion to evoke emotions, emphasize meanings, and construct compelling narratives. Crafting animations that are semantically aware poses significant challenges, demanding expertise in graphic design and animation. We present an automated text animation scheme, termed "Dy…

    Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Our demo page is available at: https://animate-your-word.github.io/demo/

  10. arXiv:2404.03620

    cs.CV cs.GR

    LCM-Lookahead for Encoder-based Text-to-Image Personalization

    Authors: Rinon Gal, Or Lichter, Elad Richardson, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or

    Abstract: Recent advancements in diffusion models have introduced fast sampling methods that can effectively produce high-quality images in just one or a few denoising steps. Interestingly, when these are distilled from existing diffusion models, they often maintain alignment with the original model, retaining similar outputs for similar prompts and seeds. These properties present opportunities to leverage…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project page at https://lcm-lookahead.github.io/

  11. arXiv:2403.16990

    cs.CV cs.AI cs.GR cs.LG

    Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

    Authors: Omer Dahary, Or Patashnik, Kfir Aberman, Daniel Cohen-Or

    Abstract: Text-to-image diffusion models have an unprecedented ability to generate diverse and high-quality images. However, they often struggle to faithfully capture the intended semantics of complex input prompts that include multiple subjects. Recently, numerous layout-to-image extensions have been introduced to improve user control, aiming to localize subjects represented by specific tokens. Yet, these…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Project page: https://omer11a.github.io/bounded-attention/

  12. arXiv:2403.14602

    cs.CV cs.GR cs.LG eess.IV

    ReNoise: Real Image Inversion Through Iterative Noising

    Authors: Daniel Garibi, Or Patashnik, Andrey Voynov, Hadar Averbuch-Elor, Daniel Cohen-Or

    Abstract: Recent advancements in text-guided diffusion models have unlocked powerful image manipulation capabilities. However, applying these methods to real images necessitates the inversion of the images into the domain of the pretrained diffusion model. Achieving faithful inversion remains a challenge, particularly for more recent models trained to generate images with a small number of denoising steps.…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: project page at: https://garibida.github.io/ReNoise-Inversion/

  13. arXiv:2403.14599

    cs.CV

    MyVLM: Personalizing VLMs for User-Specific Queries

    Authors: Yuval Alaluf, Elad Richardson, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or

    Abstract: Recent large-scale vision-language models (VLMs) have demonstrated remarkable capabilities in understanding and generating textual descriptions for visual content. However, these models lack an understanding of user-specific concepts. In this work, we take a first step toward the personalization of VLMs, enabling them to learn and reason over user-provided concepts. For example, we explore whether…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page: https://snap-research.github.io/MyVLM/

  14. arXiv:2403.14572

    cs.CV

    Implicit Style-Content Separation using B-LoRA

    Authors: Yarden Frenkel, Yael Vinker, Ariel Shamir, Daniel Cohen-Or

    Abstract: Image stylization involves manipulating the visual appearance and texture (style) of an image while preserving its underlying objects, structures, and concepts (content). The separation of style and content is essential for manipulating the image's style independently from its content, ensuring a harmonious and visually pleasing result. Achieving this separation requires a deep understanding of bo…

    Submitted 21 March, 2024; originally announced March 2024.

  15. arXiv:2402.14792

    cs.CV cs.GR cs.LG

    Consolidating Attention Features for Multi-view Image Editing

    Authors: Or Patashnik, Rinon Gal, Daniel Cohen-Or, Jun-Yan Zhu, Fernando De la Torre

    Abstract: Large-scale text-to-image models enable a wide range of image editing techniques, using text prompts or even spatial controls. However, applying these editing methods to multi-view images depicting a single scene leads to 3D-inconsistent results. In this work, we focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views. W…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Project Page at https://qnerf-consolidation.github.io/qnerf-consolidation/

  16. arXiv:2401.13245

    cs.HC

    GraphiMind: LLM-centric Interface for Information Graphics Design

    Authors: Qirui Huang, Min Lu, Joel Lanir, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: Information graphics are pivotal in effective information dissemination and storytelling. However, creating such graphics is extremely challenging for non-professionals, since the design process requires multifaceted skills and comprehensive knowledge. Thus, despite the many available authoring tools, a significant gap remains in enabling non-experts to produce compelling information graphics seam…

    Submitted 24 January, 2024; originally announced January 2024.

  17. arXiv:2401.06105

    cs.CV cs.CL cs.GR cs.LG

    PALP: Prompt Aligned Personalization of Text-to-Image Models

    Authors: Moab Arar, Andrey Voynov, Amir Hertz, Omri Avrahami, Shlomi Fruchter, Yael Pritch, Daniel Cohen-Or, Ariel Shamir

    Abstract: Content creators often aim to create personalized images using personal subjects that go beyond the capabilities of conventional text-to-image models. Additionally, they may want the resulting image to encompass a specific location, style, ambiance, and more. Existing personalization methods may compromise personalization ability or the alignment to complex textual prompts. This trade-off can impe…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Project page available at https://prompt-aligned.github.io/

  18. arXiv:2401.02847

    cs.CV cs.GR cs.LG

    Generating Non-Stationary Textures using Self-Rectification

    Authors: Yang Zhou, Rongjun Xiao, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: This paper addresses the challenge of example-based non-stationary texture synthesis. We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools, yielding an initial rough target for the synthesis. Subsequently, our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture, while fa…

    Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Project page: https://github.com/xiaorongjun000/Self-Rectification

  19. arXiv:2312.11557

    cs.CV

    SAI3D: Segment Any Instance in 3D Scenes

    Authors: Yingda Yin, Yuzheng Liu, Yang Xiao, Daniel Cohen-Or, Jingwei Huang, Baoquan Chen

    Abstract: Advancements in 3D instance segmentation have traditionally been tethered to the availability of annotated datasets, limiting their application to a narrow spectrum of object categories. Recent efforts have sought to harness vision-language models like CLIP for open-set semantic reasoning, yet these methods struggle to distinguish between objects of the same categories and rely on specific prompts…

    Submitted 24 March, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  20. arXiv:2312.03766

    cs.CL cs.CV

    Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

    Authors: Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

    Abstract: While existing image-text alignment models reach high quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method to provide detailed textual and visual explanation of detected misalignments between text-image pairs. We leverage large language models and visual grounding models to automatically construct a training set that holds…

    Submitted 5 December, 2023; originally announced December 2023.

  21. arXiv:2312.02133

    cs.CV cs.GR cs.LG

    Style Aligned Image Generation via Shared Attention

    Authors: Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or

    Abstract: Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging, with existing methods necessitating fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel techniq…

    Submitted 11 January, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Project page at style-aligned-gen.github.io

  22. arXiv:2311.17609

    cs.CV cs.GR cs.LG

    AnyLens: A Generative Diffusion Model with Any Rendering Lens

    Authors: Andrey Voynov, Amir Hertz, Moab Arar, Shlomi Fruchter, Daniel Cohen-Or

    Abstract: State-of-the-art diffusion models can generate highly realistic images based on various conditioning like text, segmentation, and depth. However, an essential aspect often overlooked is the specific camera geometry used during image capture. The influence of different optical systems on the final scene appearance is frequently overlooked. This study introduces a framework that intimately integrate…

    Submitted 29 November, 2023; originally announced November 2023.

  23. arXiv:2311.17083

    cs.CV

    CLiC: Concept Learning in Context

    Authors: Mehdi Safaee, Aryan Mikaeili, Or Patashnik, Daniel Cohen-Or, Ali Mahdavi-Amiri

    Abstract: This paper addresses the challenge of learning a local visual pattern of an object from one image, and generating images depicting objects with that pattern. Learning a localized concept and placing it on an object in a target image is a nontrivial task, as the objects may have different orientations and shapes. Our approach builds upon recent advancements in visual concept learning. It involves a…

    Submitted 27 November, 2023; originally announced November 2023.

  24. arXiv:2311.13608

    cs.CV cs.GR cs.LG

    Breathing Life Into Sketches Using Text-to-Video Priors

    Authors: Rinon Gal, Yael Vinker, Yuval Alaluf, Amit H. Bermano, Daniel Cohen-Or, Ariel Shamir, Gal Chechik

    Abstract: A sketch is one of the most intuitive and versatile tools humans use to convey their ideas visually. An animated sketch opens another dimension to the expression of ideas and is widely used by designers for a variety of purposes. Animating sketches is a laborious process, requiring extensive experience and professional design skills. In this work, we present a method that automatically adds motion…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: Project page: https://livesketch.github.io/

  25. arXiv:2311.10093

    cs.CV cs.GR cs.LG

    The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

    Authors: Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

    Abstract: Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, the users that use these models struggle with the generation of consistent characters, a crucial aspect for numerous real-world applications such as story visualization, game development, asset design, advertising, and more. Current methods typically rely on multiple pre-existing images…

    Submitted 5 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to SIGGRAPH 2024. Project page is available at https://omriavrahami.com/the-chosen-one/

  26. arXiv:2311.03335

    cs.CV cs.GR

    Cross-Image Attention for Zero-Shot Appearance Transfer

    Authors: Yuval Alaluf, Daniel Garibi, Or Patashnik, Hadar Averbuch-Elor, Daniel Cohen-Or

    Abstract: Recent advancements in text-to-image generative models have demonstrated a remarkable ability to capture a deep semantic understanding of images. In this work, we leverage this semantic knowledge to transfer the visual appearance between objects that share similar semantics but may differ significantly in shape. To achieve this, we build upon the self-attention layers of these generative models an…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Project page: https://garibida.github.io/cross-image-attention

  27. arXiv:2311.01714

    cs.CV

    EXIM: A Hybrid Explicit-Implicit Representation for Text-Guided 3D Shape Generation

    Authors: Zhengzhe Liu, Jingyu Hu, Ka-Hei Hui, Xiaojuan Qi, Daniel Cohen-Or, Chi-Wing Fu

    Abstract: This paper presents a new text-guided technique for generating 3D shapes. The technique leverages a hybrid 3D shape representation, namely EXIM, combining the strengths of explicit and implicit representations. Specifically, the explicit stage controls the topology of the generated 3D shapes and enables local modifications, whereas the implicit stage refines the shape and paints it with plausible…

    Submitted 30 November, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: SIGGRAPH Asia 2023 & TOG. Project page: https://liuzhengzhe.github.io/EXIM.github.io/

  28. arXiv:2310.17590

    cs.CV

    Noise-Free Score Distillation

    Authors: Oren Katzir, Or Patashnik, Daniel Cohen-Or, Dani Lischinski

    Abstract: Score Distillation Sampling (SDS) has emerged as the de facto approach for text-to-content generation in non-image domains. In this paper, we reexamine the SDS process and introduce a straightforward interpretation that demystifies the necessity for large Classifier-Free Guidance (CFG) scales, rooted in the distillation of an undesired noise term. Building upon our interpretation, we propose a nov…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Project page at https://orenkatzir.github.io/nfsd/

  29. arXiv:2310.14729

    cs.CV cs.GR

    MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion

    Authors: Roy Kapon, Guy Tevet, Daniel Cohen-Or, Amit H. Bermano

    Abstract: We introduce Multi-view Ancestral Sampling (MAS), a method for 3D motion generation, using 2D diffusion models that were trained on motions obtained from in-the-wild videos. As such, MAS opens opportunities to exciting and diverse fields of motion previously under-explored as 3D data is scarce and hard to collect. MAS works by simultaneously denoising multiple 2D motion sequences representing diff…

    Submitted 24 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  30. arXiv:2308.02669

    cs.CV

    ConceptLab: Creative Concept Generation using VLM-Guided Diffusion Prior Constraints

    Authors: Elad Richardson, Kfir Goldberg, Yuval Alaluf, Daniel Cohen-Or

    Abstract: Recent text-to-image generative models have enabled us to transform our words into vibrant, captivating imagery. The surge of personalization techniques that has followed has also allowed us to imagine unique concepts in new scenes. However, an intriguing question remains: How can we generate a new, imaginary concept that has never been seen before? In this paper, we present the task of creative t…

    Submitted 17 December, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: Project page: https://kfirgoldberg.github.io/ConceptLab/

  31. arXiv:2307.07961

    cs.CV

    EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes

    Authors: Jingyuan Yang, Qirui Huang, Tingting Ding, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: Visual Emotion Analysis (VEA) aims at predicting people's emotional responses to visual stimuli. This is a promising, yet challenging, task in affective computing, which has drawn increasing attention in recent years. Most of the existing work in this area focuses on feature design, while little attention has been paid to dataset construction. In this work, we introduce EmoSet, the first large-sca…

    Submitted 28 July, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023, similar to the final version

  32. arXiv:2307.06925

    cs.CV cs.GR cs.LG

    Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models

    Authors: Moab Arar, Rinon Gal, Yuval Atzmon, Gal Chechik, Daniel Cohen-Or, Ariel Shamir, Amit H. Bermano

    Abstract: Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language prompts. Recently, encoder-based techniques have emerged as a new effective approach for T2I personalization, reducing the need for multiple images and long training times. However, most existing encoders are limited to a single-class domain, wh…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: Project page at https://datencoder.github.io

  33. arXiv:2307.06307

    cs.CV cs.GR cs.LG

    Facial Reenactment Through a Personalized Generator

    Authors: Ariel Elazary, Yotam Nitzan, Daniel Cohen-Or

    Abstract: In recent years, the role of image generative models in facial reenactment has been steadily increasing. Such models are usually subject-agnostic and trained on domain-wide datasets. The appearance of the reenacted individual is learned from a single image, and hence, the entire breadth of the individual's appearance is not entirely captured, leading these methods to resort to unfaithful hallucina…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: Project webpage: https://arielazary.github.io/PGR/

  34. arXiv:2306.16052

    cs.CV

    SVNR: Spatially-variant Noise Removal with Denoising Diffusion

    Authors: Naama Pearl, Yaron Brodsky, Dana Berman, Assaf Zomet, Alex Rav Acha, Daniel Cohen-Or, Dani Lischinski

    Abstract: Denoising diffusion models have recently shown impressive results in generative tasks. By learning powerful priors from huge collections of training images, such models are able to gradually modify complete noise to a clean natural image via a sequence of small denoising steps, seemingly making them well-suited for single image denoising. However, effectively applying denoising diffusion models to…

    Submitted 28 June, 2023; originally announced June 2023.

  35. arXiv:2306.06088

    cs.GR cs.CV cs.LG

    SENS: Part-Aware Sketch-based Implicit Neural Shape Modeling

    Authors: Alexandre Binninger, Amir Hertz, Olga Sorkine-Hornung, Daniel Cohen-Or, Raja Giryes

    Abstract: We present SENS, a novel method for generating and editing 3D models from hand-drawn sketches, including those of abstract nature. Our method allows users to quickly and easily sketch a shape, and then maps the sketch into the latent space of a part-aware neural implicit shape architecture. SENS analyzes the sketch and encodes its parts into ViT patch encoding, subsequently feeding them into a tra…

    Submitted 21 February, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 25 pages, 24 figures

  36. arXiv:2305.18203

    cs.CV

    Concept Decomposition for Visual Exploration and Inspiration

    Authors: Yael Vinker, Andrey Voynov, Daniel Cohen-Or, Ariel Shamir

    Abstract: A creative idea is often born from transforming, combining, and modifying ideas from existing visual examples capturing various concepts. However, one cannot simply copy the concept as a whole, and inspiration is achieved by examining certain aspects of the concept. Hence, it is often necessary to separate a concept into different aspects to provide new perspectives. In this paper, we propose a me…

    Submitted 31 May, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: https://inspirationtree.github.io/inspirationtree/

  37. arXiv:2305.16311

    cs.CV cs.GR cs.LG

    Break-A-Scene: Extracting Multiple Concepts from a Single Image

    Authors: Omri Avrahami, Kfir Aberman, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

    Abstract: Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images with variations in backgrounds and poses, and struggle when adapted to a different scenario. In this work, we introduce the task of textual scene decomposition:…

    Submitted 4 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: SIGGRAPH Asia 2023. Project page: https://omriavrahami.com/break-a-scene/ Video: https://www.youtube.com/watch?v=-9EA-BhizgM

  38. arXiv:2305.15391

    cs.CV

    A Neural Space-Time Representation for Text-to-Image Personalization

    Authors: Yuval Alaluf, Elad Richardson, Gal Metzer, Daniel Cohen-Or

    Abstract: A key aspect of text-to-image personalization methods is the manner in which the target concept is represented within the generative process. This choice greatly affects the visual fidelity, downstream editability, and disk space needed to store the learned concept. In this paper, we explore a new text-conditioning space that is dependent on both the denoising process timestep (time) and the denoi…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Project page available at https://neuraltextualinversion.github.io/NeTI/

  39. arXiv:2304.07090

    cs.CV cs.GR cs.LG

    Delta Denoising Score

    Authors: Amir Hertz, Kfir Aberman, Daniel Cohen-Or

    Abstract: We introduce Delta Denoising Score (DDS), a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt. DDS leverages the rich generative prior of text-to-image diffusion models and can be used as a loss term in an optimization problem to steer an image towards a desired direction dictated by a text. DDS…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: Project page: https://delta-denoising-score.github.io/

  40. arXiv:2303.13450

    cs.CV cs.GR cs.LG

    Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes

    Authors: Dana Cohen-Bar, Elad Richardson, Gal Metzer, Raja Giryes, Daniel Cohen-Or

    Abstract: Recent breakthroughs in text-guided image generation have led to remarkable progress in the field of 3D synthesis from text. By optimizing neural radiance fields (NeRF) directly from text, recent methods are able to produce remarkable results. Yet, these methods are limited in their control of each object's placement or appearance, as they represent the scene as a whole. This can be a major issue…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: project page at https://danacohen95.github.io/Set-the-Scene/

  41. arXiv:2303.11306

    cs.CV cs.GR cs.LG

    Localizing Object-level Shape Variations with Text-to-Image Diffusion Models

    Authors: Or Patashnik, Daniel Garibi, Idan Azuri, Hadar Averbuch-Elor, Daniel Cohen-Or

    Abstract: Text-to-image models give rise to workflows which often begin with an exploration step, where users sift through a large collection of generated images. The global nature of the text-to-image generation process prevents users from narrowing their exploration to a particular object in the image. In this paper, we present a technique to generate a collection of images that depicts variations in the…

    Submitted 12 August, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: ICCV 2023. Project page at https://orpatashnik.github.io/local-prompt-mixing/

  42. arXiv:2303.10735

    cs.CV cs.GR

    SKED: Sketch-guided Text-based 3D Editing

    Authors: Aryan Mikaeili, Or Perel, Mehdi Safaee, Daniel Cohen-Or, Ali Mahdavi-Amiri

    Abstract: Text-to-image diffusion models are gradually introduced into computer graphics, recently enabling the development of Text-to-3D pipelines in an open domain. However, for interactive editing purposes, local manipulations of content through a simplistic textual interface can be arduous. Incorporating user guided sketches with Text-to-image pipelines offers users more intuitive control. Still, as sta…

    Submitted 18 August, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

  43. arXiv:2303.09522  [pdf, other]

    cs.CV cs.CL cs.GR cs.LG

    P+: Extended Textual Conditioning in Text-to-Image Generation

    Authors: Andrey Voynov, Qinghao Chu, Daniel Cohen-Or, Kfir Aberman

    Abstract: We introduce an Extended Textual Conditioning space in text-to-image models, referred to as $P+$. This space consists of multiple textual conditions, derived from per-layer prompts, each corresponding to a layer of the denoising U-net of the diffusion model. We show that the extended space provides greater disentangling and control over image synthesis. We further introduce Extended Textual Inve…

    Submitted 15 July, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

  44. arXiv:2303.01818  [pdf, other]

    cs.CV cs.AI cs.GR

    Word-As-Image for Semantic Typography

    Authors: Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or, Ariel Shamir

    Abstract: A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually…

    Submitted 6 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

  45. arXiv:2302.12228  [pdf, other]

    cs.CV cs.GR cs.LG

    Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

    Authors: Rinon Gal, Moab Arar, Yuval Atzmon, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or

    Abstract: Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle with lengthy training times, high storage requirements or loss of identity. To overcome these limitations, we propose an encoder-based domain-tuning approach.…

    Submitted 5 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: Project page at https://tuning-encoder.github.io/

  46. arXiv:2302.10167  [pdf, other]

    cs.CV cs.GR cs.LG

    Cross-domain Compositing with Pretrained Diffusion Models

    Authors: Roy Hachnochi, Mingrui Zhao, Nadav Orzech, Rinon Gal, Ali Mahdavi-Amiri, Daniel Cohen-Or, Amit Haim Bermano

    Abstract: Diffusion models have enabled high-quality, conditional image editing capabilities. We propose to expand their arsenal, and demonstrate that off-the-shelf diffusion models can be used for a wide range of cross-domain compositing tasks. Among numerous others, these include image blending, object immersion, texture-replacement and even CG2Real translation or stylization. We employ a localized, itera…

    Submitted 25 May, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Code: https://github.com/cross-domain-compositing/cross-domain-compositing

  47. arXiv:2302.05905  [pdf, other]

    cs.CV cs.AI cs.GR

    Single Motion Diffusion

    Authors: Sigal Raab, Inbal Leibovitch, Guy Tevet, Moab Arar, Amit H. Bermano, Daniel Cohen-Or

    Abstract: Synthesizing realistic animations of humans, animals, and even imaginary creatures has long been a goal for artists and computer graphics professionals. Compared to the imaging domain, which is rich with large available datasets, the number of data instances for the motion domain is limited, particularly for the animation of animals and exotic creatures (e.g., dragons), which have unique skeleton…

    Submitted 13 June, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

    Comments: Video: https://www.youtube.com/watch?v=zuWpVTgb_0U, Project page: https://sinmdm.github.io/SinMDM-page, Code: https://github.com/SinMDM/SinMDM

  48. arXiv:2302.01721  [pdf, other]

    cs.CV cs.GR

    TEXTure: Text-Guided Texturing of 3D Shapes

    Authors: Elad Richardson, Gal Metzer, Yuval Alaluf, Raja Giryes, Daniel Cohen-Or

    Abstract: In this paper, we present TEXTure, a novel method for text-guided generation, editing, and transfer of textures for 3D shapes. Leveraging a pretrained depth-to-image diffusion model, TEXTure applies an iterative scheme that paints a 3D model from different viewpoints. Yet, while depth-to-image models can create plausible textures from a single viewpoint, the stochastic nature of the generation pro…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: Project page available at https://texturepaper.github.io/TEXTurePaper/

  49. arXiv:2301.13826  [pdf, other]

    cs.CV cs.CL cs.GR cs.LG

    Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

    Authors: Hila Chefer, Yuval Alaluf, Yael Vinker, Lior Wolf, Daniel Cohen-Or

    Abstract: Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt. While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt. We analyze the publicly available Stable Diffusion model and assess the existence of cata…

    Submitted 31 May, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Accepted to SIGGRAPH 2023; Project page available at https://yuval-alaluf.github.io/Attend-and-Excite/

  50. arXiv:2301.05225  [pdf, other]

    cs.CV cs.GR cs.LG

    Domain Expansion of Image Generators

    Authors: Yotam Nitzan, Michaël Gharbi, Richard Zhang, Taesung Park, Jun-Yan Zhu, Daniel Cohen-Or, Eli Shechtman

    Abstract: Can one inject new concepts into an already trained generative model, while respecting its existing structure and knowledge? We propose a new task - domain expansion - to address this. Given a pretrained generator and novel (but related) domains, we expand the generator to jointly model all domains, old and new, harmoniously. First, we note the generator contains a meaningful, pretrained latent sp…

    Submitted 17 April, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Project Page and code are available at https://yotamnitzan.github.io/domain-expansion/. CVPR 2023 Camera-Ready