Showing 1–8 of 8 results for author: Materzyńska, J

  1. arXiv:2312.04966  [pdf, other]

    cs.CV

    Customizing Motion in Text-to-Video Diffusion Models

    Authors: Joanna Materzynska, Josef Sivic, Eli Shechtman, Antonio Torralba, Richard Zhang, Bryan Russell

    Abstract: We introduce an approach for augmenting text-to-video generation models with customized motions, extending their capabilities beyond the motions depicted in the original training data. By leveraging a few video samples demonstrating specific movements as input, our method learns and generalizes the input motion patterns for diverse, text-specified scenarios. Our contributions are threefold. First,…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project page: https://joaanna.github.io/customizing_motion/

  2. arXiv:2311.12092  [pdf, other]

    cs.CV

    Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

    Authors: Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau

    Abstract: We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models. Our approach identifies a low-rank parameter direction corresponding to one concept while minimizing interference with other attributes. A slider is created using a small set of prompts or sample images; thus slider directions can be created for either…

    Submitted 27 November, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

  3. arXiv:2309.03886  [pdf, other]

    cs.CL cs.AI cs.LG

    FIND: A Function Description Benchmark for Evaluating Interpretability Methods

    Authors: Sarah Schwettmann, Tamar Rott Shaham, Joanna Materzynska, Neil Chowdhury, Shuang Li, Jacob Andreas, David Bau, Antonio Torralba

    Abstract: Labeling neural network submodules with human-legible descriptions is useful for many downstream tasks: such descriptions can surface failures, guide interventions, and perhaps even explain important model behaviors. To date, most mechanistic descriptions of trained networks have involved small models, narrowly delimited phenomena, and large amounts of human labor. Labeling all human-interpretable…

    Submitted 8 December, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: 28 pages, 10 figures

    Journal ref: NeurIPS 2023

  4. arXiv:2308.14761  [pdf, other]

    cs.CV cs.LG

    Unified Concept Editing in Diffusion Models

    Authors: Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, David Bau

    Abstract: Text-to-image models suffer from various safety issues that may limit their suitability for deployment. Previous methods have separately addressed individual issues of bias, copyright, and offensive content in text-to-image models. However, in the real world, all of these issues appear simultaneously in the same model. We present a method that tackles all issues with a single approach. Our method,…

    Submitted 25 August, 2023; originally announced August 2023.

  5. arXiv:2303.07345  [pdf, other]

    cs.CV

    Erasing Concepts from Diffusion Models

    Authors: Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, David Bau

    Abstract: Motivated by recent advancements in text-to-image diffusion, we study erasure of specific concepts from the model's weights. While Stable Diffusion has shown promise in producing explicit or realistic artwork, it has raised concerns regarding its potential for misuse. We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the styl…

    Submitted 20 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

  6. arXiv:2206.07835  [pdf, other]

    cs.CV

    Disentangling visual and written concepts in CLIP

    Authors: Joanna Materzynska, Antonio Torralba, David Bau

    Abstract: The CLIP network measures the similarity between natural text and images; in this work, we investigate the entanglement of the representation of word images and natural images in its image encoder. First, we find that the image encoder has an ability to match word images with natural images of scenes described by those words. This is consistent with previous research that suggests that the meaning…

    Submitted 15 June, 2022; originally announced June 2022.

  7. arXiv:1912.09930  [pdf, other]

    cs.CV

    Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

    Authors: Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

    Abstract: Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations. In this paper, we study the compositionality of action by looking into the dynamics of subject-object interactions. We propose a novel model which can explicitly reason about the geometric relations between constituent objects and an a…

    Submitted 12 September, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

  8. arXiv:1706.04261  [pdf, other]

    cs.CV

    The "something something" video database for learning and evaluating visual common sense

    Authors: Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, Joanna Materzyńska, Susanne Westphal, Heuna Kim, Valentin Haenel, Ingo Fruend, Peter Yianilos, Moritz Mueller-Freitag, Florian Hoppe, Christian Thurau, Ingo Bax, Roland Memisevic

    Abstract: Neural networks trained on datasets such as ImageNet have led to major advances in visual object classification. One obstacle that prevents networks from reasoning more deeply about complex scenes and situations, and from integrating visual knowledge with natural language, like humans do, is their lack of common sense knowledge about the physical world. Videos, unlike still images, contain a wealt…

    Submitted 15 June, 2017; v1 submitted 13 June, 2017; originally announced June 2017.