Showing 1–3 of 3 results for author: Shentu, J

  1. arXiv:2405.17965

    cs.CV

    AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization

    Authors: Junjie Shentu, Matthew Watson, Noura Al Moubayed

    Abstract: With the unprecedented performance being achieved by text-to-image (T2I) diffusion models, T2I customization further empowers users to tailor the diffusion model to new concepts absent in the pre-training dataset, termed subject-driven generation. Moreover, extracting several new concepts from a single image enables the model to learn multiple concepts, and simultaneously decreases the difficultie…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2405.17450

    cs.CV cs.LG

    The Power of Next-Frame Prediction for Learning Physical Laws

    Authors: Thomas Winterbottom, G. Thomas Hudson, Daniel Kluvanec, Dean Slack, Jamie Sterling, Junjie Shentu, Chenghao Xiao, Zheming Zhou, Noura Al Moubayed

    Abstract: Next-frame prediction is a useful and powerful method for modelling and understanding the dynamics of video data. Inspired by the empirical success of causal language modelling and next-token prediction in language modelling, we explore the extent to which next-frame prediction serves as a strong foundational learning strategy (analogous to language modelling) for inducing an understanding of the…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 7 Figures, 12 Pages, 1 Table

    MSC Class: 68T45

    ACM Class: I.2.6; I.2.10

  3. arXiv:2402.09966

    cs.CV

    Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation

    Authors: Junjie Shentu, Matthew Watson, Noura Al Moubayed

    Abstract: Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input images, facing challenges in specifying the target concept when dealing with multi-concept input images. To this end, we introduce a textual localized text-to-ima…

    Submitted 15 February, 2024; originally announced February 2024.