Showing 1–25 of 25 results for author: Liew, J H

  1. arXiv:2406.00121  [pdf, other]

    cs.CV

    Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations

    Authors: Tiancheng Shen, Jun Hao Liew, Long Mai, Lu Qi, Jiashi Feng, Jiaya Jia

    Abstract: Advances in text-based image generation and editing have revolutionized content creation, enabling users to create impressive content from imaginative text prompts. However, existing methods are not designed to work well with the oversimplified prompts that are often encountered in typical scenarios when users start their editing with only vague or abstract purposes in mind. Those scenarios demand…

    Submitted 31 May, 2024; originally announced June 2024.

  2. arXiv:2405.18428  [pdf, other]

    cs.CV cs.AI

    DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

    Authors: Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang

    Abstract: Diffusion models with large-scale pre-training have achieved significant success in the field of visual content generation, particularly exemplified by Diffusion Transformers (DiT). However, DiT models have faced challenges with scalability and quadratic complexity efficiency. In this paper, we aim to leverage the long sequence modeling capability of Gated Linear Attention (GLA) Transformers, expa…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Code is released at https://github.com/hustvl/DiG

  3. arXiv:2405.17532  [pdf, other]

    cs.CV

    ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

    Authors: Jiannan Huang, Jun Hao Liew, Hanshu Yan, Yuyang Yin, Yao Zhao, Yunchao Wei

    Abstract: Recent text-to-image customization works have proven successful in generating images of given concepts by fine-tuning the diffusion models on a few examples. However, these methods tend to overfit the concepts, resulting in failure to create the concept under multiple conditions (e.g., the headphone is missing when generating 'a <sks> dog wearing a headphone'). Interestingly, we notice that the bas…

    Submitted 27 May, 2024; originally announced May 2024.

  4. arXiv:2405.13722  [pdf, other]

    cs.CV

    InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos

    Authors: Yujun Shi, Jun Hao Liew, Hanshu Yan, Vincent Y. F. Tan, Jiashi Feng

    Abstract: Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based image editing framework that achieves pixel-level control using Generative Adversarial Networks (GANs). A flurry of subsequent studies enhanced this framework's generality by leveraging large-scale diffusion models. However, these methods often suffer from inordinately long processing times (exceeding 1 minu…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Project page: https://instadrag.github.io/

  5. arXiv:2405.07510  [pdf, other]

    cs.LG

    PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

    Authors: Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng

    Abstract: We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in few-step generation. Moreover, through dedicated par…

    Submitted 29 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  6. arXiv:2401.04468  [pdf, other]

    cs.CV cs.AI

    MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

    Authors: Weimin Wang, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

    Abstract: The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2 that integrates the text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architecture designs, MagicVideo…

    Submitted 9 January, 2024; originally announced January 2024.

  7. arXiv:2312.12425  [pdf, other]

    cs.CV

    SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process

    Authors: Mengyu Wang, Henghui Ding, Jun Hao Liew, Jiajun Liu, Yao Zhao, Yunchao Wei

    Abstract: In this paper, we explore a principled way to enhance the quality of object masks produced by different segmentation models. We propose a model-agnostic solution called SegRefiner, which offers a novel perspective on this problem by interpreting segmentation refinement as a data generation process. As a result, the refinement process can be smoothly implemented through a series of denoising diffusi…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023, Code: https://github.com/MengyuWang826/SegRefiner

  8. arXiv:2312.12030  [pdf, other]

    cs.CV cs.AI

    Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method

    Authors: Jiachun Pan, Hanshu Yan, Jun Hao Liew, Jiashi Feng, Vincent Y. F. Tan

    Abstract: Training-free guided sampling in diffusion models leverages off-the-shelf pre-trained networks, such as an aesthetic evaluation model, to guide the generation process. Current training-free guided sampling algorithms obtain the guidance energy function based on a one-step estimate of the clean image. However, since the off-the-shelf pre-trained networks are trained on clean images, the one-step es…

    Submitted 19 December, 2023; originally announced December 2023.

  9. arXiv:2311.17917  [pdf, other]

    cs.GR cs.CV

    AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text

    Authors: Jianfeng Zhang, Xuanmeng Zhang, Huichao Zhang, Jun Hao Liew, Chenxu Zhang, Yi Yang, Jiashi Feng

    Abstract: We study the problem of creating high-fidelity and animatable 3D avatars from only textual descriptions. Existing text-to-avatar methods are either limited to static avatars which cannot be animated or struggle to generate animatable avatars with promising quality and precise pose control. To address these limitations, we propose AvatarStudio, a coarse-to-fine generative model that generates expli…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Project page at http://jeff95.me/projects/avatarstudio.html

  10. arXiv:2311.16498  [pdf, other]

    cs.CV cs.GR

    MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

    Authors: Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou

    Abstract: This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion. Despite achieving reasonable results, these approaches face challenges in maintaining temporal consistency throughout…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Project Page at https://showlab.github.io/magicanimate

  11. arXiv:2311.13574  [pdf, other]

    cs.CV

    XAGen: 3D Expressive Human Avatars Generation

    Authors: Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Jiashi Feng, Mike Zheng Shou

    Abstract: Recent advances in 3D-aware GAN models have enabled the generation of realistic and controllable human body images. However, existing methods focus on the control of major body joints, neglecting the manipulation of expressive attributes, such as facial expressions, jaw poses, hand poses, and so on. In this work, we present XAGen, the first 3D generative model for human avatars capable of expressi…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023, Project Page at https://showlab.github.io/xagen

  12. arXiv:2309.00908  [pdf, other]

    cs.CV

    MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation

    Authors: Hanshu Yan, Jun Hao Liew, Long Mai, Shanchuan Lin, Jiashi Feng

    Abstract: This paper addresses the issue of modifying the visual appearance of videos while preserving their motion. A novel framework, named MagicProp, is proposed, which disentangles the video editing process into two stages: appearance editing and motion-aware appearance propagation. In the first stage, MagicProp selects a single frame from the input video and applies image-editing techniques to modify t…

    Submitted 2 September, 2023; originally announced September 2023.

  13. arXiv:2308.14749  [pdf, other]

    cs.CV

    MagicEdit: High-Fidelity and Temporally Coherent Video Editing

    Authors: Jun Hao Liew, Hanshu Yan, Jianfeng Zhang, Zhongcong Xu, Jiashi Feng

    Abstract: In this report, we present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task. We found that high-fidelity and temporally coherent video-to-video translation can be achieved by explicitly disentangling the learning of content, structure and motion signals during training. This is in contrast to most existing methods which attempt to jointly model both t…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Project page: https://magic-edit.github.io/

  14. arXiv:2308.14748  [pdf, other]

    cs.GR cs.CV

    MagicAvatar: Multimodal Avatar Generation and Animation

    Authors: Jianfeng Zhang, Hanshu Yan, Zhongcong Xu, Jiashi Feng, Jun Hao Liew

    Abstract: This report presents MagicAvatar, a framework for multimodal video generation and animation of human avatars. Unlike most existing methods that generate avatar-centric videos directly from multimodal inputs (e.g., text prompts), MagicAvatar explicitly disentangles avatar video generation into two stages: (1) multimodal-to-motion and (2) motion-to-video generation. The first stage translates the mu…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Project page: https://magic-avatar.github.io/

  15. arXiv:2307.10711  [pdf, other]

    cs.CV cs.AI cs.LG

    AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models

    Authors: Jiachun Pan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng, Hanshu Yan

    Abstract: Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denois…

    Submitted 20 March, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

  16. arXiv:2306.14435  [pdf, other]

    cs.CV cs.LG

    DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

    Authors: Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai

    Abstract: Accurate and controllable image editing is a challenging task that has attracted significant attention recently. Notably, DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision. However, due to its reliance on generative adversarial networks (GANs), its generality is limited by the capacity of pretrained GAN models. In this…

    Submitted 7 April, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: Code is released at https://github.com/Yujun-Shi/DragDiffusion

  17. arXiv:2305.15248  [pdf, other]

    cs.CV

    Delving Deeper into Data Scaling in Masked Image Modeling

    Authors: Cheng-Ze Lu, Xiaojie Jin, Qibin Hou, Jun Hao Liew, Ming-Ming Cheng, Jiashi Feng

    Abstract: Understanding whether self-supervised learning methods can scale with unlimited data is crucial for training large-scale models. In this work, we conduct an empirical study on the scaling capability of masked image modeling (MIM) methods (e.g., MAE) for visual recognition. Unlike most previous works that depend on the widely-used ImageNet dataset, which is manually curated and object-centric, we t…

    Submitted 24 May, 2023; originally announced May 2023.

  18. arXiv:2304.01114  [pdf, other]

    cs.CV

    Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation

    Authors: Yabo Zhang, Zihao Wang, Jun Hao Liew, Jingjia Huang, Manyu Zhu, Jiashi Feng, Wangmeng Zuo

    Abstract: In this work, we investigate performing semantic segmentation solely through training on image-sentence pairs. Due to the lack of dense annotations, existing text-supervised methods can only learn to group an image into semantic regions via pixel-insensitive feedback. As a result, their grouped results are coarse and often contain small spurious regions, limiting the upper-bound performance of…

    Submitted 3 April, 2023; originally announced April 2023.

  19. arXiv:2303.09181  [pdf, other]

    cs.CV

    Global Knowledge Calibration for Fast Open-Vocabulary Segmentation

    Authors: Kunyang Han, Yong Liu, Jun Hao Liew, Henghui Ding, Yunchao Wei, Jiajun Liu, Yitong Wang, Yansong Tang, Yujiu Yang, Jiashi Feng, Yao Zhao

    Abstract: Recent advancements in pre-trained vision-language models, such as CLIP, have enabled the segmentation of arbitrary concepts solely from textual inputs, a process commonly referred to as open-vocabulary semantic segmentation (OVS). However, existing OVS techniques confront a fundamental challenge: the trained classifier tends to overfit on the base classes observed during training, resulting in su…

    Submitted 15 July, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted by ICCV2023

  20. arXiv:2212.06384  [pdf, other]

    cs.CV

    PV3D: A 3D Generative Model for Portrait Video Generation

    Authors: Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, Mike Zheng Shou

    Abstract: Recent advances in generative adversarial networks (GANs) have demonstrated the capabilities of generating stunning photo-realistic portrait images. While some prior works have applied such image GANs to unconditional 2D portrait video generation and static 3D portrait synthesis, there are few works successfully extending GANs for generating 3D-aware portrait videos. In this work, we propose PV3D,…

    Submitted 20 June, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted to ICLR2023, Project Page https://showlab.github.io/pv3d

  21. arXiv:2210.16056  [pdf, other]

    cs.CV

    MagicMix: Semantic Mixing with Diffusion Models

    Authors: Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

    Abstract: Have you ever imagined what a corgi-alike coffee machine or a tiger-alike rabbit would look like? In this work, we attempt to answer these questions by exploring a new task called semantic mixing, aiming at blending two different semantics to create a new concept (e.g., corgi + coffee machine -> corgi-alike coffee machine). Unlike style transfer, where an image is stylized according to the refer…

    Submitted 28 October, 2022; originally announced October 2022.

  22. arXiv:2202.07402  [pdf, other]

    cs.CV

    SODAR: Segmenting Objects by Dynamically Aggregating Neighboring Mask Representations

    Authors: Tao Wang, Jun Hao Liew, Yu Li, Yunpeng Chen, Jiashi Feng

    Abstract: The recent state-of-the-art one-stage instance segmentation model SOLO divides the input image into a grid and directly predicts per-grid-cell object masks with fully-convolutional networks, yielding performance comparable to the traditional two-stage Mask R-CNN while enjoying a much simpler architecture and higher efficiency. We observe SOLO generates similar masks for an object at nearby grid cells, an…

    Submitted 23 December, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: accepted to IEEE Transactions on Image Processing (TIP), code: https://github.com/advdfacd/AggMask

  23. arXiv:2105.02467  [pdf, other]

    cs.CV

    Body Meshes as Points

    Authors: Jianfeng Zhang, Dongdong Yu, Jun Hao Liew, Xuecheng Nie, Jiashi Feng

    Abstract: We consider the challenging multi-person 3D body mesh estimation task in this work. Existing methods are mostly two-stage based--one stage for person localization and the other stage for individual body mesh estimation, leading to redundant pipelines with high computation cost and degraded performance for complex scenes (e.g., occluded person instances). In this work, we present a single-stage mod…

    Submitted 5 July, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: To appear at CVPR 2021

  24. arXiv:1910.13081  [pdf, ps, other]

    cs.CV

    Classification Calibration for Long-tail Instance Segmentation

    Authors: Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Jun Hao Liew, Sheng Tang, Steven Hoi, Jiashi Feng

    Abstract: Remarkable progress has been made in object instance detection and segmentation in recent years. However, existing state-of-the-art methods are mostly evaluated with fairly balanced and class-limited benchmarks, such as the Microsoft COCO dataset [8]. In this report, we investigate the performance drop phenomenon of state-of-the-art two-stage instance segmentation models when processing extreme long-t…

    Submitted 30 July, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: This report presents our winning solution to LVIS 2019 challenge

  25. arXiv:1908.06391  [pdf, other]

    cs.CV

    PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment

    Authors: Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, Jiashi Feng

    Abstract: Despite the great progress made by deep CNNs in image semantic segmentation, they typically require a large number of densely-annotated images for training and are difficult to generalize to unseen object categories. Few-shot segmentation has thus been developed to learn to perform segmentation from only a few annotated examples. In this paper, we tackle the challenging few-shot segmentation probl…

    Submitted 6 February, 2020; v1 submitted 18 August, 2019; originally announced August 2019.

    Comments: 10 pages, 6 figures, ICCV 2019, code available at https://github.com/kaixin96/PANet