Showing 1–18 of 18 results for author: Mi, M B

  1. arXiv:2406.01733  [pdf, other]

    cs.LG cs.CV

    Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

    Authors: Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang

    Abstract: Diffusion Transformers have recently demonstrated unprecedented generative capabilities for various tasks. The encouraging results, however, come with the cost of slow inference, since each denoising step requires inference on a transformer model with a large scale of parameters. In this study, we make an interesting and somehow surprising observation: the computation of a large proportion of laye…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/horseee/learning-to-cache

  2. arXiv:2401.11115  [pdf, other]

    cs.CV

    MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation

    Authors: Nhat M. Hoang, Kehong Gong, Chuan Guo, Michael Bi Mi

    Abstract: Controllable generation of 3D human motions becomes an important topic as the world embraces digital transformation. Existing works, though making promising progress with the advent of diffusion models, heavily rely on meticulously captured and annotated (e.g., text) high-quality motion corpus, a resource-intensive endeavor in the real world. This motivates our proposed MotionMix, a simple yet eff…

    Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted at the 38th Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, Main Conference

  3. Semantic Segmentation in Multiple Adverse Weather Conditions with Domain Knowledge Retention

    Authors: Xin Yang, Wending Yan, Yuan Yuan, Michael Bi Mi, Robby T. Tan

    Abstract: Semantic segmentation's performance is often compromised when applied to unlabeled adverse weather conditions. Unsupervised domain adaptation is a potential approach to enhancing the model's adaptability and robustness to adverse weather. However, existing methods encounter difficulties when sequentially adapting the model to multiple unlabeled adverse weather conditions. They struggle to acquire…

    Submitted 14 January, 2024; originally announced January 2024.

  4. arXiv:2312.17492  [pdf, other]

    cs.CV

    HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping

    Authors: Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan

    Abstract: Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. However, their scopes only build upon patch-level features within an image, neglecting region/image-level and cross-image relationships at…

    Submitted 4 January, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI24

  5. arXiv:2312.08746  [pdf, other]

    cs.CV

    DreamDrone

    Authors: Hanyang Kong, Dongze Lian, Michael Bi Mi, Xinchao Wang

    Abstract: We introduce DreamDrone, an innovative method for generating unbounded flythrough scenes from textual prompts. Central to our method is a novel feature-correspondence-guidance diffusion process, which utilizes the strong correspondence of intermediate features in the diffusion model. Leveraging this guidance strategy, we further propose an advanced technique for editing the intermediate latent cod…

    Submitted 17 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 16 pages, 12 figures, project page: https://hyokong.github.io/dreamdrone-page/

  6. arXiv:2312.00462  [pdf, other]

    cs.CV

    Learning Unorthogonalized Matrices for Rotation Estimation

    Authors: Kerui Gu, Zhihao Li, Shiyong Liu, Jianzhuang Liu, Songcen Xu, Youliang Yan, Michael Bi Mi, Kenji Kawaguchi, Angela Yao

    Abstract: Estimating 3D rotations is a common procedure for 3D computer vision. The accuracy depends heavily on the rotation representation. One form of representation -- rotation matrices -- is popular due to its continuity, especially for pose estimation tasks. The learning process usually incorporates orthogonalization to ensure orthonormal matrices. Our work reveals, through gradient analysis, that comm…

    Submitted 1 December, 2023; originally announced December 2023.

  7. arXiv:2308.14480  [pdf, other]

    cs.CV cs.MM

    Priority-Centric Human Motion Generation in Discrete Latent Space

    Authors: Hanyang Kong, Kehong Gong, Dongze Lian, Michael Bi Mi, Xinchao Wang

    Abstract: Text-to-motion generation is a formidable task, aiming to produce human motions that align with the input text while also adhering to human capabilities and physical laws. While there have been advancements in diffusion models, their application in discrete spaces remains underexplored. Current methods often overlook the varying significance of different motions, treating them uniformly. It is ess…

    Submitted 30 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023

  8. arXiv:2308.03982  [pdf, other]

    cs.CV

    PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection

    Authors: Ming Nie, Yujing Xue, Chunwei Wang, Chaoqiang Ye, Hang Xu, Xinge Zhu, Qingqiu Huang, Michael Bi Mi, Xinchao Wang, Li Zhang

    Abstract: Recently, polar-based representation has shown promising properties in perceptual tasks. In addition to Cartesian-based approaches, which separate point clouds unevenly, representing point clouds as polar grids has been recognized as an alternative due to (1) its advantage in robust performance under different resolutions and (2) its superiority in streaming-based approaches. However, state-of-the…

    Submitted 2 December, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  9. arXiv:2305.00646  [pdf, other]

    cs.CV

    Overcoming the Trade-off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction

    Authors: Ziwei Yu, Chen Li, Linlin Yang, Xiaoxu Zheng, Michael Bi Mi, Gim Hee Lee, Angela Yao

    Abstract: Direct mesh fitting for 3D hand shape reconstruction is highly accurate. However, the reconstructed meshes are prone to artifacts and do not appear as plausible hand shapes. Conversely, parametric models like MANO ensure plausible hand shapes but are not as accurate as the non-parametric methods. In this work, we introduce a novel weakly-supervised hand shape estimation framework that integrates n…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: CVPR 2023

  10. arXiv:2305.00163  [pdf, other]

    cs.CV

    Enhancing Video Super-Resolution via Implicit Resampling-based Alignment

    Authors: Kai Xu, Ziwei Yu, Xin Wang, Michael Bi Mi, Angela Yao

    Abstract: In video super-resolution, it is common to use a frame-wise alignment to support the propagation of information over time. The role of alignment is well-studied for low-level enhancement in video, but existing works overlook a critical step -- resampling. We show through extensive experiments that for alignment to be effective, the resampling should preserve the reference frequency spectrum while…

    Submitted 17 January, 2024; v1 submitted 28 April, 2023; originally announced May 2023.

  11. arXiv:2304.02419  [pdf, other]

    cs.CV

    TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration

    Authors: Kehong Gong, Dongze Lian, Heng Chang, Chuan Guo, Zihang Jiang, Xinxin Zuo, Michael Bi Mi, Xinchao Wang

    Abstract: We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities. Unlike existing works that generate dance movements using a single modality such as music, our goal is to produce richer dance movements guided by the instructive information provided by the text. However, the lack of paired motion data with both music and text modalities limit…

    Submitted 1 October, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

    Comments: Accepted by ICCV2023

  12. arXiv:2301.12900  [pdf, other]

    cs.AI cs.CV

    DepGraph: Towards Any Structural Pruning

    Authors: Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, Xinchao Wang

    Abstract: Structural pruning enables model acceleration by removing structurally-grouped parameters from neural networks. However, the parameter-grouping patterns vary widely across different models, making architecture-specific pruners, which rely on manually-designed grouping schemes, non-generalizable to new architectures. In this work, we study a highly-challenging yet barely-explored task, any structur…

    Submitted 23 March, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  13. arXiv:2301.10431  [pdf, other]

    cs.CV

    Bias-Compensated Integral Regression for Human Pose Estimation

    Authors: Kerui Gu, Linlin Yang, Michael Bi Mi, Angela Yao

    Abstract: In human and hand pose estimation, heatmaps are a crucial intermediate representation for a body or hand keypoint. Two popular methods to decode the heatmap into a final joint coordinate are via an argmax, as done in heatmap detection, or via softmax and expectation, as done in integral regression. Integral regression is learnable end-to-end, but has lower accuracy than detection. This paper uncov…

    Submitted 25 January, 2023; originally announced January 2023.

  14. arXiv:2301.08915  [pdf, other]

    cs.CV cs.AI cs.LG

    Improving Deep Regression with Ordinal Entropy

    Authors: Shihao Zhang, Linlin Yang, Michael Bi Mi, Xiaoxu Zheng, Angela Yao

    Abstract: In computer vision, it is often observed that formulating regression problems as a classification task often yields better performance. We investigate this curious phenomenon and provide a derivation to show that classification, with the cross-entropy loss, outperforms regression with a mean squared error loss in its ability to learn high-entropy feature representations. Based on the analysis, we…

    Submitted 28 February, 2023; v1 submitted 21 January, 2023; originally announced January 2023.

    Comments: Accepted to ICLR 2023. Project page: https://github.com/needylove/OrdinalEntropy

  15. arXiv:2211.13409  [pdf, other]

    cs.CV

    Object Detection in Foggy Scenes by Embedding Depth and Reconstruction into Domain Adaptation

    Authors: Xin Yang, Michael Bi Mi, Yuan Yuan, Xin Wang, Robby T. Tan

    Abstract: Most existing domain adaptation (DA) methods align the features based on the domain feature distributions and ignore aspects related to fog, background and target objects, rendering suboptimal performance. In our DA framework, we retain the depth and background information during the domain feature alignment. A consistency loss between the generated depth and fog transmission map is introduced to…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted by ACCV

  16. arXiv:2205.00301  [pdf, other]

    cs.CV

    ONCE-3DLanes: Building Monocular 3D Lane Detection

    Authors: Fan Yan, Ming Nie, Xinyue Cai, Jianhua Han, Hang Xu, Zhen Yang, Chaoqiang Ye, Yanwei Fu, Michael Bi Mi, Li Zhang

    Abstract: We present ONCE-3DLanes, a real-world autonomous driving dataset with lane layout annotation in 3D space. Conventional 2D lane detection from a monocular image yields poor performance of following planning and control tasks in autonomous driving due to the case of uneven road. Predicting the 3D lane layout is thus necessary and enables effective and safe driving. However, existing 3D lane detectio…

    Submitted 14 May, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

    Comments: CVPR 2022. Project page at https://once-3dlanes.github.io

  17. arXiv:2203.15625  [pdf, other]

    cs.CV

    PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision

    Authors: Kehong Gong, Bingbing Li, Jianfeng Zhang, Tao Wang, Jing Huang, Michael Bi Mi, Jiashi Feng, Xinchao Wang

    Abstract: Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions like consistency loss to guide the learning, which, inevitably, leads to inferior results in real-world scenarios with unseen poses. In this paper, we propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision, through a self-enhancing d…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 Oral Paper, code available: https://github.com/Garfield-kh/PoseTriplet

  18. arXiv:2203.13394  [pdf, other]

    cs.CV

    Point2Seq: Detecting 3D Objects as Sequences

    Authors: Yujing Xue, Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei Zhang, Xiaogang Wang, Xinchao Wang

    Abstract: We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds. In contrast to previous methods that normally predict attributes of 3D objects all at once, we expressively model the interdependencies between attributes of 3D objects, which in turn enables a better detection accuracy. Specifically, we view each 3D object as a sequence of words and reformul…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: To appear in CVPR2022