Showing 1–50 of 123 results for author: Hong, F

  1. arXiv:2406.09905  [pdf, other]

    cs.CV cs.GR

    Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

    Authors: Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David Soriano Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, Richard Newcombe

    Abstract: We introduce Nymeria - a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices. The dataset comes with a) full-body 3D motion ground truth; b) egocentric multimodal recordings from Project Aria devices with RGB, grayscale, eye-tracking cameras, IMUs, magnetometer, barometer, and microphones; and c) an additional "observer" dev…

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:2406.06526  [pdf, other]

    cs.CV

    GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation

    Authors: Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

    Abstract: 3D city generation with NeRF-based methods shows promising generation results but is computationally inefficient. Recently 3D Gaussian Splatting (3D-GS) has emerged as a highly efficient alternative for object-level 3D generation. However, adapting 3D-GS from finite-scale 3D objects and humans to infinite-scale 3D cities is non-trivial. Unbounded 3D city generation entails significant storage over…

    Submitted 10 June, 2024; originally announced June 2024.

  3. arXiv:2406.04872  [pdf, other]

    cs.LG

    Diversified Batch Selection for Training Acceleration

    Authors: Feng Hong, Yueming Lyu, Jiangchao Yao, Ya Zhang, Ivor W. Tsang, Yanfeng Wang

    Abstract: The remarkable success of modern machine learning models on large datasets often demands extensive training time and resource consumption. To save cost, a prevalent research line, known as online batch selection, explores selecting informative subsets during the training process. Although recent efforts achieve advancements by measuring the impact of each sample on generalization, their reliance o…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  4. arXiv:2406.04354  [pdf, other]

    eess.AS

    QiandaoEar22: A high quality noise dataset for identifying specific ship from multiple underwater acoustic targets using ship-radiated noise

    Authors: Xiaoyang Du, Feng Hong

    Abstract: Target identification of ship-radiated noise is a crucial area in underwater target recognition. However, there is currently a lack of multi-target ship datasets that accurately represent real-world underwater acoustic conditions. To tackle this issue, we conducted experimental data acquisition, resulting in the release of QiandaoEar22, a comprehensive underwater acoustic multi-target d…

    Submitted 15 May, 2024; originally announced June 2024.

  5. arXiv:2406.04353  [pdf, other]

    eess.AS cs.SD

    Introducing the Brand New QiandaoEar22 Dataset for Specific Ship Identification Using Ship-Radiated Noise

    Authors: Xiaoyang Du, Feng Hong

    Abstract: Target identification of ship-radiated noise is a crucial area in underwater target recognition. However, there is currently a lack of multi-target ship datasets that accurately represent real-world underwater acoustic conditions. To tackle this issue, we release QiandaoEar22, an underwater acoustic multi-target dataset, which can be downloaded from https://ieee-dataport.org/documents/qian…

    Submitted 15 May, 2024; originally announced June 2024.

  6. arXiv:2405.10305  [pdf, other]

    cs.CV cs.AI

    4D Panoptic Scene Graph Generation

    Authors: Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu

    Abstract: We are living in a three-dimensional space while moving forward through a fourth dimension: time. To allow artificial intelligence to develop a comprehensive understanding of such a 4D environment, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding. Specifically, PSG-4D abstracts r…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted at NeurIPS 2023. Code: https://github.com/Jingkang50/PSG4D Previous Series: PSG https://github.com/Jingkang50/OpenPSG and PVSG https://github.com/Jingkang50/OpenPVSG

  7. arXiv:2405.08055  [pdf, other]

    cs.CV

    DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation

    Authors: Ziang Cao, Fangzhou Hong, Tong Wu, Liang Pan, Ziwei Liu

    Abstract: Generating diverse and high-quality 3D assets automatically poses a fundamental yet challenging task in 3D computer vision. Despite extensive efforts in 3D generation, existing optimization-based approaches struggle to produce large-scale 3D assets efficiently. Meanwhile, feed-forward methods often focus on generating only a single category or a few categories, limiting their generalizability. The…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2309.07920

  8. arXiv:2405.07029  [pdf]

    cs.SD eess.AS

    A framework of text-dependent speaker verification for Chinese numerical string corpus

    Authors: Litong Zheng, Feng Hong, Weijie Xu, Wan Zheng

    Abstract: The Chinese numerical string corpus, serves as a valuable resource for speaker verification, particularly in financial transactions. Researches indicate that in short speech scenarios, text-dependent speaker verification (TD-SV) consistently outperforms text-independent speaker verification (TI-SV). However, TD-SV potentially includes the validation of text information, that can be negatively impa…

    Submitted 21 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.01645

  9. arXiv:2404.19678  [pdf, ps, other]

    cond-mat.supr-con cond-mat.str-el

    Density-wave-like gap evolution in La$_3$Ni$_2$O$_7$ under high pressure revealed by ultrafast optical spectroscopy

    Authors: Yanghao Meng, Yi Yang, Hualei Sun, Sasa Zhang, Jianlin Luo, Meng Wang, Fang Hong, Xinbo Wang, Xiaohui Yu

    Abstract: We explore the quasiparticle dynamics in bilayer nickelate La$_3$Ni$_2$O$_7$ crystal using ultrafast optical pump-probe spectroscopy at high pressure up to 34.2 GPa. At ambient pressure, the temperature dependence of relaxation indicates appearance of phonon bottleneck effect due to the opening of density-wave-like gap at 151 K. By analyzing the data with RT model, we identified the energy scale o…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 6 pages, 4 figures

  10. Generation of a precise time scale assisted by a near-continuously operating optical lattice clock

    Authors: Takumi Kobayashi, Daisuke Akamatsu, Kazumoto Hosaka, Yusuke Hisai, Akiko Nishiyama, Akio Kawasaki, Masato Wada, Hajime Inaba, Takehiko Tanabe, Feng-Lei Hong, Masami Yasuda

    Abstract: We report on a reduced time variation of a time scale with respect to Coordinated Universal Time (UTC) by steering a hydrogen-maser-based time scale with a near-continuously operating optical lattice clock. The time scale is generated in a post-processing analysis for 230 days with a hydrogen maser with its fractional frequency stability limited by a flicker floor of $2\times10^{-15}$ and an Yb op…

    Submitted 17 April, 2024; originally announced April 2024.

  11. arXiv:2404.01655  [pdf, other]

    cs.CV

    FashionEngine: Interactive 3D Human Generation and Editing via Multimodal Controls

    Authors: Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu

    Abstract: We present FashionEngine, an interactive 3D human generation and editing system that creates 3D digital humans via user-friendly multimodal controls such as natural languages, visual perceptions, and hand-drawing sketches. FashionEngine automates the 3D human production with three key components: 1) A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space…

    Submitted 20 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Project Page: https://taohuumd.github.io/projects/FashionEngine

  12. arXiv:2404.01284  [pdf, other]

    cs.CV

    Large Motion Model for Unified Multi-Modal Motion Generation

    Authors: Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu

    Abstract: Human motion generation, a cornerstone technique in animation and video production, has widespread applications in various tasks like text-to-motion and music-to-dance. Previous works focus on developing specialist models tailored for each task without scalability. In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation t…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Homepage: https://mingyuan-zhang.github.io/projects/LMM.html

  13. arXiv:2404.01241  [pdf, other]

    cs.CV

    StructLDM: Structured Latent Diffusion for 3D Human Generation

    Authors: Tao Hu, Fangzhou Hong, Ziwei Liu

    Abstract: Recent 3D human generative models have achieved remarkable progress by learning 3D-aware GANs from 2D images. However, existing 3D human generative methods model humans in a compact 1D latent space, ignoring the articulated structure and semantics of human body topology. In this paper, we explore more expressive and higher-dimensional latent space for 3D human modeling and propose StructLDM, a dif…

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Project page: https://taohuumd.github.io/projects/StructLDM/

  14. arXiv:2404.01225  [pdf, other]

    cs.CV

    SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

    Authors: Tao Hu, Fangzhou Hong, Ziwei Liu

    Abstract: Dynamic human rendering from video sequences has achieved remarkable progress by formulating the rendering as a mapping from static poses to human images. However, existing methods focus on the human appearance reconstruction of every single frame while the temporal motion relations are not fully explored. In this paper, we propose a new 4D motion modeling paradigm, SurMo, that jointly models the…

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Project Page: https://taohuumd.github.io/projects/SurMo/

  15. arXiv:2403.12019  [pdf, other]

    cs.CV

    LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

    Authors: Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy

    Abstract: The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harn…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: project webpage: https://nirvanalan.github.io/projects/ln3diff/

  16. arXiv:2403.02234  [pdf, other]

    cs.CV

    3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

    Authors: Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Shuai Yang, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

    Abstract: We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors. The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping. The sec…

    Submitted 6 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Code available at https://github.com/3DTopia/3DTopia

  17. arXiv:2402.16427  [pdf]

    cond-mat.supr-con cond-mat.mtrl-sci

    Electronic phase transitions and superconductivity in ferroelectric Sn$_2$P$_2$Se$_6$ under pressure

    Authors: He Zhang, Wei Zhong, Xiaohui Yu, Binbin Yue, Fang Hong

    Abstract: Since there is both strong electron-phonon coupling during a ferroelectric/FE transition and superconducting/SC transition, it has been an important topic to explore superconductivity from the FE instability. Sn$_2$P$_2$Se$_6$ arouses broad attention due to its unique FE properties. Here, we reported the electronic phase transitions and superconductivity in this compound based on high-pressure ele…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 13 pages, 5 figures

  18. arXiv:2312.16484  [pdf]

    cond-mat.supr-con cond-mat.mtrl-sci

    Emergence of superconductivity near 11 K by suppressing the 3-fold helical-chain structure in noncentrosymmetric HgS

    Authors: He Zhang, Wei Zhong, Yanghao Meng, Bowen Tang, Binbin Yue, Xiaohui Yu, Fang Hong

    Abstract: The trigonal $α$-HgS has a 3-fold helical chain structure, and is in form of a noncentrosymmetric $P3_121$ phase, known as the cinnabar phase. However, under pressure, the helical chains gradually approach and connect with each other, finally reconstructing into a centrosymmetric NaCl structure at 21 GPa. Superconductivity emerges just after this helical-nonhelical structural transition. The maxim…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: 16 pages, 6 figures

  19. arXiv:2312.11038  [pdf, other]

    cs.CV cs.LG

    UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification

    Authors: Tianjie Dai, Ruipeng Zhang, Feng Hong, Jiangchao Yao, Ya Zhang, Yanfeng Wang

    Abstract: Vision-Language Pre-training (VLP) that utilizes the multi-modal information to promote the training efficiency and effectiveness, has achieved great success in vision recognition of natural domains and shown promise in medical imaging diagnosis for the Chest X-Rays (CXRs). However, current works mainly pay attention to the exploration on single dataset of CXRs, which locks the potential of this p…

    Submitted 21 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted at IEEE Transactions on Medical Imaging

  20. arXiv:2312.04559  [pdf, other]

    cs.CV cs.GR

    PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation

    Authors: Zhaoxi Chen, Fangzhou Hong, Haiyi Mei, Guangcong Wang, Lei Yang, Ziwei Liu

    Abstract: We present PrimDiffusion, the first diffusion-based framework for 3D human generation. Devising diffusion models for 3D human generation is difficult due to the intensive computational cost of 3D representations and the articulated topology of 3D humans. To tackle these challenges, our key insight is operating the denoising diffusion process directly on a set of volumetric primitives, which models…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023; Project page https://frozenburning.github.io/projects/primdiffusion/ Code available at https://github.com/FrozenBurning/PrimDiffusion

  21. arXiv:2312.01645  [pdf]

    cs.SD eess.AS

    A text-dependent speaker verification application framework based on Chinese numerical string corpus

    Authors: Litong Zheng, Feng Hong, Weijie Xu

    Abstract: Researches indicate that text-dependent speaker verification (TD-SV) often outperforms text-independent verification (TI-SV) in short speech scenarios. However, collecting large-scale fixed text speech data is challenging, and as speech length increases, factors like sentence rhythm and pauses affect TDSV's sensitivity to text sequence. Based on these factors, we propose the hypothesis that strate…

    Submitted 4 December, 2023; originally announced December 2023.

  22. arXiv:2310.17622  [pdf, other]

    cs.LG

    Combating Representation Learning Disparity with Geometric Harmonization

    Authors: Zhihan Zhou, Jiangchao Yao, Feng Hong, Ya Zhang, Bo Han, Yanfeng Wang

    Abstract: Self-supervised learning (SSL) as an effective paradigm of representation learning has achieved tremendous success on various curated datasets in diverse scenarios. Nevertheless, when facing the long-tailed distribution in real-world applications, it is still hard for existing methods to capture transferable and robust representation. Conventional SSL methods, pursuing sample-level uniformity, eas…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023 (spotlight)

  23. arXiv:2310.16112  [pdf, other]

    cs.CV

    Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge

    Authors: Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, Jaehyup Jeong, Wongi Park, Jongbin Ryu, Feng Hong, Arsh Verma, Yosuke Yamagishi, Changhyun Kim, Hyeryeong Seo, Myungjoo Kang, Leo Anthony Celi, Zhiyong Lu, Ronald M. Summers, George Shih, Zhangyang Wang, Yifan Peng

    Abstract: Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" – there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of…

    Submitted 1 April, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Update after major revision

  24. arXiv:2310.05532  [pdf]

    cond-mat.mtrl-sci

    Observation of Emergent Superconductivity in the Quantum Spin Hall Insulator Ta2Pd3Te5 via Pressure Manipulation

    Authors: Hui Yu, Dayu Yan, Zhaopeng Guo, Yizhou Zhou, Xue Yang, Peiling Li, Zhijun Wang, Xiaojun Xiang, Junkai Li, Xiaoli Ma, Rui Zhou, Fang Hong, Yunxiao Wuli, Youguo Shi, Jian-Tao Wang, Xiaohui Yu

    Abstract: Quantum Spin Hall (QSH) insulators possess distinct helical in-gap states, enabling their edge states to act as one-dimensional conducting channels when backscattering is prohibited by time-reversal symmetry. However, it remains challenging to achieve high-performance combinations of nontrivial topological QSH states with superconductivity for applications and requires understanding of the complic…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 16 pages, 4 figures

  25. arXiv:2309.07920  [pdf, other]

    cs.CV

    Large-Vocabulary 3D Diffusion Model with Transformer

    Authors: Ziang Cao, Fangzhou Hong, Tong Wu, Liang Pan, Ziwei Liu

    Abstract: Creating diverse and high-quality 3D assets with an automatic generative model is highly desirable. Despite extensive efforts on 3D generation, most existing works focus on the generation of a single category or a few categories. In this paper, we introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model. Notably,…

    Submitted 15 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Project page at https://ziangcao0312.github.io/difftf_pages/

  26. arXiv:2309.04410  [pdf, other]

    cs.CV cs.GR

    DeformToon3D: Deformable 3D Toonification from Neural Radiance Fields

    Authors: Junzhe Zhang, Yushi Lan, Shuai Yang, Fangzhou Hong, Quan Wang, Chai Kiat Yeo, Ziwei Liu, Chen Change Loy

    Abstract: In this paper, we address the challenging problem of 3D toonification, which involves transferring the style of an artistic domain onto a target 3D face with stylized geometry and texture. Although fine-tuning a pre-trained 3D GAN on the artistic domain can produce reasonable performance, this strategy has limitations in the 3D domain. In particular, fine-tuning can deteriorate the original GAN la…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. Code: https://github.com/junzhezhang/DeformToon3D Project page: https://www.mmlab-ntu.com/project/deformtoon3d/

  27. arXiv:2309.00610  [pdf, other]

    cs.CV

    CityDreamer: Compositional Generative Model of Unbounded 3D Cities

    Authors: Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

    Abstract: 3D city generation is a desirable yet challenging task, since humans are more sensitive to structural distortions in urban environments. Additionally, generating 3D cities is more complex than 3D natural scenes since buildings, as objects of the same class, exhibit a wider range of appearances compared to the relatively consistent appearance of objects like trees in natural scenes. To address thes…

    Submitted 5 June, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: CVPR 2024. Project page: https://haozhexie.com/project/city-dreamer

  28. arXiv:2308.14492  [pdf, other]

    cs.CV

    PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds

    Authors: Zhongang Cai, Liang Pan, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

    Abstract: Human pose and shape estimation (HPS) has attracted increasing attention in recent years. While most existing studies focus on HPS from 2D images or videos with inherent depth ambiguity, there are surging need to investigate HPS from 3D point clouds as depth sensors have been frequently employed in commercial devices. However, real-world sensory 3D points are usually noisy and incomplete, and also…

    Submitted 28 August, 2023; originally announced August 2023.

  29. arXiv:2308.09712  [pdf, other]

    cs.CV

    HumanLiff: Layer-wise 3D Human Generation with Diffusion Model

    Authors: Shoukang Hu, Fangzhou Hong, Tao Hu, Liang Pan, Haiyi Mei, Weiye Xiao, Lei Yang, Ziwei Liu

    Abstract: 3D human generation from 2D images has achieved remarkable progress through the synergistic utilization of neural rendering and generative models. Existing 3D human generative models mainly generate a clothed 3D human as an undetectable 3D model in a single pass, while rarely considering the layer-wise nature of a clothed human body, which often consists of the human body and various clothes such…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Project page: https://skhu101.github.io/HumanLiff/

  30. arXiv:2308.08853  [pdf, other]

    cs.CV cs.LG

    Bag of Tricks for Long-Tailed Multi-Label Classification on Chest X-Rays

    Authors: Feng Hong, Tianjie Dai, Jiangchao Yao, Ya Zhang, Yanfeng Wang

    Abstract: Clinical classification of chest radiography is particularly challenging for standard machine learning algorithms due to its inherent long-tailed and multi-label nature. However, few attempts take into account the coupled challenges posed by both the class imbalance and label co-occurrence, which hinders their value to boost the diagnosis on chest X-rays (CXRs) in the real-world scenarios. Besides…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted for the ICCV 2023 Workshop on Computer Vision for Automated Medical Diagnosis (CVAMD)

  31. arXiv:2308.01698  [pdf, other]

    cs.CV

    Balanced Destruction-Reconstruction Dynamics for Memory-replay Class Incremental Learning

    Authors: Yuhang Zhou, Jiangchao Yao, Feng Hong, Ya Zhang, Yanfeng Wang

    Abstract: Class incremental learning (CIL) aims to incrementally update a trained model with the new classes of samples (plasticity) while retaining previously learned ability (stability). To address the most challenging issue in this goal, i.e., catastrophic forgetting, the mainstream paradigm is memory-replay CIL, which consolidates old knowledge by replaying a small number of old classes of samples saved…

    Submitted 3 August, 2023; originally announced August 2023.

  32. arXiv:2307.16438  [pdf]

    cond-mat.supr-con cond-mat.mtrl-sci

    Coexistence of Superconductivity and ferromagnetism in high entropy carbide ceramics

    Authors: Huchen Shu, Wei Zhong, Jiajia Feng, Hongyang Zhao, Fang Hong, Binbin Yue

    Abstract: Generally, the superconductivity was expected to be absent in magnetic systems, but this reception was disturbed by unconventional superconductors, such as cuprates, iron-based superconductors and recently discovered nickelate, since their superconductivity is proposed to be related to the electron-electron interaction mediated by the spin fluctuation. However, the coexistence of superconductivity…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 16 pages, 5 figures, 1 table. Suggestion and comments are welcome

  33. arXiv:2307.09906  [pdf, other]

    cs.CV cs.AI

    Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation

    Authors: Fa-Ting Hong, Dan Xu

    Abstract: Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information derived from a target-driving video, while maintaining the person's identity in the source image. However, dramatic and complex motions in the driving video cause ambiguous generation, because the still source image cannot provide sufficient appearance information…

    Submitted 18 August, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV2023, update the reference and figures

  34. arXiv:2305.16504  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Tool Manipulation Capability of Open-source Large Language Models

    Authors: Qiantong Xu, Fenglu Hong, Bo Li, Changran Hu, Zhengyu Chen, Jian Zhang

    Abstract: Recent studies on software tool manipulation with large language models (LLMs) mostly rely on closed model APIs. The industrial adoption of these models is substantially constrained due to the security and robustness risks in exposing information to closed LLM API services. In this paper, we ask can we enhance open-source LLMs to be competitive to leading closed LLM APIs in tool manipulation, with…

    Submitted 25 May, 2023; originally announced May 2023.

  35. arXiv:2305.06225  [pdf, other]

    cs.CV cs.AI

    DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

    Authors: Fa-Ting Hong, Li Shen, Dan Xu

    Abstract: Predominant techniques on talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noises for generation. However, dense 3D annotations for facial videos is prohibit…

    Submitted 10 December, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted at TPAMI; CVPR 2022 extension

  36. arXiv:2304.01116  [pdf, other]

    cs.CV

    ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

    Authors: Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, Ziwei Liu

    Abstract: 3D human motion generation is crucial for creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, the performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates…

    Submitted 3 April, 2023; originally announced April 2023.

  37. arXiv:2303.15944  [pdf, other]

    cs.LG cs.SD eess.AS

    Cluster-Guided Unsupervised Domain Adaptation for Deep Speaker Embedding

    Authors: Haiquan Mao, Feng Hong, Man-wai Mak

    Abstract: Recent studies have shown that pseudo labels can contribute to unsupervised domain adaptation (UDA) for speaker verification. Inspired by the self-training strategies that use an existing classifier to label the unlabeled data for retraining, we propose a cluster-guided UDA framework that labels the target domain data by clustering and combines the labeled source domain data and pseudo-labeled tar…

    Submitted 28 March, 2023; originally announced March 2023.

  38. arXiv:2303.12791  [pdf, other]

    cs.CV

    SHERF: Generalizable Human NeRF from a Single Image

    Authors: Shoukang Hu, Fangzhou Hong, Liang Pan, Haiyi Mei, Lei Yang, Ziwei Liu

    Abstract: Existing Human NeRF methods for reconstructing 3D humans typically rely on multiple 2D images from multi-view cameras or monocular videos captured from fixed camera views. However, in real-world scenarios, human images are often captured from random camera angles, presenting challenges for high-quality 3D human reconstruction. In this paper, we propose SHERF, the first generalizable Human NeRF mod…

    Submitted 16 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted by ICCV2023. Project webpage: https://skhu101.github.io/SHERF/

  39. Frequency-multiplexed Hong-Ou-Mandel interference

    Authors: Mayuka Ichihara, Daisuke Yoshida, Feng-Lei Hong, Tomoyuki Horikiri

    Abstract: The implementation of quantum repeaters needed for long-distance quantum communication requires the generation of quantum entanglement distributed among the elementary links. These entanglements must be swapped among the quantum repeaters through Bell-state measurements. This study aims to improve the entanglement generation rate by frequency multiplexing the Bell-state measurements. As a prelimin…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: 9 pages, 6 figures

    Journal ref: Physical Review A 107, 032608 (2023)

  40. Frequency-multiplexed storage and distribution of narrowband telecom photon pairs over a 10-km fiber link with long-term system stability

    Authors: Ko Ito, Takeshi Kondo, Kyoko Mannami, Kazuya Niizeki, Daisuke Yoshida, Kohei Minaguchi, Mingyang Zheng, Feng-Lei Hong, Tomoyuki Horikiri

    Abstract: The ability to transmit quantum states over long distances is a fundamental requirement of the quantum internet and is reliant upon quantum repeaters. Quantum repeaters involve entangled photon sources that emit and deliver photonic entangled states at high rates and quantum memories that can temporarily store quantum states. Improvement of the entanglement distribution rate is essential for quant…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: 10 pages

    Journal ref: Physical Review Applied 19, 024070 (2023)

  41. arXiv:2302.05080  [pdf, other]

    cs.LG cs.CV

    Long-Tailed Partial Label Learning via Dynamic Rebalancing

    Authors: Feng Hong, Jiangchao Yao, Zhihan Zhou, Ya Zhang, Yanfeng Wang

    Abstract: Real-world data usually couples the label ambiguity and heavy imbalance, challenging the algorithmic robustness of partial label learning (PLL) and long-tailed learning (LT). The straightforward combination of LT and PLL, i.e., LT-PLL, suffers from a fundamental dilemma: LT methods build upon a given class distribution that is unavailable in PLL, and the performance of PLL is severely influenced i…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: ICLR 2023

  42. arXiv:2210.13112  [pdf, other]

    cs.RO

    Optimization-based Motion Planning for Autonomous Parking Considering Dynamic Obstacle: A Hierarchical Framework

    Authors: Xuemin Chi, Zhitao Liu, Jihao Huang, Feng Hong, Hongye Su

    Abstract: This paper introduces a hierarchical framework that integrates graph search algorithms and model predictive control to facilitate efficient parking maneuvers for Autonomous Vehicles (AVs) in constrained environments. In the high-level planning phase, the framework incorporates scenario-based hybrid A* (SHA*), an optimized variant of traditional Hybrid A*, to generate an initial path while consider…

    Submitted 14 November, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Update some typos and references

  43. Single-shot high-resolution identification of discrete frequency modes of single-photon-level optical pulses

    Authors: Daisuke Yoshida, Mayuka Ichihara, Takeshi Kondo, Feng-Lei Hong, Tomoyuki Horikiri

    Abstract: Frequency-multiplexed quantum communication usually requires a single-shot identification of the frequency mode of a single photon. In this paper, we propose a scheme that can identify the frequency mode with high-resolution even for spontaneously emitted photons whose generation time is unknown, by combining the time-to-space and frequency-to-time mode mapping. We also demonstrate the mapping of…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: 6 pages, 5 figures

  44. arXiv:2210.08828  [pdf, other]

    cs.RO

    Search-Based Path Planning Algorithm for Autonomous Parking:Multi-Heuristic Hybrid A*

    Authors: Jihao Huang, Zhitao Liu, Xuemin Chi, Feng Hong, Hongye Su

    Abstract: This paper proposed a novel method for autonomous parking. Autonomous parking has received a lot of attention because of its convenience, but due to the complex environment and the non-holonomic constraints of vehicle, it is difficult to get a collision-free and feasible path in a short time. To solve this problem, this paper introduced a novel algorithm called Multi-Heuristic Hybrid A* (MHHA*) wh…

    Submitted 17 October, 2022; originally announced October 2022.

  45. arXiv:2210.04888  [pdf, other]

    cs.CV

    EVA3D: Compositional 3D Human Generation from 2D Image Collections

    Authors: Fangzhou Hong, Zhaoxi Chen, Yushi Lan, Liang Pan, Ziwei Liu

    Abstract: Inverse graphics aims to recover 3D models from 2D observations. Utilizing differentiable rendering, recent 3D-aware generative models have shown impressive results of rigid object generation using 2D images. However, it remains challenging to generate articulated objects, like human bodies, due to their complexity and diversity in poses and appearances. In this work, we propose, EVA3D, an uncondi…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: Project Page at https://hongfz16.github.io/projects/EVA3D.html

  46. arXiv:2208.15001  [pdf, other]

    cs.CV

    MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

    Authors: Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu

    Abstract: Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this…

    Submitted 31 August, 2022; originally announced August 2022.

  47. arXiv:2206.11011  [pdf, other]

    cs.CV

    Weakly-Supervised Temporal Action Localization by Progressive Complementary Learning

    Authors: Jia-Run Du, Jia-Chang Feng, Kun-Yu Lin, Fa-Ting Hong, Xiao-Ming Wu, Zhongang Qi, Ying Shan, Wei-Shi Zheng

    Abstract: Weakly Supervised Temporal Action Localization (WSTAL) aims to localize and classify action instances in long untrimmed videos with only video-level category labels. Due to the lack of snippet-level supervision for indicating action boundaries, previous methods typically assign pseudo labels for unlabeled snippets. However, since some action instances of different categories are visually similar,…

    Submitted 14 November, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

  48. arXiv:2205.08535  [pdf, other]

    cs.CV

    AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

    Authors: Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang, Ziwei Liu

    Abstract: 3D avatar creation plays a crucial role in the digital age. However, the whole production process is prohibitively time-consuming and labor-intensive. To democratize this technology to a larger audience, we propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers layman users to cu…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: SIGGRAPH 2022; Project Page: https://hongfz16.github.io/projects/AvatarCLIP.html; Codes available at https://github.com/hongfz16/AvatarCLIP

  49. arXiv:2204.13686  [pdf, other]

    cs.CV

    HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

    Authors: Zhongang Cai, Daxuan Ren, Ailing Zeng, Zhengyu Lin, Tao Yu, Wenjia Wang, Xiangyu Fan, Yang Gao, Yifan Yu, Liang Pan, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

    Abstract: 4D human sensing and modeling are fundamental tasks in vision and graphics with numerous applications. With the advances of new sensors and algorithms, there is an increasing demand for more versatile datasets. In this work, we contribute HuMMan, a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences and 60M frames. HuMMan has several appealing properties: 1) multi-mod…

    Submitted 16 April, 2023; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Homepage: https://caizhongang.github.io/projects/HuMMan/

  50. arXiv:2204.05064  [pdf]

    quant-ph

    Quantum sensing with diamond NV centers under megabar pressures

    Authors: Jian-Hong Dai, Yan-Xing Shang, Yong-Hong Yu, Yue Xu, Hui Yu, Fang Hong, Xiao-Hui Yu, Xin-Yu Pan, Gang-Qin Liu

    Abstract: Megabar pressures are of crucial importance for cutting-edge studies of condensed matter physics and geophysics. With the development of diamond anvil cell, laboratory studies of high pressure have entered the megabar era for decades. However, it is still challenging to implement in-situ magnetic sensing under ultrahigh pressures. Here, we demonstrate optically detected magnetic resonance of diamo…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: 9 pages, 4 figures