Showing 1–47 of 47 results for author: Chai, W

  1. arXiv:2406.11247  [pdf, other]

    cs.CV

    STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft

    Authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang

    Abstract: Building an embodied agent system with a large language model (LLM) as its core is a promising direction. Due to the significant costs and uncontrollable factors associated with deploying and training such agents in the real world, we have decided to begin our exploration within the Minecraft environment. Our STEVE Series agents can complete basic tasks in a virtual environment and more challengin…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Embodied AI Workshop

  2. arXiv:2406.04983  [pdf, other]

    cs.CV

    CityCraft: A Real Crafter for 3D City Generation

    Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

    Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neur…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 20 pages, 9 figures

  3. arXiv:2404.17176  [pdf, other]

    cs.CV

    MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

    Authors: Enxin Song, Wenhao Chai, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang

    Abstract: Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks. Yet, existing methods either employ complex spatial-temporal modules or rely heavily on additional perception models to extract temporal features for video understanding, and they only perform well on short videos. For long…

    Submitted 26 April, 2024; originally announced April 2024.

  4. arXiv:2404.12871  [pdf, other]

    cs.SI math.CO physics.soc-ph

    Expanding the Katz Index for Link Prediction: A Case Study on a Live Fish Movement Network

    Authors: Michael-Sam Vidza, Marcin Budka, Wei Koong Chai, Mark Thrush, Mickael Teixeira Alves

    Abstract: In aquaculture, disease spread models often neglect the dynamic interactions between farms, hindering accuracy. This study enhances the Katz index (KI) to incorporate spatial and temporal patterns of fish movement, improving the prediction of farms susceptible to disease via live fish transfers. We modified the Katz index to create models like the Weighted Katz Index (WKI), Edge Weighted Katz Inde…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 15 pages, 3 figures, submitted to Expert Systems with Applications

  5. arXiv:2404.04910  [pdf, other]

    cs.CV

    MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

    Authors: Hou-I Liu, Christine Wu, Jen-Hao Cheng, Wenhao Chai, Shian-Yun Wang, Gaowen Liu, Jenq-Neng Hwang, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Monocular 3D object detection (Mono3D) is an indispensable research topic in autonomous driving, thanks to the cost-effective monocular camera sensors and its wide range of applications. Since the image perspective has depth ambiguity, the challenges of Mono3D lie in understanding 3D scene geometry and reconstructing 3D object information from a single image. Previous methods attempted to transfer…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 14 pages

  6. arXiv:2404.04619  [pdf, other]

    cs.AI cs.CV

    Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model

    Authors: Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang

    Abstract: With the power of large language models (LLMs), open-ended embodied agents can flexibly understand human instructions, generate interpretable guidance strategies, and output executable actions. Nowadays, Multi-modal Language Models (MLMs) integrate multi-modal signals into LLMs, further bringing richer perception to entity agents and allowing embodied agents to perceive world-understanding tasks m…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.08282

  7. arXiv:2403.18493  [pdf, other]

    cs.CV

    VersaT2I: Improving Text-to-Image Models with Versatile Reward

    Authors: Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically accurate, faithful to text, and of good low-level quality. We present VersaT2I, a versatile training framework that can boost the performance with multiple rewards of…

    Submitted 27 March, 2024; originally announced March 2024.

  8. arXiv:2403.10826  [pdf, other]

    cs.CV

    Exploring Learning-based Motion Models in Multi-Object Tracking

    Authors: Hsiang-Wei Huang, Cheng-Yen Yang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang

    Abstract: In the field of multi-object tracking (MOT), traditional methods often rely on the Kalman Filter for motion prediction, leveraging its strengths in linear motion scenarios. However, the inherent limitations of these methods become evident when confronted with complex, nonlinear motions and occlusions prevalent in dynamic environments like sports and dance. This paper explores the possibilities of…

    Submitted 16 March, 2024; originally announced March 2024.

  9. arXiv:2403.08282  [pdf, other]

    cs.CV

    Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation

    Authors: Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang

    Abstract: Due to the dynamic and unpredictable open-world setting, navigating complex environments in Minecraft poses significant challenges for multi-agent systems. Agents must interact with the environment and coordinate their actions with other agents to achieve common objectives. However, traditional approaches often struggle to efficiently manage inter-agent communication and task distribution, crucial…

    Submitted 18 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: ICLR 2024 Workshop on LLM Agents

  10. arXiv:2403.05227  [pdf, ps, other]

    cond-mat.supr-con

    Superconductivity in kagome metal ThRu3Si2

    Authors: Yi Liu, Jing Li, Wu-Zhang Yang, Jia-Yi Lu, Bo-Ya Cao, Hua-Xun Li, Wan-Li Chai, Si-Qi Wu, Bai-Zhuo Li, Yun-Lei Sun, Wen-He Jiao, Wang Cao, Xiao-Feng Xu, Ren Zhi, Guang-Han Cao

    Abstract: We report the physical properties of ThRu$_3$Si$_2$, which features a distorted Ru kagome lattice. The combined experiments of resistivity, magnetization and specific heat reveal bulk superconductivity with $T_{\rm{c}}$ = 3.8 K. The specific heat jump and calculated electron-phonon coupling indicate a moderately coupled BCS superconductor. In comparison with LaRu$_3$Si$_2$, the calculated electronic str…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures

    Journal ref: Chinese Physics B (2024)

  11. arXiv:2402.09316  [pdf, other]

    cs.CV cs.LG

    Only My Model On My Data: A Privacy Preserving Approach Protecting one Model and Deceiving Unauthorized Black-Box Models

    Authors: Weiheng Chai, Brian Testa, Huantao Ren, Asif Salekin, Senem Velipasalar

    Abstract: Deep neural networks are extensively applied to real-world tasks, such as face recognition and medical image classification, where privacy and data protection are critical. Image data, if not protected, can be exploited to infer personal or contextual information. Existing privacy preservation methods, like encryption, generate perturbed images that are unrecognizable to even humans. Adversarial a…

    Submitted 14 February, 2024; originally announced February 2024.

  12. arXiv:2312.08887  [pdf, other]

    cs.CV cs.LG

    SpeedUpNet: A Plug-and-Play Hyper-Network for Accelerating Text-to-Image Diffusion Models

    Authors: Weilong Chai, DanDan Zheng, Jiajiong Cao, Zhiquan Chen, Changbao Wang, Chenguang Ma

    Abstract: Text-to-image diffusion models (SD) exhibit significant advancements while requiring extensive computational resources. Though many acceleration methods have been proposed, they suffer from generation quality degradation or extra training cost when generalizing to new fine-tuned models. To address these limitations, we propose a novel and universal Stable-Diffusion (SD) acceleration module called Speed…

    Submitted 20 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Table 1 shows the comparison with existing methods, but the lack of experimental data for the LCM method under 12-step makes the table incomplete. We need to temporarily withdraw the manuscript and conduct the corresponding experiments before resubmitting it.

  13. arXiv:2312.04793  [pdf, other]

    cs.CV

    User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning

    Authors: Xuan Wang, Guanhong Wang, Wenhao Chai, Jiayu Zhou, Gaoang Wang

    Abstract: Image captioning bridges the gap between vision and language by automatically generating natural language descriptions for images. Traditional image captioning methods often overlook the preferences and characteristics of users. Personalized image captioning solves this problem by incorporating user prior knowledge into the model, such as writing styles and preferred vocabularies. Most existing me…

    Submitted 7 December, 2023; originally announced December 2023.

  14. arXiv:2312.01508  [pdf, other]

    cs.CV

    CityGen: Infinite and Controllable 3D City Layout Generation

    Authors: Jie Deng, Wenhao Chai, Jianshu Guo, Qixuan Huang, Wenhao Hu, Jenq-Neng Hwang, Gaoang Wang

    Abstract: City layout generation has recently gained significant attention. The goal of this task is to automatically generate the layout of a city scene, including elements such as roads, buildings, vegetation, as well as other urban infrastructures. Previous methods using VAEs or GANs for 3D city layout generation offer limited diversity and constrained interactivity, only allowing users to selectively re…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: 12 pages, 9 figures

  15. arXiv:2311.16477  [pdf, other]

    cs.CV

    UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning

    Authors: Zhongyu Jiang, Wenhao Chai, Lei Li, Zhuoran Zhou, Cheng-Yen Yang, Jenq-Neng Hwang

    Abstract: In recent times, there has been a growing interest in developing effective perception techniques for combining information from multiple modalities. This involves aligning features obtained from diverse sources to enable more efficient training with larger datasets and constraints, as well as leveraging the wealth of information contained in each modality. 2D and 3D Human Pose Estimation (HPE) are…

    Submitted 24 November, 2023; originally announced November 2023.

  16. arXiv:2311.15209  [pdf, other]

    cs.AI

    See and Think: Embodied Agent in Virtual Environment

    Authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Large language models (LLMs) have achieved impressive progress on several open-world tasks. Recently, using LLMs to build embodied agents has been a hotspot. In this paper, we propose STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE consists of three key components: vision perception, language instruction, and code action. Vision perception involves t…

    Submitted 2 December, 2023; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: Preprint. First three authors contribute equally to this work. Project Website https://rese1f.github.io/STEVE/

  17. arXiv:2311.12043  [pdf, other]

    cs.CV cs.AI

    Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation

    Authors: Zhuoran Zhou, Zhongyu Jiang, Wenhao Chai, Cheng-Yen Yang, Lei Li, Jenq-Neng Hwang

    Abstract: Although 3D human pose estimation has gained impressive development in recent years, only a few works focus on infants, who have different bone lengths and for whom data are limited. Directly applying adult pose estimation models typically achieves low performance in the infant domain and suffers from out-of-distribution issues. Moreover, the limitation of infant pose data collection also heavily co…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: WACVW 2024

  18. arXiv:2309.13770  [pdf, other]

    cs.LG cs.CV

    Devil in the Number: Towards Robust Multi-modality Data Filter

    Authors: Yichen Xu, Zihan Xu, Wenhao Chai, Zhonghan Zhao, Enxin Song, Gaoang Wang

    Abstract: In order to appropriately filter multi-modality data sets on a web-scale, it becomes crucial to employ suitable filtering methods to boost performance and reduce training costs. For instance, the LAION papers employ the CLIP score filter to select data with CLIP scores surpassing a certain threshold. On the other hand, T-MARS achieves high-quality data filtering by detecting and masking text within i…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: ICCV 2023 Workshop: TNGCV-DataComp

  19. arXiv:2309.13514  [pdf, ps, other]

    cond-mat.supr-con cond-mat.mtrl-sci cond-mat.str-el

    Superconductivity emerging from density-wave-like order in a correlated kagome metal

    Authors: Yi Liu, Zi-Yi Liu, Jin-Ke Bao, Peng-Tao Yang, Liang-Wen Ji, Si-Qi Wu, Qin-Xin Shen, Jun Luo, Jie Yang, Ji-Yong Liu, Chen-Chao Xu, Wu-Zhang Yang, Wan-Li Chai, Jia-Yi Lu, Chang-Chao Liu, Bo-Sen Wang, Hao Jiang, Qian Tao, Zhi Ren, Xiao-Feng Xu, Chao Cao, Zhu-An Xu, Rui Zhou, Jin-Guang Cheng, Guang-Han Cao

    Abstract: Unconventional superconductivity (USC) in a highly correlated kagome system has been theoretically proposed for years, yet the experimental realization is hard to achieve. The recently discovered vanadium-based kagome materials, which exhibit both superconductivity and charge density wave (CDW) orders, are nonmagnetic and weakly correlated, and thus unlikely to host USC as theories proposed. Here we repo…

    Submitted 16 March, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: 32 pages, 14 figures

  20. arXiv:2309.03599  [pdf, other]

    cs.CV

    Chasing Consistency in Text-to-3D Generation from a Single Image

    Authors: Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang

    Abstract: Text-to-3D generation from a single-view image is a popular but challenging task in 3D vision. Although numerous methods have been proposed, existing works still suffer from the inconsistency issues, including 1) semantic inconsistency, 2) geometric inconsistency, and 3) saturation inconsistency, resulting in distorted, overfitted, and over-saturated generations. In light of the above issues, we p…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: 9 pages, 11 figures

  21. arXiv:2308.09953  [pdf, other]

    cs.CV

    UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning

    Authors: Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Animal visual perception is an important technique for automatically monitoring animal health, understanding animal behaviors, and assisting animal-related research. However, it is challenging to design a deep learning-based perception model that can freely adapt to different animals across various perception tasks, due to the varying poses of a large diversity of animals, lacking data on rare spe…

    Submitted 19 August, 2023; originally announced August 2023.

  22. arXiv:2308.09678  [pdf, other]

    cs.CV cs.AI cs.MM cs.RO

    PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

    Authors: Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie

    Abstract: Existing 3D human pose estimators face challenges in adapting to new datasets due to the lack of 2D-3D pose pairs in training sets. To overcome this issue, we propose the Multi-Hypothesis Pose Synthesis Domain Adaptation (PoSynDA) framework to bridge this data disparity gap in the target domain. Typically, PoSynDA uses a diffusion-inspired structure to…

    Submitted 16 October, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM Multimedia 2023; 10 pages, 4 figures, 8 tables; the code is at https://github.com/hbing-l/PoSynDA

  23. arXiv:2308.09592  [pdf, other]

    cs.CV

    StableVideo: Text-driven Consistency-aware Diffusion Video Editing

    Authors: Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

    Abstract: Diffusion-based methods can generate realistic images and videos, but they struggle to edit existing objects in a video while preserving their appearance over time. This prevents diffusion models from being applied to natural video editing in practical scenarios. In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  24. arXiv:2308.01555  [pdf, other]

    cs.RO

    Mani-GPT: A Generative Model for Interactive Robotic Manipulation

    Authors: Zhe Zhang, Wei Chai, Jiankun Wang

    Abstract: In real-world scenarios, human dialogues are multi-round and diverse. Furthermore, human instructions can be unclear and human responses are unrestricted. Interactive robots face difficulties in understanding human intents and generating suitable strategies for assisting individuals through manipulation. In this article, we propose Mani-GPT, a Generative Pre-trained Transformer (GPT) for interacti…

    Submitted 7 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

  25. arXiv:2308.01164  [pdf, other]

    cs.RO

    Virtual Reality Based Robot Teleoperation via Human-Scene Interaction

    Authors: Lingxiao Meng, Jiangshan Liu, Wei Chai, Jiankun Wang, Max Q.-H. Meng

    Abstract: Robot teleoperation has achieved great success in various situations, including chemical pollution rescue, disaster relief, and long-distance manipulation. In this article, we propose a virtual reality (VR) based robot teleoperation system to achieve more efficient and natural interaction with humans in different scenes. A user-friendly VR interface is designed to help users interact with a desktop scene…

    Submitted 2 August, 2023; originally announced August 2023.

  26. arXiv:2307.16449  [pdf, other]

    cs.CV

    MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

    Authors: Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks. Yet, existing systems can only handle videos with very few frames. For long videos, the computation complexity, memory cost, and long-term temporal connection impose additional challenges. Taking advantage of the Atkinson-S…

    Submitted 9 March, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: CVPR 2024. First three authors contribute equally to this work. Project Website https://rese1f.github.io/MovieChat/

  27. arXiv:2307.07075  [pdf, ps, other]

    cs.IT

    Adaptive Coding and Modulation Aided Mobile Relaying for Millimeter-Wave Flying Ad-Hoc Networks

    Authors: Jiankang Zhang, Sheng Chen, Wei Koong Chai, Lajos Hanzo

    Abstract: The emerging drone swarms are capable of carrying out sophisticated tasks in support of demanding Internet-of-Things (IoT) applications by synergistically working together. However, the target area may be out of the coverage of the ground station and it may be impractical to deploy a large number of drones in the target area due to cost, electromagnetic interference and flight-safety regulations.…

    Submitted 13 July, 2023; originally announced July 2023.

  28. arXiv:2307.03833  [pdf, other]

    cs.CV cs.AI

    Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

    Authors: Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang

    Abstract: Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge for learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic…

    Submitted 24 October, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

    Comments: WACV 2024

  29. arXiv:2307.03353  [pdf, other]

    cs.CV

    A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

    Authors: Zhonghan Zhao, Wenhao Chai, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision. This paper presents a comprehensive survey of deep learning in sports performance, focusing on three main aspects: algorithms, datasets and virtual environments, and challenges. Firstly, we discuss the hierarchical structure of deep learning algorithms in sp…

    Submitted 6 July, 2023; originally announced July 2023.

  30. arXiv:2306.17201  [pdf, other]

    cs.CV

    MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling

    Authors: Zhenyu Zhang, Wenhao Chai, Zhongyu Jiang, Tian Ye, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Estimating 3D human poses only from a 2D human pose sequence has been thoroughly explored in recent years. Yet, prior to this, no such work has attempted to unify 2D and 3D pose representations in the shared feature space. In this paper, we propose MPM, a unified 2D-3D human pose representation framework via masked pose modeling. We treat 2D and 3D poses as two different modalities like vision and langu…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: Codes and model checkpoints are available at https://github.com/vvirgooo2/MPM

  31. arXiv:2305.08824  [pdf, other]

    cs.CV

    Five A$^{+}$ Network: You Only Need 9K Parameters for Underwater Image Enhancement

    Authors: Jingxia Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Shi Jun, Yun Liu, Erkang Chen

    Abstract: A lightweight underwater image enhancement network is of great significance for resource-constrained platforms, but balancing model size, computational efficiency, and enhancement performance has proven difficult for previous approaches. In this work, we propose the Five A$^{+}$ Network (FA$^{+}$Net), a highly efficient and lightweight real-time underwater image enhancement network with only…

    Submitted 15 May, 2023; originally announced May 2023.

  32. arXiv:2303.16456  [pdf, other]

    cs.CV

    Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

    Authors: Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang

    Abstract: When applying a pre-trained 2D-to-3D human pose lifting model to a target unseen dataset, large performance degradation is commonly encountered due to domain shift issues. We observe that the degradation is caused by two factors: 1) the large distribution gap over global positions of poses between the source and target datasets due to variant camera parameters and settings, and 2) the deficient di…

    Submitted 17 August, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: ICCV 2023

  33. arXiv:2303.15124  [pdf, other]

    cs.CV cs.LG eess.IV

    Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal

    Authors: Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang

    Abstract: Medical images often contain artificial markers added by doctors, which can negatively affect the accuracy of AI-based diagnosis. To address this issue and recover the missing visual contents, inpainting techniques are highly needed. However, existing inpainting methods require manual mask input, limiting their application scenarios. In this paper, we introduce a novel blind inpainting method that…

    Submitted 27 March, 2023; originally announced March 2023.

  34. arXiv:2303.00313  [pdf, other]

    cs.LG q-bio.BM

    Deep Learning Methods for Small Molecule Drug Discovery: A Survey

    Authors: Wenhao Hu, Yingying Liu, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang

    Abstract: With the development of computer-assisted techniques, research communities including biochemistry and deep learning have been devoted to the drug discovery field for over a decade. Various applications of deep learning have drawn great attention in drug discovery, such as molecule generation, molecular property prediction, retrosynthesis prediction, and reaction prediction. While most existing s…

    Submitted 5 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  35. arXiv:2302.06826  [pdf, other]

    cs.CV

    DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models

    Authors: Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang

    Abstract: Image-based fashion design with AI techniques has attracted increasing attention in recent years. We focus on a new fashion design task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image. It is a challenging task since there are no reference images available for the newly designed output fashion images. Although diffusi…

    Submitted 13 February, 2023; originally announced February 2023.

  36. arXiv:2210.14426  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall physics.app-ph physics.chem-ph

    Liquid Metal Printed Ultrathin Oxides for Monolayer WS2 Top-Gate Transistors

    Authors: Yiyu Zhang, Dasari Venkatakrishnarao, Michel Bosman, Wei Fu, Sarthak Das, Fabio Bussolotti, Rainer Lee, Siew Lang Teo, Ding Huang, Ivan Verzhbitskiy, Zhuojun Jiang, Zhuoling Jiang, Jian Wei Chai, Shi Wun Tong, Zi-En Ooi, Calvin Pei Yu Wong, Yee Sin Ang, Kuan Eng Johnson Goh, Chit Siong Lau

    Abstract: Two-dimensional (2D) semiconductors are promising channel materials for continued downscaling of complementary metal-oxide-semiconductor (CMOS) logic circuits. However, their full potential continues to be limited by a lack of scalable high-k dielectrics that can achieve atomically smooth interfaces, small equivalent oxide thicknesses (EOT), excellent gate control, and low leakage currents. Here,…

    Submitted 25 October, 2022; originally announced October 2022.

  37. arXiv:2209.11477  [pdf, other]

    cs.CV

    Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model

    Authors: Zhenting Qi, Ruike Zhu, Zheyu Fu, Wenhao Chai, Volodymyr Kindratenko

    Abstract: Fight detection in videos is an emerging deep learning application with today's prevalence of surveillance systems and streaming media. Previous work has largely relied on action recognition techniques to tackle this problem. In this paper, we propose a simple but effective method that solves the task from a new perspective: we design the fight detection model as a composition of an action-aware f…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: Accepted by ICTAI 2022

  38. arXiv:2207.03586  [pdf, other]

    cs.LG cs.AI cs.RO

    CausalAgents: A Robustness Benchmark for Motion Forecasting using Causal Relationships

    Authors: Rebecca Roelofs, Liting Sun, Ben Caine, Khaled S. Refaat, Ben Sapp, Scott Ettinger, Wei Chai

    Abstract: As machine learning models become increasingly prevalent in motion forecasting for autonomous vehicles (AVs), it is critical to ensure that model predictions are safe and reliable. However, exhaustively collecting and labeling the data necessary to fully test the long tail of rare and challenging scenarios is difficult and expensive. In this work, we construct a new benchmark for evaluating and im…

    Submitted 6 October, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Rebecca Roelofs and Liting Sun contributed equally to the work

  39. arXiv:2111.09515  [pdf, other]

    cs.CV

    Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Point Density Level Estimation

    Authors: Yantao Lu, Xuetao Hao, Yilan Li, Weiheng Chai, Shiqi Sun, Senem Velipasalar

    Abstract: 3D object detection from LiDAR data for autonomous driving has been making remarkable strides in recent years. Among the state-of-the-art methodologies, encoding point clouds into a bird's eye view (BEV) has been demonstrated to be both effective and efficient. Different from perspective views, BEV preserves rich spatial and distance information between objects. Yet, while farther objects of the s…

    Submitted 8 August, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

  40. arXiv:1911.11616  [pdf, other]

    eess.IV cs.CR cs.CV cs.LG

    Enhancing Cross-task Black-Box Transferability of Adversarial Examples with Dispersion Reduction

    Authors: Yantao Lu, Yunhan Jia, Jianyu Wang, Bai Li, Weiheng Chai, Lawrence Carin, Senem Velipasalar

    Abstract: Neural networks are known to be vulnerable to carefully crafted adversarial examples, and these malicious samples often transfer, i.e., they remain adversarial even against other models. Although great efforts have been devoted to the transferability across models, surprisingly, less attention has been paid to the cross-task transferability, which represents the real-world cybercriminal's situati…

    Submitted 22 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1905.03333

  41. arXiv:1805.11761  [pdf, other]

    stat.ML cs.CV cs.LG

    Collaborative Learning for Deep Neural Networks

    Authors: Guocong Song, Wei Chai

    Abstract: We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It acquires the strengths from auxiliary training, multi-task learning and knowledge distillation. There are two important mechanisms involved in collaborative learning.…

    Submitted 6 November, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: To appear in NIPS 2018

  42. arXiv:1609.09165  [pdf, ps, other]

    nucl-ex astro-ph.HE astro-ph.SR

    Reevaluation of thermonuclear reaction rate of 50Fe(p,gamma)51Co

    Authors: L. P. Zhang, J. J. He, W. D. Chai, S. Q. Hou, L. Y. Zhang

    Abstract: The thermonuclear rate of the 50Fe(p,gamma)51Co reaction in the Type I X-ray bursts (XRBs) temperature range has been reevaluated based on a recent precise mass measurement at CSRe Lanzhou, where the proton separation energy Sp=142+/-77 keV has been determined for the first time for the 51Co nucleus. Compared to the previous theoretical predictions, the experimental Sp value has a much smaller uncertainty. Ba…

    Submitted 28 September, 2016; originally announced September 2016.

    Comments: 7 pages, 2 figures and 5 tables

  43. arXiv:1606.07792  [pdf, other]

    cs.LG cs.IR stat.ML

    Wide & Deep Learning for Recommender Systems

    Authors: Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, Hemal Shah

    Abstract: Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks…

    Submitted 24 June, 2016; originally announced June 2016.

  44. Injection method of barrier bucket supported by off-aligned electron cooling for CRing of HIAF

    Authors: Guo-Dong Shen, Jian-Cheng Yang, Jia-Wen Xia, Li-Jun Mao, Da-Yu Yin, Wei-** Chai, Jian Shi, Li-Na Sheng, A. Smirnov, Bo Wu, He Zhao

    Abstract: A new accelerator complex, HIAF (the High Intensity Heavy Ion Accelerator Facility), has been approved in China. It is designed to provide intense primary and radioactive ion beams for research in high energy density physics, nuclear physics, atomic physics as well as other applications. In order to achieve a high intensity of up to 5e11 ppp 238U34+, the Compression Ring (CRing) needs to stack mor…

    Submitted 28 March, 2016; v1 submitted 4 January, 2016; originally announced January 2016.

  45. arXiv:1507.03224  [pdf]

    physics.acc-ph hep-ex

    Concept for a Future Super Proton-Proton Collider

    Authors: **gyu Tang, J. Scott Berg, Wei** Chai, Fusan Chen, Nian Chen, Weiren Chou, Haiyi Dong, Jie Gao, Tao Han, Yongbin Leng, Guangrui Li, Ramesh Gupta, Peng Li, Zhihui Li, Baiqi Liu, Yudong Liu, Xinchou Lou, Qing Luo, Ernie Malamud, Lijun Mao, Robert B. Palmer, Quanling Peng, Yuemei Peng, Manqi Ruan, GianLuca Sabbi, et al. (26 additional authors not shown)

    Abstract: Following the discovery of the Higgs boson at the LHC, new large colliders are being studied by the international high-energy community to explore Higgs physics in detail and new physics beyond the Standard Model. In China, a two-stage circular collider project CEPC-SPPC is proposed, with the first stage CEPC (Circular Electron Positron Collider, a so-called Higgs factory) focused on Higgs physics, and…

    Submitted 19 July, 2015; v1 submitted 12 July, 2015; originally announced July 2015.

    Comments: 34 pages, 8 figures, 5 tables

  46. arXiv:1305.4997  [pdf]

    physics.acc-ph nucl-ex

    The SHER-HIAF Ring Lattice Design

    Authors: X. Gao, J. C. Yang, J. W. Xia, W. P. Chai, J. Shi, G. D. Shen

    Abstract: The Super Heavy Experimental Ring (SHER) is one of the rings of the next accelerator complex, the High Intensity Heavy Ion Accelerator Facility (HIAF) at IMP[4]. Here, current ideas for the lattice design for the operation of the large acceptance ring are presented. The SHER ring has to be optimized for e-cooling and the lattice is designed for different modes. First of all, it is designed in the so-called…

    Submitted 21 May, 2013; originally announced May 2013.

  47. arXiv:1212.0365  [pdf]

    cs.CY

    Design and Implementation of Flight Visual Simulation System

    Authors: Feng Tian, Wenjian Chai, Chuanyun Wang, ** Sun

    Abstract: The design requirements for a flight visual simulation system are studied, and the overall structure and development process are proposed in this paper. Through the construction of a 3D scene model library and an aircraft model, the rendering and interaction of the visual scene are implemented. The changes of aircraft flight attitude in the visual system are controlled by real-time calculation of aircraft aerodynam…

    Submitted 3 December, 2012; originally announced December 2012.