Showing 1–50 of 165 results for author: Gao, Q

Searching in archive cs.
  1. arXiv:2406.18543  [pdf, ps, other]

    cs.CV

    A Set-based Approach for Feature Extraction of 3D CAD Models

    Authors: Peng Xu, Qi Gao, Ying-Jie Wu

    Abstract: Feature extraction is a critical technology to realize the automatic transmission of feature information throughout product life cycles. As CAD models primarily capture the 3D geometry of products, feature extraction heavily relies on geometric information. However, existing feature extraction methods often yield inaccurate outcomes due to the diverse interpretations of geometric information. This…

    Submitted 22 May, 2024; originally announced June 2024.

    Comments: 13 pages

  2. arXiv:2406.16021  [pdf, other]

    cs.CL cs.AI

    Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm

    Authors: Qiang Gao, Zixiang Meng, Bobo Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji

    Abstract: Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information and the roles of different event arguments may be biased due to the influence of the information source. This paper addresses the limitations of traditional document-level event extraction by proposing the task of cross-document ev…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: ACL2024(Findings)

  3. arXiv:2406.15990  [pdf, other]

    cs.CL cs.AI

    Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

    Authors: Qiang Gao, Bobo Li, Zixiang Meng, Yunlong Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji

    Abstract: Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representation by extracting event arguments (such as location, time, agent, and patient), lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies. This shortcoming leads to their underwhelming performan…

    Submitted 22 June, 2024; originally announced June 2024.

    Report number: https://aclanthology.org/2024.lrec-main.523/

    Journal ref: LREC|COLING, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024, 5907-5921

  4. arXiv:2406.11739  [pdf, other]

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  6. arXiv:2406.09622  [pdf, other]

    cs.CV cs.AI eess.IV

    DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer

    Authors: Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao, Sy-Yen Kuo, Sizhuo Ma, Jian Wang

    Abstract: Generic Face Image Quality Assessment (GFIQA) evaluates the perceptual quality of facial images, which is crucial in improving image restoration algorithms and selecting high-quality face images for downstream tasks. We present a novel transformer-based method for GFIQA, which is aided by two unique mechanisms. First, a Dual-Set Degradation Representation Learning (DSL) mechanism uses facial image…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024, Project Page: https://dsl-fiqa.github.io/

  7. arXiv:2406.09455  [pdf, other]

    cs.CV cs.AI cs.CL

    Pandora: Towards General World Model with Natural Language Actions and Video States

    Authors: Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Website: https://world-model.maitrix.org/

  8. arXiv:2405.20786  [pdf, other]

    cs.CV cs.HC

    Stratified Avatar Generation from Sparse Observations

    Authors: Han Feng, Wenchao Ma, Quankai Gao, Xianwei Zheng, Nan Xue, Huijuan Xu

    Abstract: Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences in AR/VR applications. This task is challenging due to the limited input from Head Mounted Devices, which capture only sparse observations from the head and hands. Predicting the full-body avatars, particularly the lower body, from these sparse observations presents significant difficulties. In this…

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024 (Oral)

  9. arXiv:2405.09814  [pdf, other]

    cs.GR cs.CV cs.SD eess.AS

    Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

    Authors: Zeyi Zhang, Tenglong Ao, Yuyao Zhang, Qingzhe Gao, Chuan Lin, Baoquan Chen, Libin Liu

    Abstract: In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging fo…

    Submitted 16 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024 (Journal Track); Project page: https://pku-mocca.github.io/Semantic-Gesticulator-Page

  10. arXiv:2405.01066  [pdf, other]

    cs.CV cs.AI cs.HC

    HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images

    Authors: Zixun Jiao, Xihan Wang, Zhaoqiang Xia, Lianhe Shao, Quanli Gao

    Abstract: Reconstructing the hand mesh from one single RGB image is a challenging task because hands are often occluded by other objects. Most previous works attempt to explore more additional information and adopt attention mechanisms for improving 3D reconstruction performance, while it would increase computational complexity simultaneously. To achieve a performance-reserving architecture with high comput…

    Submitted 14 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures

  11. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  12. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  13. arXiv:2404.05221  [pdf, other]

    cs.CL cs.AI

    LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

    Authors: Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu

    Abstract: Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs) to address complex problems and enhance robustness and interpretability. Despite the flux of research on developing advanced reasoning approaches, systematically analyzing the diverse LLMs and reasoning strategies in generating reasoning chains remains a significant challenge. The difficulties stem from the la…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Project website: https://www.llm-reasoners.net/

  14. arXiv:2404.04953  [pdf, other]

    cs.CV

    High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning

    Authors: Yu Lei, Guoshuai Sheng, Fangfang Li, Quanxue Gao, Cheng Deng, Qin Li

    Abstract: Zero-shot learning (ZSL) aims to recognize new classes without prior exposure to their samples, relying on semantic knowledge from observed classes. However, current attention-based models may overlook the transferability of visual features and the distinctiveness of attribute localization when learning regional features in images. Additionally, they often overlook shared attributes among different…

    Submitted 7 April, 2024; originally announced April 2024.

  15. arXiv:2404.04940  [pdf, other]

    cs.LG

    Fuzzy K-Means Clustering without Cluster Centroids

    Authors: Han Lu, Fangfang Li, Quanxue Gao, Cheng Deng, Chris Ding, Qianqian Wang

    Abstract: Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. However, the performance of popular Fuzzy K-Means algorithms is sensitive to the selection of initial cluster centroids and is also affected by noise when updating mean cluster centroids. To address these challenges, this paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on…

    Submitted 7 April, 2024; originally announced April 2024.

  16. arXiv:2404.01168  [pdf, other]

    cs.CV cs.GR

    Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting

    Authors: Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma

    Abstract: 3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that phy…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 22 pages, 7 figures

  17. arXiv:2404.00883  [pdf, other]

    cs.LG

    Interpretable Multi-View Clustering Based on Anchor Graph Tensor Factorization

    Authors: Jing Li, Quanxue Gao, Cheng Deng, Qianqian Wang, Ming Yang

    Abstract: The clustering method based on the anchor graph has gained significant attention due to its exceptional clustering performance and ability to process large-scale data. One common approach is to learn bipartite graphs with K-connected components, helping avoid the need for post-processing. However, this method has strict parameter requirements and may not always get K-connected components. To addre…

    Submitted 31 March, 2024; originally announced April 2024.

  18. arXiv:2403.15803  [pdf, other]

    eess.IV cs.CV

    Innovative Quantitative Analysis for Disease Progression Assessment in Familial Cerebral Cavernous Malformations

    Authors: Ruige Zong, Tao Wang, Chunwang Li, Xinlin Zhang, Yuanbin Chen, Longxuan Zhao, Qixuan Li, Qinquan Gao, Dezhi Kang, Fuxin Lin, Tong Tong

    Abstract: Familial cerebral cavernous malformation (FCCM) is a hereditary disorder characterized by abnormal vascular structures within the central nervous system. The FCCM lesions are often numerous and intricate, making quantitative analysis of the lesions a labor-intensive task. Consequently, clinicians face challenges in quantitatively assessing the severity of lesions and determining whether lesions ha…

    Submitted 23 March, 2024; originally announced March 2024.

  19. arXiv:2403.12365  [pdf, other]

    cs.CV

    GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation

    Authors: Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich Neumann

    Abstract: Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians…

    Submitted 13 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  20. arXiv:2403.11427  [pdf, other]

    cs.CV

    BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors

    Authors: Tingyang Zhang, Qingzhe Gao, Weiyu Li, Libin Liu, Baoquan Chen

    Abstract: Animatable 3D reconstruction has significant applications across various fields, primarily relying on artists' handcraft creation. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically necessitate significant time and computational costs for train…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: https://talegqz.github.io/BAGS/

  21. arXiv:2403.06814  [pdf, other]

    cs.LG q-bio.NC

    ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment

    Authors: Hao-Lun Hsu, Qitong Gao, Miroslav Pajic

    Abstract: Deep Brain Stimulation (DBS) stands as an effective intervention for alleviating the motor symptoms of Parkinson's disease (PD). Traditional commercial DBS devices are only able to deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS). However, they in general suffer from energy inefficiency and side effects, such as speech impairment.…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 11 pages, 12 figures, 2 tables. To appear in the 15th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS'2024)

  22. arXiv:2403.03035  [pdf, other]

    cs.PL

    Mars 2.0: A Toolchain for Modeling, Analysis, Verification and Code Generation of Cyber-Physical Systems

    Authors: Bohua Zhan, Xiong Xu, Qiang Gao, Zekun Ji, Xiangyu Jin, Shuling Wang, Naijun Zhan

    Abstract: We introduce Mars 2.0 for modeling, analysis, verification and code generation of Cyber-Physical Systems. Mars 2.0 integrates Mars 1.0 with several important extensions and improvements, allowing the design of cyber-physical systems using the combination of AADL and Simulink/Stateflow, which provide a unified graphical framework for modeling the functionality, physicality and architecture of the s…

    Submitted 5 March, 2024; originally announced March 2024.

  23. arXiv:2403.01460  [pdf, other]

    cs.LG

    One-Step Multi-View Clustering Based on Transition Probability

    Authors: Wenhui Zhao, Quanxue Gao, Guangfei Li, Cheng Deng, Ming Yang

    Abstract: The large-scale multi-view clustering algorithms, based on the anchor graph, have shown promising performance and efficiency and have been extensively explored in recent years. Despite their successes, current methods lack interpretability in the clustering process and do not sufficiently consider the complementary information across different views. To address these shortcomings, we introduce the…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 8 pages

  24. arXiv:2402.16846  [pdf, other]

    cs.CV cs.AI cs.CL

    GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

    Authors: Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

    Abstract: Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations that are important for fine-grained visual understanding and diagnosis. In this work, we introduce GROUNDHOG, an MLLM developed by grounding Large Lang…

    Submitted 16 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024. Website: https://groundhog-mllm.github.io/

  25. arXiv:2402.16544  [pdf, other]

    cs.LG

    Label Learning Method Based on Tensor Projection

    Authors: Jing Li, Quanxue Gao, Qianqian Wang, Cheng Deng, Deyan Xie

    Abstract: Multi-view clustering methods based on anchor graphs have attracted wide attention due to their high efficiency and effectiveness. In order to avoid post-processing, most of the existing anchor graph-based methods learn bipartite graphs with connected components. However, such methods have high requirements on parameters, and in some cases it may not be possible to obtain bipartite graphs with clear conne…

    Submitted 26 February, 2024; originally announced February 2024.

  26. arXiv:2402.15688  [pdf, other]

    cs.LG

    Anchor-free Clustering based on Anchor Graph Factorization

    Authors: Shikun Mei, Fangfang Li, Quanxue Gao, Ming Yang

    Abstract: Anchor-based methods are a pivotal approach in handling clustering of large-scale data. However, these methods typically entail two distinct stages: selecting anchor points and constructing an anchor graph. This bifurcation, along with the initialization of anchor points, significantly influences the overall performance of the algorithm. To mitigate these issues, we introduce a novel method termed…

    Submitted 23 February, 2024; originally announced February 2024.

  27. arXiv:2402.15674  [pdf, other]

    cs.PL

    Formally Verified C Code Generation from Hybrid Communicating Sequential Processes

    Authors: Shuling Wang, Zekun Ji, Bohua Zhan, Xiong Xu, Qiang Gao, Naijun Zhan

    Abstract: Hybrid Communicating Sequential Processes (HCSP) is a formal model for hybrid systems, including primitives for evolution along an ordinary differential equation (ODE), communication, and parallel composition. Code generation is needed to convert HCSP models into code that can be executed in practice, and the correctness of this conversion is essential to ensure that the generated code accurately…

    Submitted 26 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  28. arXiv:2401.16291  [pdf, other]

    cs.LG cs.CY

    MachineLearnAthon: An Action-Oriented Machine Learning Didactic Concept

    Authors: Michal Tkáč, Jakub Sieber, Lara Kuhlmann, Matthias Brueggenolte, Alexandru Rinciog, Michael Henke, Artur M. Schweidtmann, Qinghe Gao, Maximilian F. Theisen, Radwa El Shawi

    Abstract: Machine Learning (ML) techniques are encountered nowadays across disciplines, from social sciences, through natural sciences to engineering. The broad application of ML and the accelerated pace of its evolution lead to an increasing need for dedicated teaching concepts aimed at making the application of this technology more reliable and responsible. However, teaching ML is a daunting task. Aside f…

    Submitted 29 January, 2024; originally announced January 2024.

  29. arXiv:2401.15663  [pdf, other]

    eess.IV cs.CV

    Low-resolution Prior Equilibrium Network for CT Reconstruction

    Authors: Yijie Yang, Qifeng Gao, Yuping Duan

    Abstract: The unrolling method has been investigated for learning variational models in X-ray computed tomography. However, it has been observed that directly unrolling the regularization model through gradient descent does not produce satisfactory results. In this paper, we present a novel deep learning-based CT reconstruction model, where the low-resolution image is introduced to obtain an effective regul…

    Submitted 18 April, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  30. arXiv:2401.05440  [pdf, other]

    eess.SP cs.HC cs.LG

    Autosen: improving automatic wifi human sensing through cross-modal autoencoder

    Authors: Qian Gao, Yanling Hao, Yuanwei Liu

    Abstract: WiFi human sensing is highly regarded for its low-cost and privacy advantages in recognizing human activities. However, its effectiveness is largely confined to controlled, single-user, line-of-sight settings, limited by data collection complexities and the scarcity of labeled datasets. Traditional cross-modal methods, aimed at mitigating these limitations by enabling self-supervised learning with…

    Submitted 8 January, 2024; originally announced January 2024.

  31. arXiv:2401.03914  [pdf, other]

    cs.CV

    D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement

    Authors: Danqi Yan, Qing Gao, Yuepeng Qian, Xinxing Chen, Chenglong Fu, Yuquan Leng

    Abstract: Three-dimensional (3D) human pose estimation using a monocular camera has gained increasing attention due to its ease of implementation and the abundance of data available from daily life. However, owing to the inherent depth ambiguity in images, the accuracy of existing monocular camera-based 3D pose estimation methods remains unsatisfactory, and the estimated 3D poses usually include much noise.…

    Submitted 8 January, 2024; originally announced January 2024.

  32. arXiv:2401.00499  [pdf]

    physics.chem-ph cond-mat.soft cs.AI

    Generating High-Precision Force Fields for Molecular Dynamics Simulations to Study Chemical Reaction Mechanisms using Molecular Configuration Transformer

    Authors: Sihao Yuan, Xu Han, Jun Zhang, Zhaoxin Xie, Cheng Fan, Yunlong Xiao, Yi Qin Gao, Yi Isaac Yang

    Abstract: Theoretical studies on chemical reaction mechanisms have been crucial in organic chemistry. Traditionally, calculating the manually constructed molecular conformations of transition states for chemical reactions using quantum chemical calculations is the most commonly used method. However, this way is heavily dependent on individual experience and chemical intuition. In our previous study, we prop…

    Submitted 11 April, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  33. arXiv:2312.17538  [pdf, other]

    cs.CV cs.LG eess.IV

    Distance Guided Generative Adversarial Network for Explainable Binary Classifications

    Authors: Xiangyu Xiong, Yue Sun, Xiaohong Liu, Wei Ke, Chan-Tong Lam, Jiangang Chen, Mingfeng Jiang, Mingwei Wang, Hui Xie, Tong Tong, Qinquan Gao, Hao Chen, Tao Tan

    Abstract: Despite the potential benefits of data augmentation for mitigating the data insufficiency, traditional augmentation methods primarily rely on the prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classi…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: 12 pages, 8 figures. This work has been submitted to the IEEE TNNLS for possible publication. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media

  34. arXiv:2312.11451  [pdf, other]

    cs.CV cs.RO

    Language-Assisted 3D Scene Understanding

    Authors: Yanmin Wu, Qiankun Gao, Renrui Zhang, Jian Zhang

    Abstract: The scale and quality of point cloud datasets constrain the advancement of point cloud learning. Recently, with the development of multi-modal learning, the incorporation of domain-agnostic prior knowledge from other modalities, such as images and text, to assist in point cloud feature learning has been considered a promising avenue. Existing methods have demonstrated the effectiveness of multi-mo…

    Submitted 31 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Technical report, unpublished, 16 pages

  35. arXiv:2312.07934  [pdf, other]

    eess.IV cs.CV

    Toward Real World Stereo Image Super-Resolution via Hybrid Degradation Model and Discriminator for Implied Stereo Image Information

    Authors: Yuanbo Zhou, Yuyang Xue, Jiang Bi, Wenlin He, Xinlin Zhang, Jiajun Zhang, Wei Deng, Ruofeng Nie, Junlin Lan, Qinquan Gao, Tong Tong

    Abstract: Real-world stereo image super-resolution has a significant influence on enhancing the performance of computer vision systems. Although existing methods for single-image super-resolution can be applied to improve stereo images, these methods often introduce notable modifications to the inherent disparity, resulting in a loss in the consistency of disparity between the original and the enhanced ster…

    Submitted 13 December, 2023; originally announced December 2023.

  36. arXiv:2311.14388  [pdf, other]

    cs.CV cs.LG

    A Parameterized Generative Adversarial Network Using Cyclic Projection for Explainable Medical Image Classification

    Authors: Xiangyu Xiong, Yue Sun, Xiaohong Liu, Chan-Tong Lam, Tong Tong, Hao Chen, Qinquan Gao, Wei Ke, Tao Tan

    Abstract: Although current data augmentation methods are successful in alleviating data insufficiency, conventional augmentation is primarily intra-domain, while advanced generative adversarial networks (GANs) generate images that remain uncertain, particularly in small-scale datasets. In this paper, we propose a parameterized GAN (ParaGAN) that effectively controls the changes of synthetic samples among do…

    Submitted 14 December, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: 5 pages, 4 figures. This work has been submitted to the IEEE ICASSP for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  37. arXiv:2311.12052  [pdf, other]

    cs.CV

    MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

    Authors: Di Chang, Yichun Shi, Quankai Gao, Jessica Fu, Hongyi Xu, Guoxian Song, Qing Yan, Yizhe Zhu, Xiao Yang, Mohammad Soleymani

    Abstract: In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting. Specifically, given a reference image, we aim to generate a person's new images by controlling the poses and facial expressions while keeping the identity unchanged. To this end, we propose a two-stage training strategy to disentangle human motions and appearance (e.g., facial expressio…

    Submitted 5 May, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: Accepted by ICML 2024. MagicPose and MagicDance are the same project. Website: https://boese0601.github.io/magicdance/ Code: https://github.com/Boese0601/MagicDance

  38. arXiv:2311.10349  [pdf, other]

    eess.IV cs.CV cs.LG

    Pseudo Label-Guided Data Fusion and Output Consistency for Semi-Supervised Medical Image Segmentation

    Authors: Tao Wang, Yuanbin Chen, Xinlin Zhang, Yuanbo Zhou, Junlin Lan, Bizhe Bai, Tao Tan, Min Du, Qinquan Gao, Tong Tong

    Abstract: Supervised learning algorithms based on Convolutional Neural Networks have become the benchmark for medical image segmentation tasks, but their effectiveness heavily relies on a large amount of labeled data. However, annotating medical image datasets is a laborious and time-consuming process. Inspired by semi-supervised algorithms that use both labeled and unlabeled data for training, we propose t…

    Submitted 17 November, 2023; originally announced November 2023.

  39. arXiv:2310.17688  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Managing extreme AI risks amid rapid progress

    Authors: Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

    Abstract: Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although rese…

    Submitted 22 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Published in Science: https://www.science.org/doi/10.1126/science.adn0117

  40. arXiv:2310.09676  [pdf, other

    cs.RO cs.AI

    Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

    Authors: Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, Reza Ghanadan, William Yang Wang

    Abstract: Prompt-based learning has been demonstrated as a compelling paradigm contributing to large language models' tremendous success (LLMs). Inspired by their success in language tasks, existing research has leveraged LLMs in embodied instruction following and task planning. In this work, we tackle the problem of training a robot to understand multimodal prompts, interleaving vision signals with text de… ▽ More

    Submitted 27 May, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: Accepted by ICML 2024. Project page: https://midas-icml.github.io

  41. arXiv:2310.07123  [pdf, other

    cs.LG cs.AI

    Off-Policy Evaluation for Human Feedback

    Authors: Qitong Gao, Ge Gao, Juncheng Dong, Vahid Tarokh, Min Chi, Miroslav Pajic

    Abstract: Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare.… ▽ More

    Submitted 14 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  42. arXiv:2309.14592  [pdf, other

    cs.LG cs.AI cs.CL

    Efficient Post-training Quantization with FP8 Formats

    Authors: Haihao Shen, Naveen Mellempudi, Xin He, Qun Gao, Chang Wang, Mengni Wang

    Abstract: Recent advances in deep learning methods such as LLMs and Diffusion models have created a need for improved quantization methods that can meet the computational demands of these modern architectures while maintaining accuracy. Towards this goal, we study the advantages of FP8 data formats for post-training quantization across 75 unique network architectures covering a wide range of tasks, includin… ▽ More

    Submitted 31 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

  43. arXiv:2309.13516  [pdf, other

    cs.CV cs.RO

    InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation

    Authors: Cho-Ying Wu, Quankai Gao, Chin-Cheng Hsu, Te-Lin Wu, Jing-Wen Chen, Ulrich Neumann

    Abstract: Indoor monocular depth estimation has attracted increasing research interest. Most previous works have been focusing on methodology, primarily experimenting with NYU-Depth-V2 (NYUv2) Dataset, and only concentrated on the overall performance over the test set. However, little is known regarding robustness and generalization when it comes to applying monocular depth estimation methods to real-world… ▽ More

    Submitted 30 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Add Depth-Anything

  44. arXiv:2309.00642  [pdf, other

    cs.CL

    Extracting Mathematical Concepts with Large Language Models

    Authors: Valeria de Paiva, Qiyue Gao, Pavel Kovalev, Lawrence S. Moss

    Abstract: We extract mathematical concepts from mathematical text using generative large language models (LLMs) like ChatGPT, contributing to the field of automatic term extraction (ATE) and mathematical text processing, and also to the study of LLMs themselves. Our work builds on that of others in that we aim for automatic extraction of terms (keywords) in one mathematical field, category theory, using as… ▽ More

    Submitted 29 August, 2023; originally announced September 2023.

    Comments: 13 pages, 4 figures, presented to the 14th MathUI Workshop 2023

    MSC Class: 68T50; ACM Class: I.2.7

  45. arXiv:2308.16785  [pdf

    cs.AI cs.HC

    Agent Teaming Situation Awareness (ATSA): A Situation Awareness Framework for Human-AI Teaming

    Authors: Qi Gao, Wei Xu, Mowei Shen, Zaifeng Gao

    Abstract: The rapid advancements in artificial intelligence (AI) have led to a growing trend of human-AI teaming (HAT) in various fields. As machines continue to evolve from mere automation to a state of autonomy, they are increasingly exhibiting unexpected behaviors and human-like cognitive/intelligent capabilities, including situation awareness (SA). This shift has the potential to enhance the performance… ▽ More

    Submitted 4 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: 52 pages, 5 figures, 1 table

  46. arXiv:2308.07822  [pdf, other

    cs.LG eess.SY

    Deep reinforcement learning for process design: Review and perspective

    Authors: Qinghe Gao, Artur M. Schweidtmann

    Abstract: The transformation towards renewable energy and feedstock supply in the chemical industry requires new conceptual process design approaches. Recently, breakthroughs in artificial intelligence offer opportunities to accelerate this transition. Specifically, deep reinforcement learning, a subclass of machine learning, has shown the potential to solve complex decision-making problems and aid sustaina… ▽ More

    Submitted 15 August, 2023; originally announced August 2023.

  47. arXiv:2308.05221  [pdf, other

    cs.HC cs.AI cs.RO

    Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI

    Authors: Hangjie Shi, Leslie Ball, Govind Thattai, Desheng Zhang, Lucy Hu, Qiaozi Gao, Suhaila Shakiah, Xiaofeng Gao, Aishwarya Padmakumar, Bofei Yang, Cadence Chung, Dinakar Guthy, Gaurav Sukhatme, Karthika Arumugam, Matthew Wen, Osman Ipek, Patrick Lange, Rohan Khanna, Shreyas Pansare, Vasu Sharma, Chao Zhang, Cris Flagg, Daniel Pressel, Lavina Vaz, Luke Dai, et al. (17 additional authors not shown)

    Abstract: The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge. As conversational agents increasingly appear in multimodal and embodied contexts, it is important to explore the affordances of conversational interaction augmented wi… ▽ More

    Submitted 9 August, 2023; originally announced August 2023.

  48. arXiv:2308.03882  [pdf, other

    cs.LG cs.AI stat.ML

    Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

    Authors: Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme

    Abstract: Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based methods are able to further exploit unseen states via model rollouts. However, such methods are handicapped in their ability to find unseen st… ▽ More

    Submitted 24 September, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  49. arXiv:2308.02621  [pdf, other

    cs.CV cs.LG

    Color Image Recovery Using Generalized Matrix Completion over Higher-Order Finite Dimensional Algebra

    Authors: Liang Liao, Zhuang Guo, Qi Gao, Yan Wang, Fajun Yu, Qifeng Zhao, Stephen John Maybank

    Abstract: To improve the accuracy of color image completion with missing entries, we present a recovery method based on generalized higher-order scalars. We extend the traditional second-order matrix model to a more comprehensive higher-order matrix equivalent, called the "t-matrix" model, which incorporates a pixel neighborhood expansion strategy to characterize the local pixel constraints. This "t-matrix"… ▽ More

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 24 pages; 9 figures

  50. arXiv:2308.00937  [pdf, other

    cs.RO cs.AI cs.MA

    LEMMA: Learning Language-Conditioned Multi-Robot Manipulation

    Authors: Ran Gong, Xiaofeng Gao, Qiaozi Gao, Suhaila Shakiah, Govind Thattai, Gaurav S. Sukhatme

    Abstract: Complex manipulation tasks often require robots with complementary capabilities to collaborate. We introduce a benchmark for LanguagE-Conditioned Multi-robot MAnipulation (LEMMA) focused on task allocation and long-horizon object manipulation based on human language instructions in a tabletop setting. LEMMA features 8 types of procedurally generated tasks with varying degree of complexity, some of… ▽ More

    Submitted 16 September, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 8 pages, 3 figures, accepted by RA-L

    Journal ref: IEEE Robotics and Automation Letters, vol. 8, no. 10, pp. 6835-6842, Oct. 2023