Showing 1–50 of 673 results for author: Cheng, J

Searching in archive cs.
  1. arXiv:2406.19680  [pdf, other]

    cs.CV cs.AI cs.MM

    MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

    Authors: Yuang Zhang, Jiaxi Gu, Li-Wen Wang, Han Wang, Junqi Cheng, Yuefeng Zhu, Fangyuan Zou

    Abstract: In recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications. However, video generation still faces considerable challenges in various aspects, such as controllability, video length, and richness of details, which hinder the application and popularization of this technology. In this work, we propose a…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18598  [pdf, other]

    eess.SP cs.IT

    CubeSat-Enabled Free-Space Optics: Joint Data Communication and Fine Beam Tracking

    Authors: Hossein Safi, Mohammad Taghi Dabiri, Julian Cheng, Iman Tavakkolnia, Harald Haas

    Abstract: The integration of CubeSats with Free Space Optical (FSO) links accelerates a major advancement in high-throughput, low-Earth orbit communication systems. However, CubeSats face challenges such as size, weight, and power (SWaP) limitations, as well as vibrations that cause fluctuations in the angle-of-arrival (AoA) of the optical beam at the receiver. These practical challenges make establishing C…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  3. arXiv:2406.18152  [pdf, other]

    cs.MA

    Intrinsic Action Tendency Consistency for Cooperative Multi-Agent Reinforcement Learning

    Authors: Junkai Zhang, Yifan Zhang, Xi Sheryl Zhang, Yifan Zang, Jian Cheng

    Abstract: Efficient collaboration in the centralized training with decentralized execution (CTDE) paradigm remains a challenge in cooperative multi-agent systems. We identify divergent action tendencies among agents as a significant obstacle to CTDE's training efficiency, requiring a large number of training samples to achieve a unified consensus on agents' policies. This divergence stems from the lack of a…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: The AAAI-2024 paper with the appendix

  4. arXiv:2406.16714  [pdf, other]

    cs.CL cs.AI cs.LG

    AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models

    Authors: Jiale Cheng, Yida Lu, Xiaotao Gu, Pei Ke, Xiao Liu, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang

    Abstract: Although Large Language Models (LLMs) are becoming increasingly powerful, they still exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding tasks. As these unexpected errors could lead to severe consequences in practical deployments, it is crucial to investigate the limitations within LLMs systematically. Traditional benchmarking approaches cannot thoroughly…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.16299  [pdf, other]

    cs.CL cs.AI

    Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other

    Authors: Yifei Gao, Jie Ou, Lei Wang, Yuting Xiao, Zhiyuan Xiang, Ruiting Dai, Jun Cheng

    Abstract: Emergent Large Language Models (LLMs) use their extraordinary performance and powerful deduction capacity to discern from traditional language models. However, the expenses of computational resources and storage for these LLMs are stunning, quantization then arises as a trending conversation. To address accuracy decay caused by quantization, two streams of works in post-training quantization metho…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Efficient quantization method

    MSC Class: F.2.3

  6. arXiv:2406.16253  [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  7. arXiv:2406.14796  [pdf, other]

    cs.LG cs.AI

    MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning

    Authors: Jiali Cheng, Hadi Amiri

    Abstract: Recent advancements in Machine Unlearning (MU) have introduced solutions to selectively remove certain training samples, such as those with outdated or sensitive information, from trained models. Despite these advancements, evaluation of MU methods have been inconsistent, employing different trained models and architectures, and sample removal strategies, which hampers accurate comparison. In addi…

    Submitted 20 June, 2024; originally announced June 2024.

  8. arXiv:2406.14098  [pdf, ps, other]

    cs.CV

    HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models

    Authors: Xinrui Zhou, Yuhao Huang, Wufeng Xue, Haoran Dou, Jun Cheng, Han Zhou, Dong Ni

    Abstract: Echocardiography (ECHO) video is widely used for cardiac examination. In clinical, this procedure heavily relies on operator experience, which needs years of training and maybe the assistance of deep learning-based systems for enhanced accuracy and efficiency. However, it is challenging since acquiring sufficient customized data (e.g., abnormal cases) for novice training and deep model development…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  9. arXiv:2406.14021  [pdf, other]

    cs.CL cs.LG q-bio.QM

    HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment

    Authors: Yongqiang Chen, Quanming Yao, Juzheng Zhang, James Cheng, Yatao Bian

    Abstract: Recently there has been a surge of interest in extending the success of large language models (LLMs) to graph modality, such as social networks and molecules. As LLMs are predominantly trained with 1D text data, most existing approaches adopt a graph neural network to represent a graph as a series of node tokens and feed these tokens to LLMs for graph-language alignment. Despite achieving some suc…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preliminary version of an ongoing project: https://higraphllm.github.io/

  10. arXiv:2406.13864  [pdf, other]

    cs.LG q-bio.BM

    Evaluating representation learning on the protein structure universe

    Authors: Arian R. Jamasb, Alex Morehead, Chaitanya K. Joshi, Zuobai Zhang, Kieran Didi, Simon V. Mathis, Charles Harris, Jian Tang, Jianlin Cheng, Pietro Lio, Tom L. Blundell

    Abstract: We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representation and their usefulness in capturing functional relations…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICLR 2024

  11. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  12. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  13. arXiv:2406.10175  [pdf, other]

    cs.CV

    Enhancing Incomplete Multi-modal Brain Tumor Segmentation with Intra-modal Asymmetry and Inter-modal Dependency

    Authors: Weide Liu, Jingwen Hou, Xiaoyang Zhong, Huijing Zhan, Jun Cheng, Yuming Fang, Guanghui Yue

    Abstract: Deep learning-based brain tumor segmentation (BTS) models for multi-modal MRI images have seen significant advancements in recent years. However, a common problem in practice is the unavailability of some modalities due to varying scanning protocols and patient conditions, making segmentation from incomplete MRI modalities a challenging issue. Previous methods have attempted to address this by fus…

    Submitted 14 June, 2024; originally announced June 2024.

  14. arXiv:2406.08634  [pdf, other]

    eess.IV cs.CV cs.LG

    Unveiling Incomplete Modality Brain Tumor Segmentation: Leveraging Masked Predicted Auto-Encoder and Divergence Learning

    Authors: Zhongao Sun, Jiameng Li, Yuhan Wang, Jiarong Cheng, Qing Zhou, Chun Li

    Abstract: Brain tumor segmentation remains a significant challenge, particularly in the context of multi-modal magnetic resonance imaging (MRI) where missing modality images are common in clinical settings, leading to reduced segmentation accuracy. To address this issue, we propose a novel strategy, which is called masked predicted pre-training, enabling robust feature learning from incomplete modality data…

    Submitted 12 June, 2024; originally announced June 2024.

  15. arXiv:2406.07955  [pdf, other]

    cs.LG stat.ML

    How Interpretable Are Interpretable Graph Neural Networks?

    Authors: Yongqiang Chen, Yatao Bian, Bo Han, James Cheng

    Abstract: Interpretable graph neural networks (XGNNs) are widely adopted in various scientific applications involving graph-structured data. Existing XGNNs predominantly adopt the attention-based mechanism to learn edge or node importance for extracting and making predictions with the interpretable subgraph. However, the representational properties and limitations of these methods remain inadequately explo…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: ICML2024, 44 pages, 21 figures, 12 tables

  16. arXiv:2406.07471  [pdf, other]

    cs.CV

    OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    Authors: Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, Zongyuan Ge

    Abstract: Surgical scene perception via videos are critical for advancing robotic surgery, telesurgery, and AI-assisted surgery, particularly in ophthalmology. However, the scarcity of diverse and richly annotated video datasets has hindered the development of intelligent systems for surgical workflow analysis. Existing datasets for surgical workflow analysis, which typically face challenges such as small s…

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Version 1

  17. arXiv:2406.07177  [pdf, other]

    cs.LG

    TernaryLLM: Ternarized Large Language Model

    Authors: Tianqi Chen, Zhe Li, Weixiang Xu, Zeyu Zhu, Dong Li, Lu Tian, Emad Barsoum, Peisong Wang, Jian Cheng

    Abstract: Large language models (LLMs) have achieved remarkable performance on Natural Language Processing (NLP) tasks, but they are hindered by high computational costs and memory requirements. Ternarization, an extreme form of quantization, offers a solution by reducing memory usage and enabling energy-efficient floating-point additions. However, applying ternarization to LLMs faces challenges stemming fr…

    Submitted 11 June, 2024; originally announced June 2024.

  18. arXiv:2406.05654  [pdf, other]

    cs.CL cs.IR

    DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

    Authors: Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, Zhicheng Dou

    Abstract: Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in keeping up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, ye…

    Submitted 16 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  19. arXiv:2406.05320  [pdf, other]

    stat.ML cs.LG

    Deep Neural Networks are Adaptive to Function Regularity and Data Distribution in Approximation and Estimation

    Authors: Hao Liu, Jiahui Cheng, Wenjing Liao

    Abstract: Deep learning has exhibited remarkable results across diverse areas. To understand its success, substantial research has been directed towards its theoretical foundations. Nevertheless, the majority of these studies examine how well deep neural networks can model functions with uniform regularity. In this paper, we explore a different angle: how deep neural networks can adapt to different regulari…

    Submitted 7 June, 2024; originally announced June 2024.

  20. arXiv:2406.04451  [pdf, other]

    cs.RO

    RiskMap: A Unified Driving Context Representation for Autonomous Motion Planning in Urban Driving Environment

    Authors: Ren Xin, Sheng Wang, Yingbing Chen, Jie Cheng, Ming Liu

    Abstract: Planning is complicated by the combination of perception and map information, particularly when driving in heavy traffic. Developing an extendable and efficient representation that visualizes sensor noise and provides constraints to real-time planning tasks is desirable. We aim to develop an extendable map representation offering prior to cost in planning tasks to simplify the planning process of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Submission to ICRA 2023 was not accepted. This paper is now available just for public reference

  21. arXiv:2406.03944  [pdf, other]

    cs.LG

    Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples

    Authors: Dake Bu, Wei Huang, Taiji Suzuki, Ji Cheng, Qingfu Zhang, Zhiqiang Xu, Hau-San Wong

    Abstract: Neural Network-based active learning (NAL) is a cost-effective data selection technique that utilizes neural networks to select and train on a small subset of samples. While existing work successfully develops various effective or theory-justified NAL algorithms, the understanding of the two commonly used query criteria of NAL: uncertainty-based and diversity-based, remains in its infancy. In this…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by the 41st International Conference on Machine Learning (ICML 2024)

  22. arXiv:2406.03088  [pdf, other]

    cs.AR cs.LG

    HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator

    Authors: Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao

    Abstract: Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto customized hardware accelerators. Among various accelerator designs, dataflow architecture has shown promising performance due to its layer-pipelined structure and i…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: accepted to FPL2024

  23. arXiv:2406.01843  [pdf, other]

    cs.CV

    L-MAGIC: Language Model Assisted Generation of Images with Coherence

    Authors: Zhipeng Cai, Matthias Mueller, Reiner Birkl, Diana Wofk, Shao-Yen Tseng, JunDa Cheng, Gabriela Ben-Melech Stan, Vasudev Lal, Michael Paulitsch

    Abstract: In the current era of generative AI breakthroughs, generating panoramic scenes from a single input image remains a key challenge. Most existing methods use diffusion-based iterative or simultaneous multi-view inpainting. However, the lack of global scene layout priors leads to subpar outputs with duplicated objects (e.g., multiple beds in a bedroom) or requires time-consuming human text inputs for…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: accepted to CVPR 2024

  24. arXiv:2406.01388  [pdf, other]

    cs.CV

    AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation

    Authors: Junhao Cheng, Xi Lu, Hanhui Li, Khun Loun Zai, Baiqiao Yin, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: As cutting-edge Text-to-Image (T2I) generation models already excel at producing remarkable single images, an even more challenging task, i.e., multi-turn interactive image generation begins to attract the attention of related research communities. This task requires models to interact with users over multiple turns to generate a coherent sequence of images. However, since users may switch subject…

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Multi-turn interactive image generation

  25. arXiv:2405.17579  [pdf, other]

    cs.RO

    Harnessing Natural Oscillations for High-Speed, Efficient Asymmetrical Locomotion in Quadrupedal Robots

    Authors: Jing Cheng, Yasser G. Alqaham, Zhenyu Gan

    Abstract: This study explores the dynamics of asymmetrical bounding gaits in quadrupedal robots, focusing on the integration of torso pitching and hip motion to enhance speed and stability. Traditional control strategies often enforce a fixed posture, minimizing natural body movements to simplify the control problem. However, this approach may overlook the inherent dynamical advantages found in natural loco…

    Submitted 27 May, 2024; originally announced May 2024.

  26. arXiv:2405.16099  [pdf, other]

    cs.CV

    Improving 3D Occupancy Prediction through Class-balancing Loss and Multi-scale Representation

    Authors: Huizhou Chen, Jiangyi Wang, Yuxin Li, Na Zhao, Jun Cheng, Xulei Yang

    Abstract: 3D environment recognition is essential for autonomous driving systems, as autonomous vehicles require a comprehensive understanding of surrounding scenes. Recently, the predominant approach to define this real-life problem is through 3D occupancy prediction. It attempts to predict the occupancy states and semantic labels for all voxels in 3D space, which enhances the perception capability. Birds-…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures, accepted by IEEE CAI 2024

  27. arXiv:2405.15317  [pdf, other]

    cs.LG cs.AI

    NuwaTS: a Foundation Model Mending Every Incomplete Time Series

    Authors: Jinguo Cheng, Chunwei Yang, Wanlin Cai, Yuxuan Liang, Yuankai Wu

    Abstract: Time series imputation plays a crucial role in various real-world systems and has been extensively explored. Models for time series imputation often require specialization, necessitating distinct designs for different domains and missing patterns. In this study, we introduce NuwaTS, a framework to repurpose Pre-trained Language Model (PLM) for general time series imputation. Once trained, this mod…

    Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures

  28. arXiv:2405.14108  [pdf, other]

    cs.LG cs.AI q-bio.BM q-bio.QM

    Deep Learning for Protein-Ligand Docking: Are We There Yet?

    Authors: Alex Morehead, Nabin Giri, Jian Liu, Jianlin Cheng

    Abstract: The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of dockin…

    Submitted 5 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 30 pages, 1 table, 27 figures. Under review. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench

    ACM Class: I.2.1; J.3

  29. arXiv:2405.11809  [pdf, other]

    cs.CV cs.AI

    Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices

    Authors: Baiyu Pan, Jichao Jiao, Jianxing Pang, Jun Cheng

    Abstract: In recent years, numerous real-time stereo matching methods have been introduced, but they often lack accuracy. These methods attempt to improve accuracy by introducing new modules or integrating traditional methods. However, the improvements are only modest. In this paper, we propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off betw…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: International Conference on Robotics and Automation (ICRA) 2024

  30. arXiv:2405.10885  [pdf, other]

    cs.CV

    FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth Estimation

    Authors: Fei Wang, Jun Cheng

    Abstract: Most existing methods often rely on complex models to predict scene depth with high accuracy, resulting in slow inference that is not conducive to deployment. To better balance precision and speed, we first designed SmallDepth based on sparsity. Second, to enhance the feature representation ability of SmallDepth during training under the condition of equal complexity during inference, we propose a…

    Submitted 17 May, 2024; originally announced May 2024.

  31. arXiv:2405.10135  [pdf, other]

    cs.CE cond-mat.mtrl-sci

    Self-supervised feature distillation and design of experiments for efficient training of micromechanical deep learning surrogates

    Authors: Patxi Fernandez-Zelaia, Jason Mayeur, Jiahao Cheng, Yousub Lee, Kevin Knipe, Kai Kadau

    Abstract: Machine learning surrogate emulators are needed in engineering design and optimization tasks to rapidly emulate computationally expensive physics-based models. In micromechanics problems the local full-field response variables are desired at microstructural length scales. While there has been a great deal of work on establishing architectures for these tasks there has been relatively little work o…

    Submitted 16 May, 2024; originally announced May 2024.

  32. arXiv:2405.09369  [pdf, other]

    cs.IR

    Diffusion-based Contrastive Learning for Sequential Recommendation

    Authors: Ziqiang Cui, Haolun Wu, Bowei He, Ji Cheng, Chen Ma

    Abstract: Self-supervised contrastive learning, which directly extracts inherent data correlations from unlabeled data, has been widely utilized to mitigate the data sparsity issue in sequential recommendation. The majority of existing methods create different augmented views of the same user sequence via random augmentation, and subsequently minimize their distance in the embedding space to enhance the qua…

    Submitted 7 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  33. arXiv:2405.07966  [pdf, other]

    cs.CV cs.AI

    OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition

    Authors: Qiuchi Xiang, Jintao Cheng, Jiehao Luo, Jin Wu, Rui Fan, Xieyuanli Chen, Xiaoyu Tang

    Abstract: Place recognition is the foundation for enabling autonomous systems to achieve independent decision-making and safe operations. It is also crucial in tasks such as loop closure detection and global localization within SLAM. Previous methods utilize mundane point cloud representations as input and deep learning-based LiDAR-based Place Recognition (LPR) approaches employing different point cloud ima…

    Submitted 13 May, 2024; originally announced May 2024.

  34. arXiv:2405.06828  [pdf, other]

    cs.CV

    G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping

    Authors: Junfeng Cheng, Tania Stathaki

    Abstract: This paper proposes a novel task named "3D part grouping". Suppose there is a mixed set containing scattered parts from various shapes. This task requires algorithms to find out every possible combination among all the parts. To address this challenge, we propose the so called Gradient Field-based Auto-Regressive Sampling framework (G-FARS) tailored specifically for the 3D part grouping task. In o…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  35. Motivating Users to Attend to Privacy: A Theory-Driven Design Study

    Authors: Varun Shiri, Maggie Xiong, Jinghui Cheng, Jin L. C. Guo

    Abstract: In modern technology environments, raising users' privacy awareness is crucial. Existing efforts largely focused on privacy policy presentation and failed to systematically address a radical challenge of user motivation for initiating privacy awareness. Leveraging the Protection Motivation Theory (PMT), we proposed design ideas and categories dedicated to motivating users to engage with privacy-re…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 18 pages, 2 figures, DIS 2024

  36. arXiv:2405.03190  [pdf, other]

    cs.CV

    Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval

    Authors: Jiacheng Cheng, Hijung Valentina Shin, Nuno Vasconcelos, Bryan Russell, Fabian Caba Heilbron

    Abstract: In the recent years, the dual-encoder vision-language models (e.g., CLIP) have achieved remarkable text-to-image retrieval performance. However, we discover that these models usually results in very different retrievals for a pair of paraphrased queries. Such behavior might render the retrieval system less predictable and lead to user frustration. In this work, we consider the task of paraphrased te…

    Submitted 6 May, 2024; originally announced May 2024.

  37. arXiv:2405.03159  [pdf, other]

    cs.CV

    DeepMpMRI: Tensor-decomposition Regularized Learning for Fast and High-Fidelity Multi-Parametric Microstructural MR Imaging

    Authors: Wenxin Fan, Jian Cheng, Cheng Li, Xinrui Ma, Jing Yang, Juan Zou, Ruoyou Wu, Zan Chen, Yuanjing Feng, Hairong Zheng, Shanshan Wang

    Abstract: Deep learning has emerged as a promising approach for learning the nonlinear mapping between diffusion-weighted MR images and tissue parameters, which enables automatic and deep understanding of the brain microstructures. However, the efficiency and accuracy in the multi-parametric estimations are still limited since previous studies tend to estimate multi-parametric maps with dense sampling and i…

    Submitted 6 May, 2024; originally announced May 2024.

  38. arXiv:2405.03141  [pdf, other]

    eess.IV cs.AI cs.CV physics.med-ph

    Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation

    Authors: Yihao Zhou, Timothy Tin-Yan Lee, Kelly Ka-Lee Lai, Chonglin Wu, Hin Ting Lau, De Yang, Chui-Yi Chan, Winnie Chiu-Wing Chu, Jack Chun-Yiu Cheng, Tsz-Ping Lam, Yong-Ping Zheng

    Abstract: The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of mea…

    Submitted 6 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  39. arXiv:2404.18919  [pdf, other]

    cs.CV

    TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

    Authors: Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: Recent advances in diffusion models can generate high-quality and stunning images from text. However, multi-turn image generation, which is of high demand in real-world scenarios, still faces challenges in maintaining semantic consistency between images and texts, as well as contextual consistency of the same subject across multiple interactive turns. To address this issue, we introduce TheaterGen…

    Submitted 29 April, 2024; originally announced April 2024.

  40. arXiv:2404.18669  [pdf, other]

    cs.GR cs.AI cs.CV

    Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting

    Authors: Yifei Gao, Jie Ou, Lei Wang, Jun Cheng

    Abstract: Recent developments in neural rendering techniques have greatly enhanced the rendering of photo-realistic 3D scenes across both academic and commercial fields. The latest method, known as 3D Gaussian Splatting (3D-GS), has set new benchmarks for rendering quality and speed. Nevertheless, the limitations of 3D-GS become pronounced in synthesizing new viewpoints, especially for views that greatly de…

    Submitted 12 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    MSC Class: I.4.8

  41. arXiv:2404.18392  [pdf, other]

    cs.DC

    Dflow, a Python framework for constructing cloud-native AI-for-Science workflows

    Authors: Xinzijian Liu, Yanbo Han, Zhuoyuan Li, Jiahao Fan, Chengqian Zhang, Jinzhe Zeng, Yifan Shan, Yannan Yuan, Wei-Hong Xu, Yun-Pei Liu, Yuzhi Zhang, Tongqi Wen, Darrin M. York, Zhicheng Zhong, Hang Zheng, Jun Cheng, Linfeng Zhang, Han Wang

    Abstract: In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on both cloud and high-performance supercomputers. Here we introduce Dflow, an open-source Python toolkit designed for scientists to construct workflows with simple…

    Submitted 28 April, 2024; originally announced April 2024.

  42. arXiv:2404.18089  [pdf, other

    cs.MA

    ATR-Mapping: Asymmetric Topological Representation based Mapping Framework for Multi-Robot Environment Exploration

    Authors: Hao Zhang, Jiyu Cheng, Wei Zhang

    Abstract: In recent years, the widespread application of multi-robot systems in areas such as power inspection and autonomous vehicle fleets has made multi-robot technology a research hotspot in the field of robotics. This paper investigates multi-robot cooperative exploration in unknown environments, proposing a training framework and decision strategy based on multi-agent reinforcement learning. Specifically…

    Submitted 28 April, 2024; originally announced April 2024.

  43. arXiv:2404.16906  [pdf, other

    cs.NE cs.AI

    Evolve Cost-aware Acquisition Functions Using Large Language Models

    Authors: Yiming Yao, Fei Liu, Ji Cheng, Qingfu Zhang

    Abstract: Many real-world optimization scenarios involve expensive evaluation with unknown and heterogeneous costs. Cost-aware Bayesian optimization stands out as a prominent solution in addressing these challenges. To approach the global optimum within a limited budget in a cost-efficient manner, the design of cost-aware acquisition functions (AFs) becomes a crucial step. However, traditional manual design…

    Submitted 13 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  44. arXiv:2404.14327  [pdf, other

    cs.RO

    PLUTO: Pushing the Limit of Imitation Learning-based Planning for Autonomous Driving

    Authors: Jie Cheng, Yingbing Chen, Qifeng Chen

    Abstract: We present PLUTO, a powerful framework that pushes the limit of imitation learning-based planning for autonomous driving. Our improvements stem from three pivotal aspects: a longitudinal-lateral aware model architecture that enables flexible and diverse driving behaviors; an innovative auxiliary loss computation method that is broadly applicable and efficient for batch-wise calculation; a novel tr…

    Submitted 22 April, 2024; originally announced April 2024.

  45. arXiv:2404.13891  [pdf, other

    cs.LG cs.AI cs.GT

    Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent

    Authors: Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

    Abstract: Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. It decomposes the total regret into counterfactual regrets, utilizing local regret minimization algorithms, such as Regret Matching (RM) or RM+, to minimize them. Recent research establishes a connection between Online Mirror Descent (OMD) and RM+, paving the way for an optimisti…

    Submitted 14 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

  46. arXiv:2404.13767  [pdf, other

    cs.RO cs.AI cs.CV

    Autonomous Robot for Disaster Mapping and Victim Localization

    Authors: Michael Potter, Rahil Bhowal, Richard Zhao, Anuj Patel, Jingming Cheng

    Abstract: In response to the critical need for effective reconnaissance in disaster scenarios, this research article presents the design and implementation of a complete autonomous robot system using the Turtlebot3 with Robot Operating System (ROS) Noetic. Upon deployment in closed, initially unknown environments, the system aims to generate a comprehensive map and identify any present 'victims' using Apr…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Class final project for Northeastern University EECE 5550 Mobile Robotics Course

  47. arXiv:2404.12794  [pdf, other

    cs.CV cs.MM cs.RO eess.IV

    MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

    Authors: Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang

    Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in point clouds of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Ob…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/Terminal-K/MambaMOS

  48. arXiv:2404.12554  [pdf, other

    eess.SY cs.LG

    Learning Stable and Passive Neural Differential Equations

    Authors: Jing Cheng, Ruigang Wang, Ian R. Manchester

    Abstract: In this paper, we introduce a novel class of neural differential equations, which are intrinsically Lyapunov stable, exponentially stable or passive. We take a recently proposed Polyak Lojasiewicz network (PLNet) as a Lyapunov function and then parameterize the vector field as the descent directions of the Lyapunov function. The resulting models have the same structure as the general Hamiltonian dyn…

    Submitted 18 April, 2024; originally announced April 2024.

  49. arXiv:2404.11968  [pdf, other

    cs.CL

    P-NAL: an Effective and Interpretable Entity Alignment Method

    Authors: Chuanhao Xu, Jingwei Cheng, Fu Zhang

    Abstract: Entity alignment (EA) aims to find equivalent entities between two Knowledge Graphs. Existing embedding-based EA methods usually encode entities as embeddings, treat triples as constraints on those embeddings, and learn to align the embeddings. The structural and side information are usually utilized via embedding propagation, aggregation or interaction. However, the details of the underlying logical inference st…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 2 figures

    ACM Class: I.2.4

  50. arXiv:2404.08559  [pdf, other

    cs.CL

    MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking

    Authors: Tianwen Tang, Tong Zhu, Haodong Liu, Yin Bai, Jia Cheng, Wenliang Chen

    Abstract: Zero-shot dialogue state tracking (DST) transfers knowledge to unseen domains, reducing the cost of annotating new datasets. Previous zero-shot DST models mainly suffer from domain transferring and partial prediction problems. To address these challenges, we propose Mixture of Prefix Experts (MoPE) to establish connections between similar slots in different domains, which strengthens the model tra…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024