Showing 1–50 of 310 results for author: Kong, L

Searching in archive cs.
  1. arXiv:2406.18201

    eess.IV cs.CV

    EFCNet: Every Feature Counts for Small Medical Object Segmentation

    Authors: Lingjie Kong, Qiaoling Wei, Chengming Xu, Han Chen, Yanwei Fu

    Abstract: This paper explores the segmentation of very small medical objects with significant clinical value. While Convolutional Neural Networks (CNNs), particularly UNet-like models, and recent Transformers have shown substantial progress in image segmentation, our empirical findings reveal their poor performance in segmenting the small medical objects and lesions concerned in this paper. This limitation…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.16976

    cs.NE cs.AI cs.LG physics.chem-ph

    Efficient Evolutionary Search Over Chemical Space with Large Language Models

    Authors: Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Streith-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang

    Abstract: Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations…

    Submitted 23 June, 2024; originally announced June 2024.

  3. arXiv:2406.16896

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due…

    Submitted 15 May, 2024; originally announced June 2024.

  4. arXiv:2406.16708

    cs.LG stat.ME

    CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

    Authors: Lingbai Kong, Wengen Li, Hanchen Yang, Yichao Zhang, Jihong Guan, Shuigeng Zhou

    Abstract: Temporal causal discovery is a crucial task aimed at uncovering the causal relations within time series data. The latest temporal causal discovery methods usually train deep learning models on prediction tasks to uncover the causality between time series. They capture causal relations by analyzing the parameters of some components of the trained models, e.g., attention weights and convolution weig…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.16567

    cs.CL

    Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting

    Authors: Jiyue Jiang, Liheng Chen, Sheng Wang, Lingpeng Kong, Yu Li, Chuan Wu

    Abstract: Existing dialogue data augmentation (DA) techniques predominantly focus on augmenting utterance-level dialogues, which makes it difficult to take dialogue contextual information into account. The advent of large language models (LLMs) has simplified the implementation of multi-turn dialogues. Due to absence of professional understanding and knowledge, it remains challenging to deliver satisfactory…

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16500

    cs.NE

    A Dual-Channel Particle Swarm Optimization Algorithm Based on Adaptive Balance Search

    Authors: Zhenxing Zhang, Tianxian Zhang, Xiangliang Xu, Lingjiang Kong, Yi Han, Zicheng Wang

    Abstract: The balance between exploration (Er) and exploitation (Ei) determines the generalization performance of the particle swarm optimization (PSO) algorithm on different problems. Although the insufficient balance caused by global best being located near a local minimum has been widely researched, few scholars have systematically paid attention to two behaviors about personal best position (P) and glob…

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  7. arXiv:2406.14683

    cs.LG cs.CL

    TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models

    Authors: Jiarui Feng, Hao Liu, Lecheng Kong, Yixin Chen, Muhan Zhang

    Abstract: In this report, we present TAGLAS, an atlas of text-attributed graph (TAG) datasets and benchmarks. TAGs are graphs with node and edge features represented in text, which have recently gained wide applicability in training graph-language or graph foundation models. In TAGLAS, we collect and integrate more than 23 TAG datasets with domains ranging from citation graphs to molecule graphs and tasks f…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  8. arXiv:2406.14393

    cs.LG cs.CL

    Jailbreaking as a Reward Misspecification Problem

    Authors: Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong

    Abstract: The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and d…

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.12214

    cs.RO cs.CV

    Is Your HD Map Constructor Reliable under Sensor Corruptions?

    Authors: Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, Hui Zhang, Yi Zhou, Qiang Wang, Weiming Li, Lingdong Kong, Jing Zhang

    Abstract: Driving systems often rely on high-definition (HD) maps for precise environmental information, which is crucial for planning and navigation. While current HD map constructors perform well under ideal conditions, their resilience to real-world challenges, e.g., adverse weather and sensor failures, is not well understood, raising safety concerns. This work introduces MapBench, the first comprehensive…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: project url: https://mapbench.github.io/

  10. arXiv:2406.11643

    cs.CV

    AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection

    Authors: Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Chengming Xu, Donghao Luo, Jiangning Zhang, Chengjie Wang, Yanwei Fu

    Abstract: Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customizing research is dominated by specialized tasks, such as human customization or virtual try-on, leaving a gap in general object customization. To this end, we introduce AnyM…

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.09130

    cs.LG cs.AI

    Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning

    Authors: Haoxin Liu, Harshavardhan Kamarthi, Lingkai Kong, Zhiyuan Zhao, Chao Zhang, B. Aditya Prakash

    Abstract: Time-series forecasting (TSF) finds broad applications in real-world scenarios. Due to the dynamic nature of time-series data, it is crucial to equip TSF models with out-of-distribution (OOD) generalization abilities, as historical training data and future test data can have different distributions. In this paper, we aim to alleviate the inherent OOD problem in TSF via invariant learning. We ident…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages

    ACM Class: H.0

  12. arXiv:2406.08627

    cs.LG cs.CL

    Time-MMD: A New Multi-Domain Multimodal Dataset for Time Series Analysis

    Authors: Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Kamarthi, Aditya B. Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, B. Aditya Prakash

    Abstract: Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking the significance of information beyond numerical series. This oversight is due to the untapped potential of text…

    Submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.06649

    eess.IV cs.AI cs.CV cs.LG

    2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

    Authors: Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang

    Abstract: Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is notorious that low-bit quantization degrades the accuracy of SR models compared to their ful…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures. The code and models will be available at https://github.com/Kai-Liu001/2DQuant

  14. arXiv:2406.05954

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  15. arXiv:2406.05723

    cs.CV

    Binarized Diffusion Model for Image Super-Resolution

    Authors: Zheng Chen, Haotong Qin, Yong Guo, Xiongfei Su, Xin Yuan, Linghe Kong, Yulun Zhang

    Abstract: Advanced diffusion models (DMs) perform impressively in image super-resolution (SR), but the high memory and computational costs hinder their deployment. Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating DMs. Nonetheless, due to the model structure and the multi-step iterative attribute of DMs, existing binarization methods result in significant perfor…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/zhengchen1999/BI-DiffSR

  16. arXiv:2406.05316

    cs.LG

    C-Mamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting

    Authors: Chaolv Zeng, Zhanyu Liu, Guanjie Zheng, Linghe Kong

    Abstract: In recent years, significant progress has been made in multivariate time series forecasting using Linear-based, Transformer-based, and Convolution-based models. However, these approaches face notable limitations: linear forecasters struggle with representation capacities, attention mechanisms suffer from quadratic complexity, and convolutional models have a restricted receptive field. These constr…

    Submitted 7 June, 2024; originally announced June 2024.

  17. arXiv:2406.04744

    cs.CL

    CRAG -- Comprehensive RAG Benchmark

    Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, et al. (2 additional authors not shown)

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench…

    Submitted 7 June, 2024; originally announced June 2024.

  18. arXiv:2406.02131

    cs.LG cs.AI

    CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

    Authors: Jianrong Ding, Zhanyu Liu, Guanjie Zheng, Haiming Jin, Linghe Kong

    Abstract: Dataset condensation is a newborn technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing c…

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 23 pages, 13 figures

  19. arXiv:2406.01876

    cs.DB cs.AI cs.CL cs.IR cs.LG

    GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security

    Authors: Xuanqing Liu, Luyang Kong, Runhui Wang, Patrick Song, Austin Nevins, Henrik Johnson, Nimish Amlathe, Davor Golac

    Abstract: Schema matching constitutes a pivotal phase in the data ingestion process for contemporary database systems. Its objective is to discern pairwise similarities between two sets of attributes, each associated with a distinct data table. This challenge emerges at the initial stages of data analytics, such as when incorporating a third-party table into existing databases to inform business insights. G…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: KDD 2024 Camera Ready; 11 pages, 8 figures

  20. arXiv:2406.00519

    cs.LG cs.AI stat.ML

    Learning Discrete Concepts in Latent Hierarchical Models

    Authors: Lingjing Kong, Guangyi Chen, Biwei Huang, Eric P. Xing, Yuejie Chi, Kun Zhang

    Abstract: Learning concepts from natural high-dimensional data (e.g., images) holds potential in building human-aligned and interpretable machine learning models. Despite its encouraging prospect, formalization and theoretical insights into this crucial task are still lacking. In this work, we formalize concepts as discrete latent causal variables that are related via a hierarchical causal model that encode…

    Submitted 1 June, 2024; originally announced June 2024.

  21. arXiv:2405.20390

    cs.LG math.NA math.OC stat.ML

    Quantitative Convergences of Lie Group Momentum Optimizers

    Authors: Lingkai Kong, Molei Tao

    Abstract: Explicit, momentum-based dynamics that optimize functions defined on Lie groups can be constructed via variational optimization and momentum trivialization. Structure preserving time discretizations can then turn this dynamics into optimization algorithms. This article investigates two types of discretization, Lie Heavy-Ball, which is a known splitting scheme, and Lie NAG-SC, which is newly propos…

    Submitted 30 May, 2024; originally announced May 2024.

  22. arXiv:2405.17741

    cs.AI

    LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design

    Authors: Rui Kong, Qiyang Li, Xinyu Fang, Qingtian Feng, Qingfeng He, Yazhu Dong, Weijun Wang, Yuanchun Li, Linghe Kong, Yunxin Liu

    Abstract: Recent literature has found that an effective method to customize or further improve large language models (LLMs) is to add dynamic adapters, such as low-rank adapters (LoRA) with Mixture-of-Experts (MoE) structures. Though such dynamic adapters incur modest computational complexity, they surprisingly lead to huge inference latency overhead, slowing down the decoding speed by 2.5+ times. In this p…

    Submitted 27 May, 2024; originally announced May 2024.

  23. arXiv:2405.17426

    cs.CV cs.RO

    Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving

    Authors: Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

    Abstract: Recent advancements in bird's eye view (BEV) representations have shown remarkable promise for in-vehicle 3D perception. However, while these methods have achieved impressive results on standard benchmarks, their robustness in varied conditions remains insufficiently assessed. In this study, we present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. Thi…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 13 figures, 11 tables; Code at https://github.com/Daniel-xsy/RoboBEV

  24. arXiv:2405.16381

    cs.LG cs.AI stat.ML

    Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

    Authors: Yuchen Zhu, Tianrong Chen, Lingkai Kong, Evangelos A. Theodorou, Molei Tao

    Abstract: The generative modeling of data on manifold is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called 'trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable was algorithmically introduced to help transport the po…

    Submitted 25 May, 2024; originally announced May 2024.

  25. arXiv:2405.15756

    cs.LG cs.AI

    Sparse Expansion and Neuronal Disentanglement

    Authors: Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

    Abstract: We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach Sparse Expansion. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other on…

    Submitted 24 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures

  26. arXiv:2405.14870

    cs.CV cs.RO

    An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

    Authors: Jiahao Sun, Chunmei Qing, Xiang Xu, Lingdong Kong, Youquan Liu, Li Li, Chenming Zhu, Jingwei Zhang, Zeqi Xiao, Runnan Chen, Tai Wang, Wenwei Zhang, Kai Chen

    Abstract: In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and fair benchmarking across models. To address these challenges, we introduce MMDetection3D-lidarseg, a comprehensive toolbox designed for the efficient tra…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 4 figures, 7 tables; Code at https://github.com/open-mmlab/mmdetection3d

  27. arXiv:2405.14343

    cs.CV

    Efficient Visual State Space Model for Image Deblurring

    Authors: Lingshun Kong, Jiangxin Dong, Ming-Hsuan Yang, Jinshan Pan

    Abstract: Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. ViTs typically yield superior results in image restoration compared to CNNs due to their ability to capture long-range dependencies and input-dependent characteristics. However, the computational complexity of Transformer-based models grows quadratically with the image reso…

    Submitted 23 May, 2024; originally announced May 2024.

  28. arXiv:2405.14295

    cs.CV

    Focus Anywhere for Fine-grained Multi-page Document Understanding

    Authors: Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Modern LVLMs still struggle to achieve fine-grained document understanding, such as OCR/translation/caption for regions of interest to the user, tasks that require the context of the entire page, or even multiple pages. Accordingly, this paper proposes Fox, an effective pipeline, hybrid data, and tuning strategy, that catalyzes LVLMs to focus anywhere on single/multi-page documents. We introduce a…

    Submitted 23 May, 2024; originally announced May 2024.

  29. arXiv:2405.08816

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  30. arXiv:2405.05259

    cs.CV cs.RO

    OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

    Authors: Lingdong Kong, Youquan Liu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi

    Abstract: Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event camera sensing. The difficulties in interpreting and annotating event data limit its scalability. While domain adaptation from images to event data can help to mitigate this issue, there exist data representational differences that require additional effort to resolve. In this work, for the first time, we syner…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: CVPR 2024 (Highlight); 26 pages, 12 figures, 11 tables; Code at https://github.com/ldkong1205/OpenESS

  31. arXiv:2405.05258

    cs.CV cs.LG cs.RO

    Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

    Authors: Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

    Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the effi…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 6 figures, 8 tables; Code at https://github.com/ldkong1205/LaserMix

  32. arXiv:2405.04902

    eess.IV cs.CV

    HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis

    Authors: Zhihan Ju, Wanting Zhou, Longteng Kong, Yu Chen, Yi Li, Zhenan Sun, Caifeng Shan

    Abstract: Medical Image Synthesis (MIS) plays an important role in the intelligent medical field, which greatly saves the economic and time costs of medical diagnosis. However, due to the complexity of medical images and similar characteristics of different tissue cells, existing methods face great challenges in meeting their biological consistency. To this end, we propose the Hybrid Augmented Generative Ad…

    Submitted 8 May, 2024; originally announced May 2024.

  33. arXiv:2405.01538

    cs.CV cs.LG cs.RO

    Multi-Space Alignments Towards Universal LiDAR Segmentation

    Authors: Youquan Liu, Lingdong Kong, Xiaoyang Wu, Runnan Chen, Xin Li, Liang Pan, Ziwei Liu, Yuexin Ma

    Abstract: A unified and versatile LiDAR segmentation model with strong robustness and generalizability is desirable for safe autonomous driving perception. This work presents M3Net, a one-of-a-kind framework for fulfilling multi-task, multi-dataset, multi-modality LiDAR segmentation in a universal manner using just a single set of parameters. To better exploit data volume and diversity, we first combine lar…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: CVPR 2024; 33 pages, 14 figures, 14 tables; Code at https://github.com/youquanl/M3Net

  34. arXiv:2404.14542

    cs.CV

    UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

    Authors: Yaofeng Xie, Lingwei Kong, Kai Chen, Ziqiang Zheng, Xiao Yu, Zhibin Yu, Bing Zheng

    Abstract: Learning-based underwater image enhancement (UIE) methods have made great progress. However, the lack of large-scale and high-quality paired training samples has become the main bottleneck hindering the development of UIE. The inter-frame information in underwater videos can accelerate or optimize the UIE process. Thus, we constructed the first large-scale high-resolution underwater video enhancem…

    Submitted 27 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages; accepted at CVPR 2024

    ACM Class: I.4

  35. arXiv:2404.09987

    cs.CV

    OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

    Authors: Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information. Similar to popular LVLMs, OneChart incorpo…

    Submitted 25 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures and 6 tables

  36. arXiv:2403.17010

    cs.CV cs.LG cs.RO

    Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

    Authors: Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

    Abstract: Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models. This study introduces Calib3D, a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models from an uncertainty estimation viewpoint. We comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D datasets, uncovering…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Preprint; 37 pages, 8 figures, 11 tables; Code at https://github.com/ldkong1205/Calib3D

  37. arXiv:2403.17009

    cs.CV cs.RO

    Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions

    Authors: Ye Li, Lingdong Kong, Hanjiang Hu, Xiaohao Xu, Xiaonan Huang

    Abstract: The robustness of driving perception systems under unprecedented conditions is crucial for safety-critical usages. Latest advancements have prompted increasing interests towards multi-LiDAR perception. However, prevailing driving datasets predominantly utilize single-LiDAR systems and collect data devoid of adverse conditions, failing to capture the complexities of real-world environments accurate…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Preprint; 40 pages, 11 figures, 15 tables; Code at https://github.com/ywyeli/Place3D

  38. arXiv:2403.15696

    cs.AI cs.CL

    MixRED: A Mix-lingual Relation Extraction Dataset

    Authors: Lingxing Kong, Yougang Chu, Zheng Ma, Jianbing Zhang, Liang He, Jiajun Chen

    Abstract: Relation extraction is a critical task in the field of natural language processing with numerous real-world applications. Existing research primarily focuses on monolingual relation extraction or cross-lingual enhancement for relation extraction. Yet, there remains a significant gap in understanding relation extraction in the mix-lingual (or code-switching) scenario, where individuals intermix con…

    Submitted 22 March, 2024; originally announced March 2024.

  39. arXiv:2403.14734

    cs.SE cs.AI cs.CL cs.PL

    A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

    Authors: Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

    Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronol…

    Submitted 23 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 64 pages, 6 figures, 10 tables, 692 references

  40. arXiv:2403.12012

    math.ST cs.LG math.NA math.PR stat.ML

    Convergence of Kinetic Langevin Monte Carlo on Lie groups

    Authors: Lingkai Kong, Molei Tao

    Abstract: Explicit, momentum-based dynamics for optimizing functions defined on Lie groups was recently constructed, based on techniques such as variational optimization and left trivialization. We appropriately add tractable noise to the optimization dynamics to turn it into a sampling dynamics, leveraging the advantageous feature that the trivialized momentum variable is Euclidean despite that the potenti…

    Submitted 17 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  41. arXiv:2403.10001

    cs.CV

    Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

    Authors: Jingyi Xu, Weidong Yang, Lingdong Kong, Youquan Liu, Rui Zhang, Qingyuan Zhou, Ben Fei

    Abstract: Unsupervised domain adaptation (UDA) is vital for alleviating the workload of labeling 3D point cloud data and mitigating the absence of labels when facing a newly defined domain. Various methods of utilizing images to enhance the performance of cross-domain 3D segmentation have recently emerged. However, the pseudo labels, which are generated from models trained on the source domain and provide a…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures

  42. arXiv:2403.02910  [pdf, other]

    cs.CV cs.AI

    ImgTrojan: Jailbreaking Vision-Language Models with ONE Image

    Authors: Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong

    Abstract: There has been an increasing interest in the alignment of large language models (LLMs) with human values. However, the safety issues of their integration with a vision module, or vision language models (VLMs), remain relatively underexplored. In this paper, we propose a novel jailbreaking attack against VLMs, aiming to bypass their safety barrier when a user inputs harmful instructions. A scenario…

    Submitted 5 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  43. arXiv:2403.00812  [pdf, other]

    cs.CL cs.AI

    LoRA Meets Dropout under a Unified Framework

    Authors: Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu

    Abstract: With the remarkable capabilities, large language models (LLMs) have emerged as essential elements in numerous NLP applications, while parameter-efficient finetuning, especially LoRA, has gained popularity as a lightweight approach for model customization. Meanwhile, various dropout methods, initially designed for full finetuning with all the parameters updated, alleviates overfitting associated wi…

    Submitted 26 May, 2024; v1 submitted 25 February, 2024; originally announced March 2024.

  44. arXiv:2403.00231  [pdf, other]

    cs.CV cs.CL

    Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models

    Authors: Lei Li, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu

    Abstract: Large vision-language models (LVLMs) excel across diverse tasks involving concrete images from natural scenes. However, their ability to interpret abstract figures, such as geometry shapes and scientific plots, remains limited due to a scarcity of training datasets in scientific domains. To fill this gap, we introduce Multimodal ArXiv, consisting of ArXivCap and ArXivQA, for enhancing LVLMs scient…

    Submitted 2 June, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: Project page: https://mm-arxiv.github.io, Camera Ready Version of ACL 2024

  45. arXiv:2402.19255  [pdf, other]

    cs.CL

    GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers

    Authors: Qintong Li, Leyang Cui, Xueliang Zhao, Lingpeng Kong, Wei Bi

    Abstract: Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. However, there are increasing debates regarding whether these models truly understand and apply mathematical knowledge or merely rely on shortcuts for mathematical reasoning. One essential and frequently occurring evidence is that when the math questions are slightly changed, LLMs ca…

    Submitted 29 February, 2024; originally announced February 2024.

  46. arXiv:2402.18946  [pdf, other]

    cs.LG eess.SY

    Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models

    Authors: Yu Zhang, Long Wen, Xiangtong Yao, Zhenshan Bing, Linghuan Kong, Wei He, Alois Knoll

    Abstract: This paper presents an adaptive online learning framework for systems with uncertain parameters to ensure safety-critical control in non-stationary environments. Our approach consists of two phases. The initial phase is centered on a novel sparse Gaussian process (GP) framework. We first integrate a forgetting factor to refine a variational sparse GP algorithm, thus enhancing its adaptability. Sub…

    Submitted 5 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  47. arXiv:2402.18012  [pdf, other]

    cs.LG cs.AI

    Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints

    Authors: Lingkai Kong, Yuanqi Du, Wenhao Mu, Kirill Neklyudov, Valentin De Bortoli, Haorui Wang, Dongxia Wu, Aaron Ferber, Yi-An Ma, Carla P. Gomes, Chao Zhang

    Abstract: Addressing real-world optimization problems becomes particularly challenging when analytic objective functions or constraints are unavailable. While numerous studies have addressed the issue of unknown objectives, limited research has focused on scenarios where feasibility constraints are not given explicitly. Overlooking these constraints can lead to spurious solutions that are unrealistic in pra…

    Submitted 29 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  48. arXiv:2402.17463  [pdf, other]

    cs.CL

    Training-Free Long-Context Scaling of Large Language Models

    Authors: Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong

    Abstract: The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of more than 100k tokens without continual training. By…

    Submitted 29 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  49. arXiv:2402.16902  [pdf, other]

    cs.LG

    PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA

    Authors: Sheng Wang, Boyang Xue, Jiacheng Ye, Jiyue Jiang, Liheng Chen, Lingpeng Kong, Chuan Wu

    Abstract: With the rapid scaling of large language models (LLMs), serving numerous low-rank adaptations (LoRAs) concurrently has become increasingly impractical, leading to unaffordable costs and necessitating more parameter-efficient finetuning methods. In this work, we introduce Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA), an intra-layer sharing mechanism comprising four essential components…

    Submitted 26 May, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  50. arXiv:2402.15809  [pdf, other]

    cs.AI cs.CL

    Empowering Large Language Model Agents through Action Learning

    Authors: Haiteng Zhao, Chang Ma, Guoyin Wang, Jing Su, Lingpeng Kong, Jingjing Xu, Zhi-Hong Deng, Hongxia Yang

    Abstract: Large Language Model (LLM) Agents have recently garnered increasing interest yet they are limited in their ability to learn from trial and error, a key element of intelligent behavior. In this work, we argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents. While humans naturally expand their action spaces and develop skills through…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 9 pages