Showing 1–50 of 232 results for author: Duan, Y

Searching in archive cs.
  1. arXiv:2405.07919

    cs.CV

    Exploring the Low-Pass Filtering Behavior in Image Super-Resolution

    Authors: Haoyu Deng, Zijing Xu, Yule Duan, Xiao Wu, Wenjie Shu, Liang-Jian Deng

    Abstract: Deep neural networks for image super-resolution have shown significant advantages over traditional approaches like interpolation. However, they are often criticized as 'black boxes' compared to traditional approaches which have solid mathematical foundations. In this paper, we attempt to interpret the behavior of deep neural networks using theories from signal processing. We first report…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  2. arXiv:2405.07892

    cs.LG

    All Nodes are created Not Equal: Node-Specific Layer Aggregation and Filtration for GNN

    Authors: Shilong Wang, Hao Wu, Yifan Duan, Guibin Zhang, Guohao Li, Yuxuan Liang, Shirui Pan, Kun Wang, Yang Wang

    Abstract: The ever-designed Graph Neural Networks, though opening a promising path for the modeling of the graph-structure data, unfortunately introduce two daunting obstacles to their deployment on devices. (I) Most existing GNNs are shallow, due mostly to the over-smoothing and gradient-vanish problem as they go deeper as convolutional architectures. (II) The vast majority of GNNs adhere to the homophi…

    Submitted 13 May, 2024; originally announced May 2024.

  3. arXiv:2405.07573

    cs.CV

    MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving

    Authors: Yiqun Duan, Xianda Guo, Zheng Zhu, Zhen Wang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: Current multi-modality driving frameworks normally fuse representation by utilizing attention between single-modality branches. However, the existing networks still suppress the driving performance as the Image and LiDAR branches are independent and lack a unified observation representation. Thus, this paper proposes MaskFuser, which tokenizes various modalities into a unified semantic feature spa…

    Submitted 13 May, 2024; originally announced May 2024.

  4. arXiv:2405.06459

    cs.CL cs.AI

    Are EEG-to-Text Models Working?

    Authors: Hyejeong Jo, Yiqian Yang, Juhyeok Han, Yiqun Duan, Hui Xiong, Won Hee Lee

    Abstract: This work critically analyzes existing models for open-vocabulary EEG-to-Text translation. We identify a crucial limitation: previous studies often employed implicit teacher-forcing during evaluation, artificially inflating performance metrics. Additionally, they lacked a critical benchmark - comparing model performance on pure noise inputs. We propose a methodology to differentiate between models…

    Submitted 10 May, 2024; originally announced May 2024.

  5. arXiv:2405.05636

    cs.CV cs.AI

    SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space

    Authors: Zeren Zhang, Haibo Qin, Jiayu Huang, Yixin Li, Hui Lin, Yitao Duan, Jinwen Ma

    Abstract: Combining face swapping with lip synchronization technology offers a cost-effective solution for customized talking face generation. However, directly cascading existing models together tends to introduce significant interference between tasks and reduce video clarity because the interaction space is limited to the low-level semantic RGB space. To address this issue, we propose an innovative unifi…

    Submitted 9 May, 2024; originally announced May 2024.

  6. arXiv:2405.05589

    cs.RO

    Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration

    Authors: Yifan Duan, Xinran Zhang, Guoliang You, Yilong Wu, Xingchen Li, Yao Li, Xiaomeng Chu, Jie Peng, Yu Zhang, Jianmin Ji, Yanyong Zhang

    Abstract: Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 19 pages, 19 figures

  7. arXiv:2405.03446

    cs.CR

    SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence

    Authors: Hangyuan Ji, Jian Yang, Linzheng Chai, Chaoren Wei, Liqun Yang, Yunlong Duan, Yunli Wang, Tianzhen Sun, Hongcheng Guo, Tongliang Li, Changyu Ren, Zhoujun Li

    Abstract: To address the increasing complexity and frequency of cybersecurity incidents emphasized by the recent cybersecurity threat reports with over 10 billion instances, cyber threat intelligence (CTI) plays a critical role in the modern cybersecurity landscape by offering the insights required to understand and combat the constantly evolving nature of cyber threats. Inspired by the powerful capability…

    Submitted 6 May, 2024; originally announced May 2024.

  8. arXiv:2405.01199

    cs.CV

    Latent Fingerprint Matching via Dense Minutia Descriptor

    Authors: Zhiyu Pan, Yongjie Duan, Xiongjun Guan, Jianjiang Feng, Jie Zhou

    Abstract: Latent fingerprint matching is a daunting task, primarily due to the poor quality of latent fingerprints. In this study, we propose a deep-learning based dense minutia descriptor (DMD) for latent fingerprint matching. A DMD is obtained by extracting the fingerprint patch aligned by its central minutia, capturing detailed minutia information and texture information. Our dense descriptor takes the f…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures

  9. Regression of Dense Distortion Field from a Single Fingerprint Image

    Authors: Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou

    Abstract: Skin distortion is a long-standing challenge in fingerprint matching, which causes false non-matches. Previous studies have shown that the recognition rate can be improved by estimating the distortion field from a distorted fingerprint and then rectifying it into a normal fingerprint. However, existing rectification methods are based on principal component representation of distortion fields, whic…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.17148

    Journal ref: IEEE Transactions on Information Forensics and Security, vol. 18, pp. 4377-4390, 2023

  10. Direct Regression of Distortion Field from a Single Fingerprint Image

    Authors: Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou

    Abstract: Skin distortion is a long-standing challenge in fingerprint matching, which causes false non-matches. Previous studies have shown that the recognition rate can be improved by estimating the distortion field from a distorted fingerprint and then rectifying it into a normal fingerprint. However, existing rectification methods are based on principal component representation of distortion fields, whic…

    Submitted 26 April, 2024; originally announced April 2024.

    Journal ref: 2022 IEEE International Joint Conference on Biometrics (IJCB), Abu Dhabi, United Arab Emirates, 2022, pp. 1-8

  11. arXiv:2404.07543

    cs.CV eess.IV

    Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening

    Authors: Yule Duan, Xiao Wu, Haoyu Deng, Liang-Jian Deng

    Abstract: Currently, machine learning-based methods for remote sensing pansharpening have progressed rapidly. However, existing pansharpening methods often do not fully exploit differentiating regional information in non-local spaces, thereby limiting the effectiveness of the methods and resulting in redundant learning parameters. In this paper, we introduce a so-called content-adaptive non-local convolutio…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  12. arXiv:2404.05626

    cs.CV

    Learning a Category-level Object Pose Estimator without Pose Annotations

    Authors: Fengrui Tian, Yaoyao Liu, Adam Kortylewski, Yueqi Duan, Shaoyi Du, Alan Yuille, Angtian Wang

    Abstract: 3D object pose estimation is a challenging task. Previous works always require thousands of object images with annotated poses for learning the 3D pose correspondence, which is laborious and time-consuming for labeling. In this paper, we propose to learn a category-level 3D object pose estimator without pose annotations. Instead of using manually annotated images, we leverage diffusion models (e.g…

    Submitted 8 April, 2024; originally announced April 2024.

  13. arXiv:2404.05164

    cs.RO

    Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes

    Authors: Yu Sheng, Lu Zhang, Xingchen Li, Yifan Duan, Yanyong Zhang, Yu Zhang, Jianmin Ji

    Abstract: Prior point cloud provides 3D environmental context, which enhances the capabilities of monocular camera in downstream vision tasks, such as 3D object detection, via data fusion. However, the absence of accurate and automated registration methods for estimating camera extrinsic parameters in roadside scene point clouds notably constrains the potential applications of roadside cameras. This paper p…

    Submitted 7 April, 2024; originally announced April 2024.

  14. arXiv:2404.05163

    cs.CV

    Semantic Flow: Learning Semantic Field of Dynamic Scenes from Monocular Videos

    Authors: Fengrui Tian, Yueqi Duan, Angtian Wang, Jianfei Guo, Shaoyi Du

    Abstract: In this work, we pioneer Semantic Flow, a neural semantic representation of dynamic scenes from monocular videos. In contrast to previous NeRF methods that reconstruct dynamic scenes from the colors and volume densities of individual points, Semantic Flow learns semantics from continuous flows that contain rich 3D motion information. As there is a 2D-to-3D ambiguity problem in the viewing direction…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by ICLR 2024, Codes are available at https://github.com/tianfr/Semantic-Flow/

  15. arXiv:2404.04943

    cs.LG cs.AI cs.AR

    Chiplet Placement Order Exploration Based on Learning to Rank with Graph Representation

    Authors: Zhihui Deng, Yuanyuan Duan, Leilai Shao, Xiaolei Zhu

    Abstract: Chiplet-based systems, integrating various silicon dies manufactured at different integrated circuit technology nodes on a carrier interposer, have garnered significant attention in recent years due to their cost-effectiveness and competitive performance. The widespread adoption of reinforcement learning as a sequential placement method has introduced a new challenge in determining the optimal pla…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 6 pages, 8 figures and 6 tables, accepted by the Conference ISEDA

  16. arXiv:2404.04869

    cs.RO cs.AI

    Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs

    Authors: Yiqun Duan, Qiang Zhang, Renjing Xu

    Abstract: The utilization of Large Language Models (LLMs) within the realm of reinforcement learning, particularly as planners, has garnered a significant degree of attention in recent scholarly literature. However, a substantial proportion of existing research predominantly focuses on planning models for robotics that transmute the outputs derived from perception models into linguistic forms, thus adopting…

    Submitted 7 April, 2024; originally announced April 2024.

  17. arXiv:2404.04026

    cs.RO cs.CV

    MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes

    Authors: Chenyang Wu, Yifan Duan, Xinran Zhang, Yu Sheng, Jianmin Ji, Yanyong Zhang

    Abstract: Localization and mapping are critical tasks for various applications such as autonomous vehicles and robotics. The challenges posed by outdoor environments present particular complexities due to their unbounded characteristics. In this work, we present MM-Gaussian, a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently de…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 7 pages, 5 figures

  18. arXiv:2404.02077

    cs.RO

    Energy-Optimized Planning in Non-Uniform Wind Fields with Fixed-Wing Aerial Vehicles

    Authors: Yufei Duan, Florian Achermann, Jaeyoung Lim, Roland Siegwart

    Abstract: Fixed-wing small uncrewed aerial vehicles (sUAVs) possess the capability to remain airborne for extended durations and traverse vast distances. However, their operation is susceptible to wind conditions, particularly in regions of complex terrain where high wind speeds may push the aircraft beyond its operational limitations, potentially raising safety concerns. Moreover, wind impacts the energy r…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  19. arXiv:2404.00243

    cs.IR

    DSFNet: Learning Disentangled Scenario Factorization for Multi-Scenario Route Ranking

    Authors: Jiahao Yu, Yihai Duan, Longfei Xu, Chao Chen, Shuliang Liu, Li Chen, Kaikui Liu, Fan Yang, Ning Guo

    Abstract: Multi-scenario route ranking (MSRR) is crucial in many industrial mapping systems. However, the industrial community mainly adopts interactive interfaces to encourage users to select pre-defined scenarios, which may hinder the downstream ranking performance. In addition, in the academic community, the multi-scenario ranking works only come from other fields, and there are no works specifically foc…

    Submitted 30 March, 2024; originally announced April 2024.

  20. arXiv:2403.19220

    cs.CV

    GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds

    Authors: Shengjun Zhang, Xin Fei, Yueqi Duan

    Abstract: Point clouds captured by different sensors such as RGB-D cameras and LiDAR possess non-negligible domain gaps. Most existing methods design different network architectures and train separately on point clouds from various sensors. Typically, point-based methods achieve outstanding performances on even-distributed dense point clouds from RGB-D cameras, while voxel-based methods are more efficient f…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  21. arXiv:2403.15183

    cs.RO

    CRPlace: Camera-Radar Fusion with BEV Representation for Place Recognition

    Authors: Shaowei Fu, Yifan Duan, Yao Li, Chengzhen Meng, Yingjie Wang, Jianmin Ji, Yanyong Zhang

    Abstract: The integration of complementary characteristics from camera and radar data has emerged as an effective approach in 3D object detection. However, such fusion-based methods remain unexplored for place recognition, an equally important task for autonomous systems. Given that place recognition relies on the similarity between a query scene and the corresponding candidate scene, the stationary backgro…

    Submitted 22 March, 2024; originally announced March 2024.

  22. arXiv:2403.14613

    cs.CV cs.CL cs.LG

    DreamReward: Text-to-3D Generation with Human Preference

    Authors: Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, Jun Zhu

    Abstract: 3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page: https://jamesyjl.github.io/DreamReward

  23. arXiv:2403.13850

    cs.LG cs.AI physics.flu-dyn

    Spatio-Temporal Fluid Dynamics Modeling via Physical-Awareness and Parameter Diffusion Guidance

    Authors: Hao Wu, Fan Xu, Yifan Duan, Ziwei Niu, Weiyan Wang, Gaofeng Lu, Kun Wang, Yuxuan Liang, Yang Wang

    Abstract: This paper proposes a two-stage framework named ST-PAD for spatio-temporal fluid dynamics modeling in the field of earth sciences, aiming to achieve high-precision simulation and prediction of fluid dynamics through spatio-temporal physics awareness and parameter diffusion guidance. In the upstream stage, we design a vector quantization reconstruction module with temporal evolution characteristics…

    Submitted 18 March, 2024; originally announced March 2024.

  24. arXiv:2403.11699

    eess.IV cs.CV

    A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos

    Authors: Zhengzheng Tu, Zigang Zhu, Yayang Duan, Bo Jiang, Qishun Wang, Chaoxue Zhang

    Abstract: Ultrasound video-based breast lesion segmentation provides valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation based on ultrasound breast images which usually cannot be adapted well to obtain desirable results on ultrasound videos. The main challenge for ultrasound video-based breast lesion segmentation is how to exploi…

    Submitted 18 March, 2024; originally announced March 2024.

  25. arXiv:2403.09625

    cs.CV cs.LG

    Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation

    Authors: Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan

    Abstract: Recent years have witnessed the strong power of 3D generation models, which offer a new level of creative flexibility by allowing users to guide the 3D content generation process through a single image or natural language. However, it remains challenging for existing 3D generation methods to create subject-driven 3D content across diverse prompts. In this paper, we introduce a novel 3D customizati…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Project page: https://liuff19.github.io/Make-Your-3D

  26. arXiv:2403.06974

    cs.CV

    Memory-based Adapters for Online 3D Scene Perception

    Authors: Xiuwei Xu, Chong Xia, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie Zhou, Jiwen Lu

    Abstract: In this paper, we propose a new framework for online 3D scene perception. Conventional 3D scene perception methods are offline, i.e., take an already reconstructed 3D scene geometry as input, which is not applicable in robotic applications where the input data is streaming RGB-D videos rather than a complete 3D scene reconstructed from pre-collected RGB-D videos. To deal with online 3D scene perce…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR24. Link: https://xuxw98.github.io/Online3D/

  27. arXiv:2403.04703

    cs.RO

    mmPlace: Robust Place Recognition with Intermediate Frequency Signal of Low-cost Single-chip Millimeter Wave Radar

    Authors: Chengzhen Meng, Yifan Duan, Chenming He, Dequan Wang, Xiaoran Fan, Yanyong Zhang

    Abstract: Place recognition is crucial for tasks like loop-closure detection and re-localization. Single-chip millimeter wave radar (single-chip radar in short) emerges as a low-cost sensor option for place recognition, with the advantage of insensitivity to degraded visual environments. However, it encounters two challenges. Firstly, sparse point cloud from single-chip radar leads to poor performance when…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 8 pages, 8 figures

  28. arXiv:2403.02308

    cs.CV

    Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

    Authors: Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang

    Abstract: Transformers have revolutionized computer vision and natural language processing, but their high computational complexity limits their application in high-resolution image processing and long-context analysis. This paper introduces Vision-RWKV (VRWKV), a model adapted from the RWKV model used in the NLP field with necessary modifications for vision tasks. Similar to the Vision Transformer (ViT), o…

    Submitted 7 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  29. arXiv:2403.01748

    cs.CL cs.AI

    Decode Neural signal as Speech

    Authors: Yiqian Yang, Yiqun Duan, Qiang Zhang, Renjing Xu, Hui Xiong

    Abstract: Decoding language from brain dynamics is an important open direction in the realm of brain-computer interface (BCI), especially considering the rapid growth of large language models. Compared to invasive-based signals which require electrode implantation surgery, non-invasive neural signals (e.g. EEG, MEG) have attracted increasing attention considering their safety and generality. However, the ex…

    Submitted 26 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  30. arXiv:2403.01508

    cs.AI

    Soft Reasoning on Uncertain Knowledge Graphs

    Authors: Weizhi Fei, Zihao Wang, Hang Yin, Yang Duan, Hanghang Tong, Yangqiu Song

    Abstract: The study of machine learning-based logical query-answering enables reasoning with large-scale and incomplete knowledge graphs. This paper further advances this line of research by considering the uncertainty in the knowledge. The uncertain nature of knowledge is widely observed in the real world, but does not align seamlessly with the first-order logic underpinning existing studies. To b…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 10 pages

  31. arXiv:2402.18294

    cs.RO

    Whole-body Humanoid Robot Locomotion with Human Reference

    Authors: Qiang Zhang, Peter Cui, David Yan, Jingkai Sun, Yiqun Duan, Arthur Zhang, Renjing Xu

    Abstract: Recently, humanoid robots have made significant advances in their ability to perform challenging tasks due to the deployment of Reinforcement Learning (RL). However, the inherent complexity of humanoid robots, including the difficulty of designing complicated reward functions and training entire sophisticated systems, still poses a notable challenge. To conquer these challenges, after many iterati…

    Submitted 1 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages, 7 figures

  32. arXiv:2402.14708

    cs.LG cs.AI q-fin.ST

    CaT-GNN: Enhancing Credit Card Fraud Detection via Causal Temporal Graph Neural Networks

    Authors: Yifan Duan, Guibin Zhang, Shilong Wang, Xiaojiang Peng, Wang Ziqi, Junyuan Mao, Hao Wu, Xinke Jiang, Kun Wang

    Abstract: Credit card fraud poses a significant threat to the economy. While Graph Neural Network (GNN)-based fraud detection methods perform well, they often overlook the causal effect of a node's local structure on predictions. This paper introduces a novel method for credit card fraud detection, the Causal Temporal Graph N…

    Submitted 22 February, 2024; originally announced February 2024.

  33. arXiv:2402.04504

    cs.CV

    Text2Street: Controllable Text-to-image Generation for Street Views

    Authors: Jinming Su, Songen Gu, Yiting Duan, Xingyue Chen, Junfeng Luo

    Abstract: Text-to-image generation has made remarkable progress with the emergence of diffusion models. However, it is still a difficult task to generate images for street views based on text, mainly because the road topology of street scenes is complex, the traffic status is diverse and the weather condition is various, which makes conventional text-to-image models difficult to deal with. To address these…

    Submitted 6 February, 2024; originally announced February 2024.

  34. arXiv:2402.03307

    cs.CV

    4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes

    Authors: Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, Baoquan Chen

    Abstract: We consider the problem of novel view synthesis (NVS) for dynamic scenes. Recent neural approaches have accomplished exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios like sudden movements or…

    Submitted 7 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  35. arXiv:2402.01158

    cs.CL

    LLM-Detector: Improving AI-Generated Chinese Text Detection with Open-Source LLM Instruction Tuning

    Authors: Rongsheng Wang, Haoming Chen, Ruizhe Zhou, Han Ma, Yaofei Duan, Yanlan Kang, Songhua Yang, Baoyu Fan, Tao Tan

    Abstract: ChatGPT and other general large language models (LLMs) have achieved remarkable success, but they have also raised concerns about the misuse of AI-generated texts. Existing AI-generated text detection models, such as those based on BERT and RoBERTa, are prone to in-domain over-fitting, leading to poor out-of-domain (OOD) detection performance. In this paper, we first collected Chinese text responses gen…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 17 pages, 13 tables, 7 figures

  36. arXiv:2401.17642

    cs.CV

    Exploring the Common Appearance-Boundary Adaptation for Nighttime Optical Flow

    Authors: Hanyu Zhou, Yi Chang, Haoyue Liu, Wending Yan, Yuxing Duan, Zhiwei Shi, Luxin Yan

    Abstract: We investigate a challenging task of nighttime optical flow, which suffers from weakened texture and amplified noise. These degradations weaken discriminative visual features, thus causing invalid motion feature matching. Typically, existing methods employ domain adaptation to transfer knowledge from auxiliary domain to nighttime domain in either input visual space or output motion space. However,…

    Submitted 31 January, 2024; originally announced January 2024.

    Journal ref: International Conference on Learning Representations (ICLR), 2024

  37. arXiv:2401.15663

    eess.IV cs.CV

    Low-resolution Prior Equilibrium Network for CT Reconstruction

    Authors: Yijie Yang, Qifeng Gao, Yuping Duan

    Abstract: The unrolling method has been investigated for learning variational models in X-ray computed tomography. However, it has been observed that directly unrolling the regularization model through gradient descent does not produce satisfactory results. In this paper, we present a novel deep learning-based CT reconstruction model, where the low-resolution image is introduced to obtain an effective regul…

    Submitted 18 April, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  38. arXiv:2401.15307

    eess.IV cs.CV

    ParaTransCNN: Parallelized TransCNN Encoder for Medical Image Segmentation

    Authors: Hongkun Sun, Jing Xu, Yuping Duan

    Abstract: The convolutional neural network-based methods have become more and more popular for medical image segmentation due to their outstanding performance. However, they struggle with capturing long-range dependencies, which are essential for accurately modeling global contextual correlations. Thanks to the ability to model long-range dependencies by expanding the receptive field, the transformer-based…

    Submitted 27 January, 2024; originally announced January 2024.

  39. arXiv:2401.05233

    cs.LG cs.IT eess.SY math.OC stat.ML

    Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces

    Authors: Yaqi Duan, Martin J. Wainwright

    Abstract: We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both off-line and on-line settings. Our analysis highlights two key stability properties, relating to how changes in value functions and/or policies affect the Bellman operator and occupation measures. We argue that these properties are satisf…

    Submitted 10 January, 2024; originally announced January 2024.

  40. arXiv:2312.16983

    cs.LG cs.AI

    PG-LBO: Enhancing High-Dimensional Bayesian Optimization with Pseudo-Label and Gaussian Process Guidance

    Authors: Taicai Chen, Yue Duan, Dong Li, Lei Qi, Yinghuan Shi, Yang Gao

    Abstract: Variational Autoencoder based Bayesian Optimization (VAE-BO) has demonstrated its excellent performance in addressing high-dimensional structured optimization problems. However, current mainstream methods overlook the potential of utilizing a pool of unlabeled data to construct the latent space, while only concentrating on designing sophisticated models to leverage the labeled data. Despite their…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  41. arXiv:2312.16895

    cs.LG cs.AR

    RLPlanner: Reinforcement Learning based Floorplanning for Chiplets with Fast Thermal Analysis

    Authors: Yuanyuan Duan, Xingchen Liu, Zhiping Yu, Hanming Wu, Leilai Shao, Xiaolei Zhu

    Abstract: Chiplet-based systems have gained significant attention in recent years due to their low cost and competitive performance. As the complexity and compactness of a chiplet-based system increase, careful consideration must be given to microbump assignments, interconnect delays, and thermal limitations during the floorplanning stage. This paper introduces RLPlanner, an efficient early-stage floorplann…

    Submitted 16 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  42. arXiv:2312.14557

    cs.CL

    Aurora:Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning

    Authors: Rongsheng Wang, Haoming Chen, Ruizhe Zhou, Yaofei Duan, Kunyan Cai, Han Ma, Jiaxi Cui, Jian Li, Patrick Cheong-Iao Pang, Yapeng Wang, Tao Tan

    Abstract: Existing research has demonstrated that refining large language models (LLMs) through the utilization of machine-generated instruction-following data empowers these models to exhibit impressive zero-shot capabilities for novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets wi…

    Submitted 1 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: 10 pages, 2 figures

  43. arXiv:2312.12903

    eess.SY cs.LG math.DS

    A Minimal Control Family of Dynamical Syetem for Universal Approximation

    Authors: Yifei Duan, Yongqiang Cai

    Abstract: The universal approximation property (UAP) of neural networks is a fundamental characteristic of deep learning. It is widely recognized that a composition of linear functions and non-linear functions, such as the rectified linear unit (ReLU) activation function, can approximate continuous functions on compact domains. In this paper, we extend this efficacy to the scenario of dynamical systems with…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 19 pages

    MSC Class: 68T07; 65P99; 65Z05; 41A65

  44. arXiv:2312.12237  [pdf, other]

    cs.LG

    Roll With the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning

    Authors: Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

    Abstract: While semi-supervised learning (SSL) has yielded promising results, the more realistic SSL scenario remains to be explored, in which the unlabeled data exhibits extremely high recognition difficulty, e.g., fine-grained visual classification in the context of SSL (SS-FGVC). The increased recognition difficulty on fine-grained unlabeled data spells disaster for pseudo-labeling accuracy, resulting in…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  45. arXiv:2312.10023  [pdf]

    cs.LG

    A Kronecker product accelerated efficient sparse Gaussian Process (E-SGP) for flow emulation

    Authors: Yu Duan, Matthew Eaton, Michael Bluck

    Abstract: In this paper, we introduce an efficient sparse Gaussian process (E-SGP) for the surrogate modelling of fluid mechanics. This novel Bayesian machine learning algorithm allows efficient model training using databases of different structures. It is a further development of the approximated sparse GP algorithm, combining the concept of efficient GP (E-GP) and variational energy free sparse Gaussian p…

    Submitted 13 December, 2023; originally announced December 2023.

  46. arXiv:2312.09243  [pdf, other]

    cs.CV

    OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments

    Authors: Chubin Zhang, Juncheng Yan, Yi Wei, Jiaxin Li, Li Liu, Yansong Tang, Yueqi Duan, Jiwen Lu

    Abstract: As a fundamental task of vision-based perception, 3D occupancy prediction reconstructs 3D structures of surrounding environments. It provides detailed information for autonomous driving planning and navigation. However, most existing methods heavily rely on the LiDAR point clouds to generate occupancy ground truth, which is not available in the vision-based system. In this paper, we propose an Occ…

    Submitted 29 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Code: https://github.com/LinShan-Bin/OccNeRF

  47. arXiv:2312.06655  [pdf, other]

    cs.CV cs.GR cs.LG

    Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

    Authors: Fangfu Liu, Diankun Wu, Yi Wei, Yongming Rao, Yueqi Duan

    Abstract: Recently, 3D content creation from text prompts has demonstrated remarkable progress by utilizing 2D and 3D diffusion models. While 3D diffusion models ensure great multi-view consistency, their ability to generate high-quality and diverse 3D assets is hindered by the limited 3D data. In contrast, 2D diffusion models find a distillation approach that achieves excellent generalization and rich deta…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: https://liuff19.github.io/Sherpa3D/

  48. arXiv:2312.00343  [pdf, other]

    cs.CV

    OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline

    Authors: Xianda Guo, Juntao Lu, Chenming Zhang, Yiqi Wang, Yiqun Duan, Tian Yang, Zheng Zhu, Long Chen

    Abstract: Stereo matching aims to estimate the disparity between matching pixels in a stereo image pair, which is of great importance to robotics, autonomous driving, and other computer vision tasks. Despite the development of numerous impressive methods in recent years, replicating their results and determining the most suitable architecture for practical application remains challenging. Addressing this ga…

    Submitted 1 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: Code is available at: https://github.com/XiandaGuo/OpenStereo

  49. arXiv:2311.18576  [pdf, other]

    cs.CV cs.AI

    Fingerprint Matching with Localized Deep Representation

    Authors: Yongjie Duan, Zhiyu Pan, Jianjiang Feng, Jie Zhou

    Abstract: Compared to minutia-based fingerprint representations, fixed-length representations are attractive due to simple and efficient matching. However, fixed-length fingerprint representations are limited in accuracy when matching fingerprints with different visible areas, which can occur due to different finger poses or acquisition methods. To address this issue, we propose a localized deep representat…

    Submitted 2 May, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: 18 pages, 20 figures

  50. arXiv:2311.16038  [pdf, other]

    cs.CV cs.AI cs.LG

    OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

    Authors: Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, Borui Zhang, Yueqi Duan, Jiwen Lu

    Abstract: Understanding how the 3D scene evolves is vital for making decisions in autonomous driving. Most existing methods achieve this by predicting the movements of object boxes, which cannot capture more fine-grained scene information. In this paper, we explore a new framework of learning a world model, OccWorld, in the 3D Occupancy space to simultaneously predict the movement of the ego car and the evo…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Code is available at: https://github.com/wzzheng/OccWorld