Showing 1–50 of 804 results for author: Jiang, X

Searching in archive cs.
  1. arXiv:2407.00708  [pdf, other]

    cs.LG

    Heterogeneous Graph Contrastive Learning with Spectral Augmentation

    Authors: Jing Zhang, Xiaoqian Jiang, Yingjie Xie, Cangqi Zhou

    Abstract: Heterogeneous graphs can well describe the complex entity relationships in the real world. For example, online shopping networks contain multiple physical types of consumers and products, as well as multiple relationship types such as purchasing and favoriting. More and more scholars pay attention to this research because heterogeneous graph representation learning shows strong application potenti…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.19435  [pdf, other]

    cs.CV

    A Sanity Check for AI-generated Image Detection

    Authors: Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Weidi Xie

    Abstract: With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. To quantify th…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://shilinyan99.github.io/AIDE Code: https://github.com/shilinyan99/AIDE

  3. arXiv:2406.19162  [pdf, other]

    cs.CV

    Single Image Estimation of Cell Migration Direction by Deep Circular Regression

    Authors: Lennart Bruns, Lucas Lamparter, Milos Galic, Xiaoyi Jiang

    Abstract: In this paper we study the problem of estimating the migration direction of cells based on a single image. To the best of our knowledge, there is only one related work that uses a classification CNN for four classes (quadrants). This approach does not allow detailed directional resolution. We solve the single image estimation problem using deep circular regression with special attention to cycle-s…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.18259  [pdf, other]

    cs.CL cs.AI

    Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated

    Authors: Jiazhou Ji, Ruizhe Li, Shujun Li, Jie Guo, Weidong Qiu, Zheng Huang, Chiyu Chen, Xiaoyu Jiang, Xinru Lu

    Abstract: As LLMs rapidly advance, increasing concerns arise regarding risks about actual authorship of texts we see online and in real world. The task of distinguishing LLM-authored texts is complicated by the nuanced and overlapping behaviors of both machines and humans. In this paper, we challenge the current practice of considering LLM-generated text detection a binary classification task of differentia…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 19 pages, 2 figures

  5. arXiv:2406.16144  [pdf, other]

    cs.CL

    Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step

    Authors: Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Current research found the issue of Early Answering in large language models (LLMs), where the models already have an answer before generating the Chain-of-Thought (CoT). This phenomenon suggests a potential lack of necessary dependency between the predicted answer and the reasoning process. Consequently, two important questions arise: (1) Is CoT still necessary if the model already has an answer?…

    Submitted 23 June, 2024; originally announced June 2024.

  6. arXiv:2406.15762  [pdf, other]

    cs.LG stat.ML

    Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow

    Authors: Zhichao Chen, Haoxuan Li, Fangyikang Wang, Odin Zhang, Hu Xu, Xiaoyu Jiang, Zhihuan Song, Eric H. Wang

    Abstract: Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1). Inaccurate Imputation, which arises from inherently sample-diversification-pursuing generative process of DMs. (2). Difficult Training, which stems from intricate design required for the mask matrix in model training stage. To address these concerns within…

    Submitted 22 June, 2024; originally announced June 2024.

  7. arXiv:2406.11252  [pdf, other]

    cs.CV

    Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning

    Authors: Cilin Yan, Haochen Wang, Xiaolong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves

    Abstract: Contrastive Vision-Language Pre-training (CLIP) demonstrates impressive zero-shot capability. The key to improve the adaptation of CLIP to downstream task with few exemplars lies in how to effectively model and transfer the useful knowledge embedded in CLIP. Previous work mines the knowledge typically based on the limited visual samples and close-set semantics (i.e., within target category set of d…

    Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.11160  [pdf, other]

    cs.AI

    Context Graph

    Authors: Chengjin Xu, Muzhi Li, Cehao Yang, Xuhui Jiang, Lumingyuan Tang, Yiyan Qi, Jian Guo

    Abstract: Knowledge Graphs (KGs) are foundational structures in many AI applications, representing entities and their interrelations through triples. However, triple-based KGs lack the contextual information of relational knowledge, like temporal dynamics and provenance details, which are crucial for comprehensive knowledge representation and effective reasoning. Instead, Context Graphs (CGs) expan…

    Submitted 27 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  9. arXiv:2406.10958  [pdf, other]

    math.OC cs.CL cs.MA

    City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization

    Authors: Zihao Jiao, Mengyi Sha, Haoyu Zhang, Xinyu Jiang, Wei Qi

    Abstract: Existing operations research (OR) models and tools play indispensable roles in smart-city operations, yet their practical implementation is limited by the complexity of modeling and deficiencies in optimization proficiency. To generate more relevant and accurate solutions to users' requirements, we propose a large language model (LLM)-based agent ("City-LEO") that enhances the efficiency and trans…

    Submitted 17 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 26 pages, 8 figures, 5 tables

  10. arXiv:2406.10521  [pdf, other]

    cs.LG cs.AI

    MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

    Authors: Yaobin Ling, Xiaoqian Jiang, Yejin Kim

    Abstract: In the era of big data, access to abundant data is crucial for driving research forward. However, such data is often inaccessible due to privacy concerns or high costs, particularly in healthcare domain. Generating synthetic (tabular) data can address this, but existing models typically require substantial amounts of data to train effectively, contradicting our objective to solve data scarcity. To…

    Submitted 29 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  11. arXiv:2406.10278  [pdf, other]

    cs.CL cs.AI

    Prompt-Based Length Controlled Generation with Multiple Control Types

    Authors: Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Large language models (LLMs) have attracted great attention given their strong performance on a wide range of NLP tasks. In practice, users often expect generated texts to fall within a specific length range, making length controlled generation an important topic, especially for GPT-style models. Existing length control methods mostly focus on a simple control type of "equal to" a target length. D…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 findings. arXiv admin note: text overlap with arXiv:2308.12030

  12. arXiv:2406.09385  [pdf, other]

    cs.CV

    Towards Vision-Language Geo-Foundation Model: A Survey

    Authors: Yue Zhou, Litong Feng, Yiping Ke, Xue Jiang, Junchi Yan, Xue Yang, Wayne Zhang

    Abstract: Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding. However, most methods rely on training with general image datasets, and the lack of geospatial data leads to poor performance on earth observation. Numerous geospatial image-text pair datasets and VLFMs…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 18 pages, 4 figures

  13. arXiv:2406.08496  [pdf, other]

    cs.DC

    LPSim: Large Scale Multi-GPU Parallel Computing based Regional Scale Traffic Simulation Framework

    Authors: Xuan Jiang, Raja Sengupta, James Demmel, Samuel Williams

    Abstract: Traffic propagation simulation is crucial for urban planning, enabling congestion analysis, travel time estimation, and route optimization. Traditional micro-simulation frameworks are limited to main roads due to the complexity of urban mobility and large-scale data. We introduce the Large Scale Multi-GPU Parallel Computing based Regional Scale Traffic Simulation Framework (LPSim), a scalable tool…

    Submitted 25 April, 2024; originally announced June 2024.

  14. arXiv:2406.07528  [pdf, other]

    cs.LG

    QuickLLaMA: Query-aware Inference Acceleration for Large Language Models

    Authors: Jingyao Li, Han Shi, Xin Jiang, Zhenguo Li, Hong Xu, Jiaya Jia

    Abstract: The capacity of Large Language Models (LLMs) to comprehend and reason over long contexts is pivotal for advancements in diverse fields. Yet, they still struggle with capturing long-distance dependencies within sequences to deeply understand semantics. To address this issue, we introduce Query-aware Inference for LLMs (Q-LLM), a system designed to process extensive sequences akin to human cognition.…

    Submitted 11 June, 2024; originally announced June 2024.

  15. arXiv:2406.04340  [pdf, other]

    cs.CV

    GLACE: Global Local Accelerated Coordinate Encoding

    Authors: Fangjinhua Wang, Xudong Jiang, Silvano Galliani, Christoph Vogel, Marc Pollefeys

    Abstract: Scene coordinate regression (SCR) methods are a family of visual localization methods that directly regress 2D-3D matches for camera pose estimation. They are effective in small-scale scenes but face significant challenges in large-scale scenes that are further amplified in the absence of ground truth 3D point clouds for supervision. Here, the model can only rely on reprojection constraints and ne…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Large-scale visual localization with a single optimizable MLP. CVPR 2024. Code: https://github.com/cvg/glace. Project page: https://xjiangan.github.io/glace

  16. arXiv:2406.02784  [pdf, other]

    cs.NI

    Feasibility of State Space Models for Network Traffic Generation

    Authors: Andrew Chu, Xi Jiang, Shinan Liu, Arjun Bhagoji, Francesco Bronzino, Paul Schmitt, Nick Feamster

    Abstract: Many problems in computer networking rely on parsing collections of network traces (e.g., traffic prioritization, intrusion detection). Unfortunately, the availability and utility of these collections is limited due to privacy concerns, data staleness, and low representativeness. While methods for generating data to augment collections exist, they often fall short in replicating the quality of rea…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 7 pages, 3 figures, 4 tables

  17. arXiv:2406.01555  [pdf, other]

    cs.CV

    Towards Flexible Interactive Reflection Removal with Human Guidance

    Authors: Xiao Chen, Xudong Jiang, Yunkang Tao, Zhen Lei, Qing Li, Chenyang Lei, Zhaoxiang Zhang

    Abstract: Single image reflection removal is inherently ambiguous, as both the reflection and transmission components requiring separation may follow natural image statistics. Existing methods attempt to address the issue by using various types of low-level and physics-based cues as sources of reflection signals. However, these cues are not universally applicable, since they are only observable in specific…

    Submitted 3 June, 2024; originally announced June 2024.

  18. arXiv:2405.19990  [pdf, other]

    cs.CV

    DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World

    Authors: Wenli Sun, Xinyang Jiang, Dongsheng Li, Cairong Zhao

    Abstract: Person Re-Identification (ReID) systems pose a significant security risk from backdoor attacks, allowing adversaries to evade tracking or impersonate others. Beyond recognizing this issue, we investigate how backdoor attacks can be deployed in real-world scenarios, where a ReID model is typically trained on data collected in the digital domain and then deployed in a physical environment. This atta…

    Submitted 30 May, 2024; originally announced May 2024.

  19. arXiv:2405.19149  [pdf, other]

    cs.CV cs.AI cs.IR

    CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval

    Authors: Xintong Jiang, Yaxiong Wang, Mengjian Li, Yujiao Wu, Bingwen Hu, Xueming Qian

    Abstract: Composed Image Retrieval (CIR) involves searching for target images based on an image-text pair query. While current methods treat this as a query-target matching problem, we argue that CIR triplets contain additional associations beyond this primary relation. In our paper, we identify two new relations within triplets, treating each triplet as a graph node. Firstly, we introduce the concept of te…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: To appear at SIGIR 2024. arXiv admin note: text overlap with arXiv:2309.02169

  20. arXiv:2405.17441  [pdf, other]

    cs.NI cs.AI cs.CL eess.SY

    When Large Language Models Meet Optical Networks: Paving the Way for Automation

    Authors: Danshi Wang, Yidi Wang, Xiaotian Jiang, Yao Zhang, Yue Pang, Min Zhang

    Abstract: Since the advent of GPT, large language models (LLMs) have brought about revolutionary advancements in all walks of life. As a superior natural language processing (NLP) technology, LLMs have consistently achieved state-of-the-art performance on numerous areas. However, LLMs are considered to be general-purpose models for NLP tasks, which may encounter challenges when applied to complex tasks in s…

    Submitted 24 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  21. arXiv:2405.15525  [pdf, other]

    cs.CL

    Sparse Matrix in Large Language Model Fine-tuning

    Authors: Haoze He, Juncheng Billy Li, Xuan Jiang, Heather Miller

    Abstract: LoRA and its variants have become popular parameter-efficient fine-tuning (PEFT) methods due to their ability to avoid excessive computational costs. However, an accuracy gap often exists between PEFT methods and full fine-tuning (FT), and this gap has yet to be systematically studied. In this work, we introduce a method for selecting sparse sub-matrices that aim to minimize the performance gap be…

    Submitted 29 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 14 pages

  22. arXiv:2405.15465  [pdf, other]

    cs.CV

    Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection

    Authors: Fan Liu, Liang Yao, Chuanyi Zhang, Ting Wu, Xinlei Zhang, Xiruo Jiang, Jun Zhou

    Abstract: Detecting objects from Unmanned Aerial Vehicles (UAV) is often hindered by a large number of small objects, resulting in low detection accuracy. To address this issue, mainstream approaches typically utilize multi-stage inferences. Despite their remarkable detecting accuracies, real-time efficiency is sacrificed, making them less practical to handle real applications. To this end, we propose to im…

    Submitted 31 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  23. arXiv:2405.15373  [pdf, other]

    cs.RO cs.AI

    Autonomous Quilt Spreading for Caregiving Robots

    Authors: Yuchun Guo, Zhiqing Lu, Yanling Zhou, Xin Jiang

    Abstract: In this work, we propose a novel strategy to ensure infants, who inadvertently displace their quilts during sleep, are promptly and accurately re-covered. Our approach is formulated into two subsequent steps: interference resolution and quilt spreading. By leveraging the DWPose human skeletal detection and the Segment Anything instance segmentation models, the proposed method can accurately recogn…

    Submitted 24 May, 2024; originally announced May 2024.

  24. arXiv:2405.14796  [pdf, other]

    cs.CV cs.AI q-bio.QM

    Generative Plant Growth Simulation from Sequence-Informed Environmental Conditions

    Authors: Mohamed Debbagh, Yixue Liu, Zhouzhou Zheng, Xintong Jiang, Shangpeng Sun, Mark Lefsrud

    Abstract: A plant growth simulation can be characterized as a reconstructed visual representation of a plant or plant system. The phenotypic characteristics and plant structures are controlled by the scene environment and other contextual attributes. Considering the temporal dependencies and compounding effects of various factors on growth trajectories, we formulate a probabilistic approach to the simulatio…

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  25. arXiv:2405.14722  [pdf, other]

    cs.CL

    CAPE: Context-Adaptive Positional Encoding for Length Extrapolation

    Authors: Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

    Abstract: Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to distinguish token positions in given sequences. However, both APE and RPE remain fixed after model training regardless of input data, limiting their adaptability and…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Technical Report

  26. arXiv:2405.14520  [pdf, other]

    cs.CV

    Ghost-Stereo: GhostNet-based Cost Volume Enhancement and Aggregation for Stereo Matching Networks

    Authors: Xingguang Jiang, Xiaofeng Bian, Chenggang Guo

    Abstract: Depth estimation based on stereo matching is a classic but popular computer vision problem, which has a wide range of real-world applications. Current stereo matching methods generally adopt the deep Siamese neural network architecture, and have achieved impressive performance by constructing feature matching cost volumes and using 3D convolutions for cost aggregation. However, most existing metho…

    Submitted 23 May, 2024; originally announced May 2024.

  27. arXiv:2405.12541  [pdf, other]

    cs.AI

    DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge

    Authors: Bufang Yang, Siyang Jiang, Lilin Xu, Kaiwei Liu, Hai Li, Guoliang Xing, Hongkai Chen, Xiaofan Jiang, Zhenyu Yan

    Abstract: Large language models (LLMs) have the potential to transform digital healthcare, as evidenced by recent advances in LLM-based virtual doctors. However, current approaches rely on patients' subjective descriptions of symptoms, causing increased misdiagnosis. Recognizing the value of daily data from smart devices, we introduce a novel LLM-based multi-turn consultation virtual doctor system, DrHouse,…

    Submitted 21 May, 2024; originally announced May 2024.

  28. arXiv:2405.11831  [pdf, other]

    eess.AS cs.LG

    SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

    Authors: Siavash Shams, Sukru Samet Dindar, Xilin Jiang, Nima Mesgarani

    Abstract: Transformers have revolutionized deep learning across various tasks, including audio representation learning, due to their powerful modeling capabilities. However, they often suffer from quadratic complexity in both GPU memory usage and computational inference time, affecting their efficiency. Recently, state space models (SSMs) like Mamba have emerged as a promising alternative, offering a more e…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Code at https://github.com/SiavashShams/ssamba

  29. arXiv:2405.11535  [pdf, ps, other]

    cs.PL

    Proving Functional Program Equivalence via Directed Lemma Synthesis

    Authors: Yican Sun, Ruyi Ji, Jian Fang, Xuanlin Jiang, Mingshuai Chen, Yingfei Xiong

    Abstract: Proving equivalence between functional programs is a fundamental problem in program verification, which often amounts to reasoning about algebraic data types (ADTs) and compositions of structural recursions. Modern theorem provers address this problem by applying structural induction, which is insufficient for proving many equivalence theorems. In such cases, one has to invent a set of lemmas, pro…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 21 pages

  30. arXiv:2405.10300  [pdf, other]

    cs.CV

    Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

    Authors: Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

    Abstract: This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model o…

    Submitted 31 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: homepage: https://deepdataspace.com/home

  31. arXiv:2405.07696  [pdf, other]

    cs.CV

    MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders

    Authors: Xueying Jiang, Sheng Jin, Xiaoqin Zhang, Ling Shao, Shijian Lu

    Abstract: Monocular 3D object detection aims for precise 3D localization and identification of objects from a single-view image. Despite its recent progress, it often struggles while handling pervasive object occlusions that tend to complicate and degrade the prediction of object dimensions, depths, and orientations. We design MonoMAE, a monocular 3D detector inspired by Masked Autoencoders that addresses t…

    Submitted 13 May, 2024; originally announced May 2024.

  32. arXiv:2405.06342  [pdf, other]

    cs.CV eess.IV

    Compression-Realized Deep Structural Network for Video Quality Enhancement

    Authors: Hanchi Sun, Xiaohong Liu, Xinyang Jiang, Yifei Shen, Dongsheng Li, Xiongkuo Min, Guangtao Zhai

    Abstract: This paper focuses on the task of quality enhancement for compressed videos. Although deep network-based video restorers achieve impressive progress, most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. Since the quality degradation of the video is primarily induced by the compression algorithm, a new paradigm is urgently needed for a mo…

    Submitted 10 May, 2024; originally announced May 2024.

  33. arXiv:2405.01533  [pdf, other]

    cs.CV

    OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

    Authors: Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

    Abstract: The advances in multimodal large language models (MLLMs) have led to growing interests in LLM-based autonomous driving agents to leverage their strong reasoning capabilities. However, capitalizing on MLLMs' strong reasoning capabilities for improved planning behavior is challenging since planning requires full 3D situational awareness beyond 2D reasoning. To address this challenge, our work propos…

    Submitted 2 May, 2024; originally announced May 2024.

  34. arXiv:2405.01242  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

    Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

    Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art m…

    Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  35. arXiv:2405.01054  [pdf, other]

    cs.RO cs.CV cs.LG

    Continual Learning for Robust Gate Detection under Dynamic Lighting in Autonomous Drone Racing

    Authors: Zhongzheng Qiao, Xuan Huy Pham, Savitha Ramasamy, Xudong Jiang, Erdal Kayacan, Andriy Sarabakha

    Abstract: In autonomous and mobile robotics, a principal challenge is resilient real-time environmental perception, particularly in situations characterized by unknown and dynamic elements, as exemplified in the context of autonomous drone racing. This study introduces a perception technique for detecting drone racing gates under illumination variations, which is common during high-speed drone flights. The…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures, in 2024 International Joint Conference on Neural Networks (IJCNN)

  36. arXiv:2405.00557  [pdf, other]

    cs.CL cs.AI

    Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

    Authors: Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok

    Abstract: As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge, posing potential risks during deployment. Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), or on the self-alignment capacities of LLMs…

    Submitted 1 May, 2024; originally announced May 2024.

  37. arXiv:2404.19311  [pdf, other]

    cs.CV cs.MM

    A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images

    Authors: Wang Zhang, Tingting Li, Yuntian Zhang, Gensheng Pei, Xiruo Jiang, Yazhou Yao

    Abstract: Matching visible and near-infrared (NIR) images remains a significant challenge in remote sensing image fusion. The nonlinear radiometric differences between heterogeneous remote sensing images make the image matching task even more difficult. Deep learning has gained substantial attention in computer vision tasks in recent years. However, many methods rely on supervised learning and necessitate l…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: accepted by Information Fusion

  38. arXiv:2404.19282  [pdf, other]

    cs.MM

    Dual Dynamic Threshold Adjustment Strategy for Deep Metric Learning

    Authors: Xiruo Jiang, Yazhou Yao, Sheng Liu, Fumin Shen, Liqiang Nie, Xiansheng Hua

    Abstract: Loss functions and sample mining strategies are essential components in deep metric learning algorithms. However, existing loss functions or mining strategies often necessitate the incorporation of additional hyperparameters, notably the threshold, which defines whether the sample pair is informative. The threshold provides a stable numerical standard for determining whether to retain the pairs.…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: accepted by ACM Transactions on Multimedia Computing, Communications, and Applications

  39. arXiv:2404.18953  [pdf, other]

    math.OC cs.NE

    A Knowledge-driven Memetic Algorithm for the Energy-efficient Distributed Homogeneous Flow Shop Scheduling Problem

    Authors: Yunbao Xu, Xuemei Jiang, Jun Li, Lining Xing, Yanjie Song

    Abstract: The reduction of carbon emissions in the manufacturing industry holds significant importance in achieving the national "double carbon" target. Ensuring energy efficiency is a crucial factor to be incorporated into future generation manufacturing systems. In this study, energy consumption is considered in the distributed homogeneous flow shop scheduling problem (DHFSSP). A knowledge-driven memetic…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 14 pages

  40. arXiv:2404.17771  [pdf, ps, other]

    cs.CV

    Characterization of dim light response in DVS pixel: Discontinuity of event triggering time

    Authors: Xiao Jiang, Fei Zhou

    Abstract: Dynamic Vision Sensors (DVS) have recently generated great interest because of the advantages of wide dynamic range and low latency compared with conventional frame-based cameras. However, the complicated behaviors in dim light conditions are still not clear, restricting the applications of DVS. In this paper, we analyze the typical DVS circuit, and find that there exists discontinuity of event tr…

    Submitted 30 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 6 pages, 4 figures

  41. arXiv:2404.16645  [pdf, other]

    cs.CL cs.AI

    Tele-FLM Technical Report

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on efficiently scaling LLMs beyond 50 billion parameters with minimum trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a…

    Submitted 25 April, 2024; originally announced April 2024.

  42. arXiv:2404.16323  [pdf, other]

    cs.CV

    DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction

    Authors: Jiamin Wu, Kenkun Liu, Han Gao, Xiaoke Jiang, Lei Zhang

    Abstract: In this paper, we study the problem of 3D reconstruction from a single-view RGB image and propose a novel approach called DIG3D for 3D object reconstruction and novel view synthesis. Our method utilizes an encoder-decoder framework which generates 3D Gaussians in decoder with the guidance of depth-aware image features from encoder. In particular, we introduce the use of deformable transformer, all…

    Submitted 25 April, 2024; originally announced April 2024.

  43. arXiv:2404.15772  [pdf, other]

    cs.LG

    Bi-Mamba+: Bidirectional Mamba for Time Series Forecasting

    Authors: Aobo Liang, Xingguo Jiang, Yan Sun, Xiaohou Shi, Ke Li

    Abstract: Long-term time series forecasting (LTSF) provides longer insights into future trends and patterns. Over the past few years, deep learning models especially Transformers have achieved advanced performance in LTSF tasks. However, LTSF faces inherent challenges such as long-term dependencies capturing and sparse semantic characteristics. Recently, a new state space model (SSM) named Mamba is proposed…

    Submitted 26 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: New Mamba-based architecture. All experiments rerun

  44. arXiv:2404.15771  [pdf, other]

    cs.CV cs.MM

    DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines

    Authors: Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li

    Abstract: Fine-grained image retrieval (FGIR) is to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discr…

    Submitted 24 April, 2024; originally announced April 2024.

  45. arXiv:2404.12861  [pdf, other]

    cs.CV

    Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation

    Authors: Yilong Chen, Zongyi Xu, Xiaoshui Huang, Ruicheng Zhang, Xinqi Jiang, Xinbo Gao

    Abstract: Current point cloud semantic segmentation has achieved great advances when given sufficient labels. However, the dense annotation of LiDAR point clouds remains prohibitively expensive and time-consuming, unable to keep up with the continuously growing volume of data. In this paper, we propose annotating images with scattered points, followed by utilizing SAM (a Foundation model) to generate semant…

    Submitted 19 April, 2024; originally announced April 2024.

  46. arXiv:2404.12457  [pdf, other]

    cs.DC cs.CL cs.LG

    RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

    Authors: Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, Xin Jin

    Abstract: Retrieval-Augmented Generation (RAG) has shown significant improvements in various natural language processing tasks by integrating the strengths of large language models (LLMs) and external knowledge databases. However, RAG introduces long sequence generation and leads to high computation and memory costs. We propose RAGCache, a novel multilevel dynamic caching system tailored for RAG. Our analys…

    Submitted 25 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  47. arXiv:2404.12186  [pdf, other]

    cs.LG cs.CR

    Privacy-Preserving UCB Decision Process Verification via zk-SNARKs

    Authors: Xikun Jiang, He Lyu, Chenhao Ying, Yibin Xu, Boris Düdder, Yuan Luo

    Abstract: With the increasingly widespread application of machine learning, how to strike a balance between protecting the privacy of data and algorithm parameters and ensuring the verifiability of machine learning has always been a challenge. This study explores the intersection of reinforcement learning and data privacy, specifically addressing the Multi-Armed Bandit (MAB) problem with the Upper Confidenc…

    Submitted 6 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  48. arXiv:2404.09586  [pdf, other]

    cs.CV cs.LG

    Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing

    Authors: Song Xia, Yi Yu, Xudong Jiang, Henghui Ding

    Abstract: Randomized Smoothing (RS) has been proven a promising method for endowing an arbitrary image classifier with certified robustness. However, the substantial uncertainty inherent in the high-dimensional isotropic Gaussian noise imposes the curse of dimensionality on RS. Specifically, the upper bound of ${\ell_2}$ certified robustness radius provided by RS exhibits a diminishing trend with the expans…

    Submitted 15 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted to the International Conference on Learning Representations (ICLR), 2024

  49. arXiv:2404.08447  [pdf, other]

    cs.LG math.OC

    Federated Optimization with Doubly Regularized Drift Correction

    Authors: Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

    Abstract: Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown uniformly…

    Submitted 12 April, 2024; originally announced April 2024.

  50. arXiv:2404.06918  [pdf, other]

    cs.CV

    HRVDA: High-Resolution Visual Document Assistant

    Authors: Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu

    Abstract: Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in visual document understanding still leaves much room for improvement. This discrepancy is primarily attributed to the fact that visual document understanding is a fine-g…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 main conference