Skip to main content

Showing 1–50 of 281 results for author: Fan, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01312  [pdf, other

    cs.CV

    ToCoAD: Two-Stage Contrastive Learning for Industrial Anomaly Detection

    Authors: Yun Liang, Zhiguang Hu, Junjie Huang, Donglin Di, Anyang Su, Lei Fan

    Abstract: Current unsupervised anomaly detection approaches perform well on public datasets but struggle with specific anomaly types due to the domain gap between pre-trained feature extractors and target-specific domains. To tackle this issue, this paper presents a two-stage training strategy, called \textbf{ToCoAD}. In the first stage, a discriminative network is trained by using synthetic anomalies in a… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  2. arXiv:2406.19633  [pdf, other

    cs.SE

    Combating Missed Recalls in E-commerce Search: A CoT-Prompting Testing Approach

    Authors: Shengnan Wu, Yongxiang Hu, Yingchuan Wang, Jiazhen Gu, ** Meng, Liujie Fan, Zhongshi Luan, Xin Wang, Yangfan Zhou

    Abstract: Search components in e-commerce apps, often complex AI-based systems, are prone to bugs that can lead to missed recalls - situations where items that should be listed in search results aren't. This can frustrate shop owners and harm the app's profitability. However, testing for missed recalls is challenging due to difficulties in generating user-aligned test cases and the absence of oracles. In th… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE Companion '24), July 15--19, 2024, Porto de Galinhas, Brazil

  3. arXiv:2406.15050  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis

    Authors: Lin Fan, Xun Gong, Cenyang Zheng, Yafei Ou

    Abstract: The intersection of medical Visual Question Answering (Med-VQA) is a challenging research topic with advantages including patient engagement and clinical expert involvement for second opinions. However, existing Med-VQA methods based on joint embedding fail to explain whether their provided results are based on correct reasoning or coincidental answers, which undermines the credibility of VQA answ… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    ACM Class: I.2.7; I.2.10; J.3

  4. arXiv:2406.14086  [pdf

    cs.CV cs.AI cs.LG

    Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images

    Authors: Qinfeng Zhu, Yuanzhi Cai, Lei Fan

    Abstract: Recent advancements in autoregressive networks with linear complexity have driven significant research progress, demonstrating exceptional performance in large language models. A representative model is the Extended Long Short-Term Memory (xLSTM), which incorporates gating mechanisms and memory structures, performing comparably to Transformer architectures in long-sequence language tasks. Autoregr… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.13301  [pdf, other

    cs.CV cs.RO

    ARDuP: Active Region Video Diffusion for Universal Policies

    Authors: Shuaiyi Huang, Mara Levy, Zhenyu Jiang, Anima Anandkumar, Yuke Zhu, Linxi Fan, De-An Huang, Abhinav Shrivastava

    Abstract: Sequential decision-making can be formulated as a text-conditioned video generation problem, where a video planner, guided by a text-defined goal, generates future frames visualizing planned actions, from which control actions are subsequently derived. In this work, we introduce Active Region Video Diffusion for Universal Policies (ARDuP), a novel framework for video-based policy learning that emp… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.12403  [pdf, other

    cs.CL cs.AI

    PDSS: A Privacy-Preserving Framework for Step-by-Step Distillation of Large Language Models

    Authors: Tao Fan, Yan Kang, Wei**g Chen, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

    Abstract: In the context of real-world applications, leveraging large language models (LLMs) for domain-specific tasks often faces two major challenges: domain-specific knowledge privacy and constrained resources. To address these issues, we propose PDSS, a privacy-preserving framework for step-by-step distillation of LLMs. PDSS works on a server-client architecture, wherein client transmits perturbed promp… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.10700  [pdf, other

    cs.CV cs.RO

    Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

    Authors: Guowen Zhang, Lue Fan, Chenhang He, Zhen Lei, Zhaoxiang Zhang, Lei Zhang

    Abstract: Serialization-based methods, which serialize the 3D voxels and group them into multiple sequences before inputting to Transformers, have demonstrated their effectiveness in 3D object detection. However, serializing 3D voxels into 1D sequences will inevitably sacrifice the voxel spatial proximity. Such an issue is hard to be addressed by enlarging the group size with existing serialization-based me… ▽ More

    Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures

  8. arXiv:2406.10569  [pdf, other

    cs.LG cs.CV

    MDA: An Interpretable Multi-Modal Fusion with Missing Modalities and Intrinsic Noise

    Authors: Lin Fan, Yafei Ou, Cenyang Zheng, Pengyu Dai, Tamotsu Kamishima, Masayuki Ikebe, Kenji Suzuki, Xun Gong

    Abstract: Multi-modal fusion is crucial in medical data research, enabling a comprehensive understanding of diseases and improving diagnostic performance by combining diverse modalities. However, multi-modal fusion faces challenges, including capturing interactions between modalities, addressing missing modalities, handling erroneous modal information, and ensuring interpretability. Many existing researcher… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    ACM Class: I.5.2; I.2.7; I.2.10; J.3

  9. arXiv:2406.08481  [pdf, other

    cs.CV

    Enhancing End-to-End Autonomous Driving with Latent World Model

    Authors: Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, Tieniu Tan

    Abstract: End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on the supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering the data scalability. To address this challenge, we propose a novel self-supervised method to e… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.07499  [pdf, other

    cs.CV cs.GR

    Trim 3D Gaussian Splatting for Accurate Geometry Representation

    Authors: Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, Zhaoxiang Zhang

    Abstract: In this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Previous arts for geometry reconstruction from 3D Gaussians mainly focus on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while pre… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Project page: https://trimgs.github.io/

  11. arXiv:2406.05862  [pdf, other

    cs.CL cs.AI cs.CV

    II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

    Authors: Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, Hongquan Lin, Jiaming Li, Yuansheng Ni, Haihong Wu, Yaswanth Narsupalli, Zhigang Zheng, Chengming Li, Xi** Hu, Ruifeng Xu, Xiaojun Chen, Min Yang, Jiaheng Liu, Ruibo Liu, Wenhao Huang, Ge Zhang , et al. (1 additional authors not shown)

    Abstract: The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap,… ▽ More

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 100 pages, 82 figures, add citations

  12. arXiv:2406.02787  [pdf, other

    cs.CL cs.AI cs.LG

    Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities

    Authors: Wenyue Hua, Kaijie Zhu, Lingyao Li, Lizhou Fan, Shuhang Lin, Mingyu **, Haochen Xue, Zelong Li, **Dong Wang, Yongfeng Zhang

    Abstract: This study intends to systematically disentangle pure logic reasoning and text understanding by investigating the contrast across abstract and contextualized logical problems from a comprehensive set of domains. We explore whether LLMs demonstrate genuine reasoning capabilities across various domains when the underlying logical structure remains constant. We focus on two main questions (1) Can abs… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures

  13. arXiv:2406.02224  [pdf, other

    cs.CL cs.AI

    FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models

    Authors: Tao Fan, Guoqiang Ma, Yan Kang, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

    Abstract: Recent research in federated large language models (LLMs) has primarily focused on enabling clients to fine-tune their locally deployed homogeneous LLMs collaboratively or on transferring knowledge from server-based LLMs to small language models (SLMs) at downstream clients. However, a significant gap remains in the simultaneous mutual enhancement of both the server's LLM and clients' SLMs. To bri… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  14. arXiv:2406.01967  [pdf, other

    cs.RO cs.AI cs.LG

    DrEureka: Language Model Guided Sim-To-Real Transfer

    Authors: Yecheng Jason Ma, William Liang, Hung-Ju Wang, Sam Wang, Yuke Zhu, Linxi Fan, Osbert Bastani, Dinesh Jayaraman

    Abstract: Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function as well as the simulation physics parameters, rendering the process slow and human-labor intensive. In this paper, we investigate using Large Language Models (LLMs) to automa… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Robotics: Science and Systems (RSS) 2024. Project website and open-source code: https://eureka-research.github.io/dr-eureka/

  15. arXiv:2406.01085  [pdf, other

    cs.CR cs.AI

    FedAdOb: Privacy-Preserving Federated Deep Learning with Adaptive Obfuscation

    Authors: Hanlin Gu, Jiahuan Luo, Yan Kang, Yuan Yao, Gongxi Zhu, Bowen Li, Lixin Fan, Qiang Yang

    Abstract: Federated learning (FL) has emerged as a collaborative approach that allows multiple clients to jointly learn a machine learning model without sharing their private data. The concern about privacy leakage, albeit demonstrated under specific conditions, has triggered numerous follow-up research in designing powerful attacking methods and effective defending mechanisms aiming to thwart these attacki… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  16. arXiv:2405.20681  [pdf, other

    cs.CR cs.AI

    No Free Lunch Theorem for Privacy-Preserving LLM Inference

    Authors: Xiao** Zhang, Yulin Fei, Yan Kang, Wei Chen, Lixin Fan, Hai **, Qiang Yang

    Abstract: Individuals and businesses have been significantly benefited by Large Language Models (LLMs) including PaLM, Gemini and ChatGPT in various ways. For example, LLMs enhance productivity, reduce costs, and enable us to focus on more valuable tasks. Furthermore, LLMs possess the capacity to sift through extensive datasets, uncover underlying patterns, and furnish critical insights that propel the fron… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  17. arXiv:2405.17462  [pdf, other

    cs.LG

    Ferrari: Federated Feature Unlearning via Optimizing Feature Sensitivity

    Authors: Hanlin Gu, WinKent Ong, Chee Seng Chan, Lixin Fan

    Abstract: The advent of Federated Learning (FL) highlights the practical necessity for the 'right to be forgotten' for all clients, allowing them to request data deletion from the machine learning model's service provider. This necessity has spurred a growing demand for Federated Unlearning (FU). Feature unlearning has gained considerable attention due to its applications in unlearning sensitive features, b… ▽ More

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: TLDR: The need for a "right to be forgotten" in Federated Learning has led to the development of the Ferrari framework, which efficiently unlearns sensitive features using a Lipschitz continuity-based metric, proven effective in extensive testing

  18. arXiv:2405.15474  [pdf, other

    cs.LG cs.DC

    Unlearning during Learning: An Efficient Federated Machine Unlearning Method

    Authors: Hanlin Gu, Gongxi Zhu, Jie Zhang, Xinyuan Zhao, Yuxing Han, Lixin Fan, Qiang Yang

    Abstract: In recent years, Federated Learning (FL) has garnered significant attention as a distributed machine learning paradigm. To facilitate the implementation of the right to be forgotten, the concept of federated machine unlearning (FMU) has also emerged. However, current FMU approaches often involve additional time-consuming steps and may not offer comprehensive unlearning capabilities, which renders… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  19. arXiv:2405.14212  [pdf, other

    cs.CR cs.CL

    Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data

    Authors: Haoran Li, Xinyuan Zhao, Dadi Guo, Hanlin Gu, Ziqian Zeng, Yuxing Han, Yangqiu Song, Lixin Fan, Qiang Yang

    Abstract: As large language models (LLMs) demonstrate unparalleled performance and generalization ability, LLMs are widely used and integrated into various applications. When it comes to sensitive domains, as commonly described in federated learning scenarios, directly using external LLMs on private data is strictly prohibited by stringent data security and privacy regulations. For local clients, the utiliz… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  20. arXiv:2405.13426  [pdf

    cs.HC cs.AI

    A New Era in Human Factors Engineering: A Survey of the Applications and Prospects of Large Multimodal Models

    Authors: Li Fan, Lee Ching-Hung, Han Su, Feng Shanshan, Jiang Zhuoxuan, Sun Zhu

    Abstract: In recent years, the potential applications of Large Multimodal Models (LMMs) in fields such as healthcare, social psychology, and industrial design have attracted wide research attention, providing new directions for human factors research. For instance, LMM-based smart systems have become novel research subjects of human factors studies, and LMM introduces new research paradigms and methodologie… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 14 pages, journal paper

  21. arXiv:2405.11841  [pdf, other

    cs.AI

    Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities

    Authors: Junqi Wang, Chunhui Zhang, Jiapeng Li, Yuxi Ma, Lixing Niu, Jiaheng Han, Yujia Peng, Yixin Zhu, Lifeng Fan

    Abstract: Facing the current debate on whether Large Language Models (LLMs) attain near-human intelligence levels (Mitchell & Krakauer, 2023; Bubeck et al., 2023; Kosinski, 2023; Shiffrin & Mitchell, 2023; Ullman, 2023), the current study introduces a benchmark for evaluating social intelligence, one of the most distinctive aspects of human cognition. We developed a comprehensive theoretical framework for s… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Also published in Proceedings of the Annual Meeting of the Cognitive Science Society (CogSci), 2024

  22. arXiv:2405.11762  [pdf

    cs.LG

    Interpretability of Statistical, Machine Learning, and Deep Learning Models for Landslide Susceptibility Map** in Three Gorges Reservoir Area

    Authors: Cheng Chen, Lei Fan

    Abstract: Landslide susceptibility map** (LSM) is crucial for identifying high-risk areas and informing prevention strategies. This study investigates the interpretability of statistical, machine learning (ML), and deep learning (DL) models in predicting landslide susceptibility. This is achieved by incorporating various relevant interpretation methods and two types of input factors: a comprehensive set o… ▽ More

    Submitted 29 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  23. arXiv:2405.08493  [pdf

    cs.CV

    Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study

    Authors: Qinfeng Zhu, Yuan Fang, Yuanzhi Cai, Cheng Chen, Lei Fan

    Abstract: Deep learning methods, especially Convolutional Neural Networks (CNN) and Vision Transformer (ViT), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global re… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  24. arXiv:2405.03066  [pdf

    cs.ET

    A sco** review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs)

    Authors: Lingyao Li, Jiayan Zhou, Zhenxiang Gao, Wenyue Hua, Lizhou Fan, Huizi Yu, Loni Hagen, Yongfeng Zhang, Themistocles L. Assimes, Libby Hemphill, Siyuan Ma

    Abstract: Electronic Health Records (EHRs) play an important role in the healthcare system. However, their complexity and vast volume pose significant challenges to data interpretation and analysis. Recent advancements in Artificial Intelligence (AI), particularly the development of Large Language Models (LLMs), open up new opportunities for researchers in this domain. Although prior studies have demonstrat… ▽ More

    Submitted 22 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  25. arXiv:2405.02364  [pdf, other

    cs.LG cs.DC

    A Survey on Contribution Evaluation in Vertical Federated Learning

    Authors: Yue Cui, Chung-ju Huang, Yuzhu Zhang, Leye Wang, Lixin Fan, Xiaofang Zhou, Qiang Yang

    Abstract: Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns associated with centralized data storage and processing. VFL facilitates collaboration among multiple entities with distinct feature sets on the same user population, enabling the joint training of predictive models without direct data sharing. A key aspect of VFL is the fair and ac… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

  26. arXiv:2404.15532  [pdf, other

    cs.HC cs.AI cs.CL cs.CV cs.MA

    BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

    Authors: Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu **, Jiebo Luo, Yongfeng Zhang

    Abstract: This paper presents BattleAgent, an emulation system that combines the Large Vision-Language Model and Multi-agent System. This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time. It emulates both the decision-making processes of leaders and the viewpoints of ordinary participants, such as soldie… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 26 pages, 14 figures The data and code for this project are accessible at https://github.com/agiresearch/battleagent

  27. arXiv:2404.12273  [pdf, other

    cs.AI cs.CL cs.LG

    FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom

    Authors: Yuanqin He, Yan Kang, Lixin Fan, Qiang Yang

    Abstract: Federated Learning (FL) has emerged as a promising solution for collaborative training of large language models (LLMs). However, the integration of LLMs into FL introduces new challenges, particularly concerning the evaluation of LLMs. Traditional evaluation methods that rely on labeled test sets and similarity-based metrics cover only a subset of the acceptable answers, thereby failing to accurat… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: In Progress

  28. arXiv:2404.09001  [pdf, other

    cs.RO cs.AI cs.CV

    Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households

    Authors: Zhihao Cao, Zidong Wang, Siwen Xie, Anji Liu, Lifeng Fan

    Abstract: Despite the significant demand for assistive technology among vulnerable groups (e.g., the elderly, children, and the disabled) in daily tasks, research into advanced AI-driven assistive solutions that genuinely accommodate their diverse needs remains sparse. Traditional human-machine interaction tasks often require machines to simply help without nuanced consideration of human abilities and feeli… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

  29. arXiv:2404.04490  [pdf, other

    cs.LG cs.CR

    Hyperparameter Optimization for SecureBoost via Constrained Multi-Objective Federated Learning

    Authors: Yan Kang, Ziyao Ren, Lixin Fan, Linghua Yang, Yongxin Tong, Qiang Yang

    Abstract: SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption (HE) to protect data privacy in vertical federated learning. SecureBoost and its variants have been widely adopted in fields such as finance and healthcare. However, the hyperparameters of SecureBoost are typically configured heuristically for optimizing model performance (i.e., utility) solely, assuming that privacy is… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  30. arXiv:2404.01705  [pdf

    cs.CV

    Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

    Authors: Qinfeng Zhu, Yuanzhi Cai, Yuan Fang, Yihan Yang, Cheng Chen, Lei Fan, Anh Nguyen

    Abstract: High-resolution remotely sensed images pose a challenge for commonly used semantic segmentation methods such as Convolutional Neural Network (CNN) and Vision Transformer (ViT). CNN-based methods struggle with handling such high-resolution images due to their limited receptive field, while ViT faces challenges in handling long sequences. Inspired by Mamba, which adopts a State Space Model (SSM) to… ▽ More

    Submitted 11 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  31. arXiv:2404.01133  [pdf, other

    cs.CV

    CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

    Authors: Yang Liu, He Guan, Chuanchen Luo, Lue Fan, Junran Peng, Zhaoxiang Zhang

    Abstract: The advancement of real-time 3D scene reconstruction and novel view synthesis has been significantly propelled by 3D Gaussian Splatting (3DGS). However, effectively training large-scale 3DGS and rendering it in real-time across various scales remains challenging. This paper introduces CityGaussian (CityGS), which employs a novel divide-and-conquer training approach and Level-of-Detail (LoD) strate… ▽ More

    Submitted 7 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Project Page: https://dekuliutesla.github.io/citygs/

  32. arXiv:2403.16303  [pdf

    cs.DL cs.AI cs.CL cs.SI

    Large Language Models in Biomedical and Health Informatics: A Bibliometric Review

    Authors: Huizi Yu, Lizhou Fan, Lingyao Li, Jiayan Zhou, Zihui Ma, Lu Xian, Wenyue Hua, Sijia He, Mingyu **, Yongfeng Zhang, Ashvin Gandhi, Xin Ma

    Abstract: Large Language Models (LLMs) have rapidly become important tools in Biomedical and Health Informatics (BHI), enabling new ways to analyze data, treat patients, and conduct research. This bibliometric review aims to provide a panoramic view of how LLMs have been used in BHI by examining research articles and collaboration networks from 2022 to 2023. It further explores how LLMs can improve Natural… ▽ More

    Submitted 23 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: 51 pages, 7 figures, 4 tables

  33. arXiv:2403.12800  [pdf, other

    cs.CV

    Learning Neural Volumetric Pose Features for Camera Localization

    Authors: **gyu Lin, Jiaqi Gu, Bojian Wu, Lubin Fan, Renjie Chen, Ligang Liu, Jie** Ye

    Abstract: We introduce a novel neural volumetric pose feature, termed PoseMap, designed to enhance camera localization by encapsulating the information between images and the associated camera poses. Our framework leverages an Absolute Pose Regression (APR) architecture, together with an augmented NeRF module. This integration not only facilitates the generation of novel views to enrich the training dataset… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 14 pages, 9 figures

  34. arXiv:2403.06510  [pdf, other

    cs.CV

    Skeleton Supervised Airway Segmentation

    Authors: Mingyue Zhao, Han Li, Li Fan, Shiyuan Liu, Xiaolan Qiu, S. Kevin Zhou

    Abstract: Fully-supervised airway segmentation has accomplished significant triumphs over the years in aiding pre-operative diagnosis and intra-operative navigation. However, full voxel-level annotation constitutes a labor-intensive and time-consuming task, often plagued by issues such as missing branches, branch annotation discontinuity, or erroneous edge delineation. label-efficient solutions for airway e… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  35. arXiv:2403.01826  [pdf, other

    cs.CE

    A Novel Shortest Path Query Algorithm Based on Optimized Adaptive Topology Structure

    Authors: Xiao Fang, Xuyang Song, Jiyuan Ma, Guanhua Liu, Shurong Pang, Wenbo Zhao, Cong Cao, Ling Fan

    Abstract: Urban rail transit is a fundamental component of public transportation, however, commonly station-based path search algorithms often overlook the impact of transfer times on search results, leading to decreased accuracy. To solve this problem, this paper proposes a novel shortest path query algorithm based on adaptive topology optimization called the Adaptive Topology Extension Road Network Struct… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

  36. arXiv:2403.01777  [pdf, other

    cs.CL cs.CV

    NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models

    Authors: Lizhou Fan, Wenyue Hua, Xiang Li, Kaijie Zhu, Mingyu **, Lingyao Li, Haoyang Ling, **kui Chi, **dong Wang, Xin Ma, Yongfeng Zhang

    Abstract: Understanding the reasoning capabilities of Multimodal Large Language Models (MLLMs) is an important area of research. In this study, we introduce a dynamic benchmark, NPHardEval4V, aimed at addressing the existing gaps in evaluating the pure reasoning abilities of MLLMs. Our benchmark aims to provide a venue to disentangle the effect of various factors such as image recognition and instruction fo… ▽ More

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 16 pages, 10 figures, 2 tables

  37. arXiv:2402.14609  [pdf, other

    cs.LG cs.AI cs.CR cs.DB

    FedCQA: Answering Complex Queries on Multi-Source Knowledge Graphs via Federated Learning

    Authors: Qi Hu, Weifeng Jiang, Haoran Li, Zihao Wang, Jiaxin Bai, Qianren Mao, Yangqiu Song, Lixin Fan, Jianxin Li

    Abstract: Complex logical query answering is a challenging task in knowledge graphs (KGs) that has been widely studied. The ability to perform complex logical reasoning is essential and supports various graph reasoning-based downstream tasks, such as search engines. Recent approaches are proposed to represent KG entities and logical queries into embedding vectors and find answers to logical queries from the… ▽ More

    Submitted 25 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  38. arXiv:2402.06289  [pdf, other

    cs.LG cs.CR

    Evaluating Membership Inference Attacks and Defenses in Federated Learning

    Authors: Gongxi Zhu, Donghao Li, Hanlin Gu, Yuxing Han, Yuan Yao, Lixin Fan, Qiang Yang

    Abstract: Membership Inference Attacks (MIAs) pose a growing threat to privacy preservation in federated learning. The semi-honest attacker, e.g., the server, may determine whether a particular sample belongs to a target client according to the observed model information. This paper conducts an evaluation of existing MIAs and corresponding defense strategies. Our evaluation on MIAs reveals two important fin… ▽ More

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 11 pages, 4 figures

  39. arXiv:2402.00746  [pdf, other

    cs.CL

    Health-LLM: Personalized Retrieval-Augmented Disease Prediction System

    Authors: Mingyu **, Qinkai Yu, Dong Shu, Chong Zhang, Lizhou Fan, Wenyue Hua, Suiyuan Zhu, Yanda Meng, Zhenting Wang, Mengnan Du, Yongfeng Zhang

    Abstract: Recent advancements in artificial intelligence (AI), especially large language models (LLMs), have significantly advanced healthcare applications and demonstrated potentials in intelligent medical treatment. However, there are conspicuous challenges such as vast data volumes and inconsistent symptom characterization standards, preventing full integration of healthcare AI systems with individual pa… ▽ More

    Submitted 19 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  40. arXiv:2401.17857  [pdf, other

    cs.CV

    SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition

    Authors: Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang

    Abstract: 3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis, benefiting from its high-quality rendering results and real-time rendering speed. However, the 3D Gaussians learned by 3D-GS have ambiguous structures without any geometry constraints. This inherent issue in 3D-GS leads to a rough boundary when segmenting individual objects. To remedy these problems, we… ▽ More

    Submitted 17 May, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  41. arXiv:2401.16305  [pdf, other

    cs.CV cs.RO

    MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection

    Authors: Yuxue Yang, Lue Fan, Zhaoxiang Zhang

    Abstract: Label-efficient LiDAR-based 3D object detection is currently dominated by weakly/semi-supervised methods. Instead of exclusively following one of them, we propose MixSup, a more practical paradigm simultaneously utilizing massive cheap coarse labels and a limited number of accurate labels for Mixed-grained Supervision. We start by observing that point clouds are usually textureless, making it hard… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: ICLR 2024, code is available at https://github.com/BraveGroup/PointSAM-for-MixSup

  42. Reconstructing Close Human Interactions from Multiple Views

    Authors: Qing Shuai, Zhiyuan Yu, Zhize Zhou, Lixin Fan, Haijun Yang, Can Yang, Xiaowei Zhou

    Abstract: This paper addresses the challenging task of reconstructing the poses of multiple individuals engaged in close interactions, captured by multiple calibrated cameras. The difficulty arises from the noisy or false 2D keypoint detections due to inter-person occlusion, the heavy ambiguity in associating keypoints to individuals due to the close interactions, and the scarcity of training data as collec… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: SIGGRAPH Asia 2023

    Journal ref: ACM Transactions on Graphics 2023

  43. An annotated grain kernel image database for visual quality inspection

    Authors: Lei Fan, Yiwen Ding, Dongdong Fan, Yong Wu, Hongxia Chu, Maurice Pagnucco, Yang Song

    Abstract: We present a machine vision-based database named GrainSet for the purpose of visual quality inspection of grain kernels. The database contains more than 350K single-kernel images with experts' annotations. The grain kernels used in the study consist of four types of cereal grains including wheat, maize, sorghum and rice, and were collected from over 20 regions in 5 countries. The surface informati… ▽ More

    Submitted 20 November, 2023; originally announced January 2024.

    Comments: Accepted by Nature Scientific Data (2023), https://github.com/hellodfan/GrainSet

  44. arXiv:2401.08552  [pdf, other

    cs.LG cs.AI

    Explaining Time Series via Contrastive and Locally Sparse Perturbations

    Authors: Zichuan Liu, Yingying Zhang, Tianchun Wang, Zefan Wang, Dongsheng Luo, Mengnan Du, Min Wu, Yi Wang, Chunlin Chen, Lunting Fan, Qingsong Wen

    Abstract: Explaining multivariate time series is a compound challenge, as it requires identifying important locations in the time series and matching complex temporal patterns. Although previous saliency-based methods addressed the challenges, their perturbation may not alleviate the distribution shift issue, which is inevitable especially in heterogeneous samples. We present ContraLSP, a locally sparse mod… ▽ More

    Submitted 28 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted by International Conference on Learning Representations (ICLR 2024)

  45. arXiv:2401.07331  [pdf, other

    cs.CE

    Rapid Estimation of Left Ventricular Contractility with a Physics-Informed Neural Network Inverse Modeling Approach

    Authors: Ehsan Naghavi, Haifeng Wang, Lei Fan, Jenny S. Choy, Ghassan Kassab, Seungik Baek, Lik-Chuan Lee

    Abstract: Physics-based computer models based on numerical solution of the governing equations generally cannot make rapid predictions, which in turn, limits their applications in the clinic. To address this issue, we developed a physics-informed neural network (PINN) model that encodes the physics of a closed-loop blood circulation system embedding a left ventricle (LV). The PINN model is trained to satisf… ▽ More

    Submitted 14 January, 2024; originally announced January 2024.

  46. arXiv:2312.17742  [pdf, other

    cs.CV

    Learning Vision from Models Rivals Learning Vision from Data

    Authors: Yonglong Tian, Lijie Fan, Kaifeng Chen, Dina Katabi, Dilip Krishnan, Phillip Isola

    Abstract: We introduce SynCLR, a novel approach for learning visual representations exclusively from synthetic images and synthetic captions, without any real data. We synthesize a large dataset of image captions using LLMs, then use an off-the-shelf text-to-image model to generate multiple images corresponding to each synthetic caption. We perform visual representation learning on these synthetic images vi… ▽ More

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Code is available at https://github.com/google-research/syn-rep-learn

  47. arXiv:2312.17407  [pdf

    cs.CV eess.IV

    Comparing roughness maps generated by five roughness descriptors for LiDAR-derived digital elevation models

    Authors: Lei Fan, Yang Zhao

    Abstract: Terrain surface roughness, often described abstractly, poses challenges in quantitative characterisation with various descriptors found in the literature. This study compares five commonly used roughness descriptors, exploring correlations among their quantified terrain surface roughness maps across three terrains with distinct spatial variations. Additionally, the study investigates the impacts o… ▽ More

    Submitted 13 March, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: 14 pages, 6 figures

  48. arXiv:2312.16554  [pdf, other

    cs.LG cs.AI

    A Theoretical Analysis of Efficiency Constrained Utility-Privacy Bi-Objective Optimization in Federated Learning

    Authors: Hanlin Gu, Xinyuan Zhao, Gongxi Zhu, Yuxing Han, Yan Kang, Lixin Fan, Qiang Yang

    Abstract: Federated learning (FL) enables multiple clients to collaboratively learn a shared model without sharing their individual data. Concerns about utility, privacy, and training efficiency in FL have garnered significant research attention. Differential privacy has emerged as a prevalent technique in FL, safeguarding the privacy of individual user data while impacting utility and training efficiency.… ▽ More

    Submitted 29 January, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

  49. arXiv:2312.15808  [pdf, other

    cs.NI

    Quantum-Assisted Online Task Offloading and Resource Allocation in MEC-Enabled Satellite-Aerial-Terrestrial Integrated Networks

    Authors: Yu Zhang, Yanmin Gong, Lei Fan, Yu Wang, Zhu Han, Yuanxiong Guo

    Abstract: In the era of Internet of Things (IoT), multi-access edge computing (MEC)-enabled satellite-aerial-terrestrial integrated network (SATIN) has emerged as a promising technology to provide massive IoT devices with seamless and reliable communication and computation services. This paper investigates the cooperation of low Earth orbit (LEO) satellites, high altitude platforms (HAPs), and terrestrial b… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

  50. arXiv:2312.14890  [pdf, other

    cs.AI cs.CC cs.CL cs.LG

    NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes

    Authors: Lizhou Fan, Wenyue Hua, Lingyao Li, Haoyang Ling, Yongfeng Zhang

    Abstract: Complex reasoning ability is one of the most important features of current LLMs, which has also been leveraged to play an integral role in complex decision-making tasks. Therefore, the investigation into the reasoning capabilities of Large Language Models (LLMs) is critical: numerous benchmarks have been established to assess the reasoning abilities of LLMs. However, current benchmarks are inadequ… ▽ More

    Submitted 12 February, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: 23 pages, 7 figures, 2 tables