Showing 1–50 of 269 results for author: Hu, M

  1. arXiv:2406.18045 [pdf, other]

    cs.CL cs.AI

    PharmGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, ** Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (9 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas where general pu…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.17841 [pdf, other]

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang, et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 6 figures + 14 pages, 6 figures

  3. arXiv:2406.14482 [pdf, other]

    cs.CV

    Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

    Authors: Xinyi Ying, Chao Xiao, Ruojing Li, Xu He, Boyang Li, Zhaoxu Li, Yingqian Wang, Mingyuan Hu, Qingyu Xu, Zaiping Lin, Miao Li, Shilin Zhou, Wei An, Weidong Sheng, Li Liu

    Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, the insufficient quantity, limited category, misaligned images and large t…

    Submitted 20 June, 2024; originally announced June 2024.

  4. Causal Inference with Latent Variables: Recent Advances and Future Prospectives

    Authors: Yaochen Zhu, Yinhan He, Jing Ma, Mengxuan Hu, Sheng Li, Jundong Li

    Abstract: Causality lays the foundation for the trajectory of our world. Causal inference (CI), which aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial research topic. Nevertheless, the lack of observation of important variables (e.g., confounders, mediators, exogenous variables, etc.) severely compromises the reliability of CI methods. The issue may arise from t…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD'24 Survey Track

  5. arXiv:2406.12784 [pdf, other]

    cs.CL

    UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions

    Authors: Xunzhi Wang, Zhuowei Zhang, Qiongyu Li, Gaonan Chen, Mengting Hu, Zhiyu Li, Bitong Luo, Hang Gao, Zhixin Han, Haotian Wang

    Abstract: The rapid development of large language models (LLMs) has shown promising practical results. However, their low interpretability often leads to errors in unforeseen circumstances, limiting their utility. Many works have focused on creating comprehensive evaluation systems, but previous benchmarks have primarily assessed problem-solving abilities while neglecting the response's uncertainty, which m…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Under review

  6. arXiv:2406.11519 [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  7. arXiv:2406.11267 [pdf, other]

    cs.CL

    Mitigating Large Language Model Hallucination with Faithful Finetuning

    Authors: Minda Hu, Bowei He, Yufei Wang, Liangyou Li, Chen Ma, Irwin King

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on various natural language processing tasks. However, they are prone to generating fluent yet untruthful responses, known as "hallucinations". Hallucinations can lead to the spread of misinformation and cause harm in critical applications. Mitigating hallucinations is challenging as they arise from factors such as noisy data, m…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.11258 [pdf, other]

    cs.CL

    Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization

    Authors: Minda Hu, Licheng Zong, Hongru Wang, Jingyan Zhou, Jingjing Li, Yichen Gao, Kam-Fai Wong, Yu Li, Irwin King

    Abstract: Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG). However, existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries, resulting in sub-optimal performance. To address these limitations, we propose a novel plug-and-play LL…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.10461 [pdf, ps, other]

    cs.HC

    Exploring Parent-Child Perceptions on Safety in Generative AI: Concerns, Mitigation Strategies, and Design Implications

    Authors: Yaman Yu, Tanusree Sharma, Melinda Hu, Justin Wang, Yang Wang

    Abstract: The widespread use of Generative Artificial Intelligence (GAI) among teenagers has led to significant misuse and safety concerns. To identify risks and understand parental controls challenges, we conducted a content analysis on Reddit and interviewed 20 participants (seven teenagers and 13 parents). Our study reveals a significant gap in parental awareness of the extensive ways children use GAI, s…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages

  10. arXiv:2406.09953 [pdf, other]

    cs.RO cs.AI

    DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

    Authors: Zeyu Gao, Yao Mu, Jinye Qu, Mengkang Hu, Lingyue Guo, Ping Luo, Yanfeng Lu

    Abstract: Dual-arm robots offer enhanced versatility and efficiency over single-arm counterparts by enabling concurrent manipulation of multiple objects or cooperative execution of tasks using both arms. However, effectively coordinating the two arms for complex long-horizon tasks remains a significant challenge. Existing task planning methods predominantly focus on single-arm robots or rely on predefined b…

    Submitted 30 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 46 pages, 13 figures

  11. Optimal Kernel Orchestration for Tensor Programs with Korch

    Authors: Muyan Hu, Ashwin Venkatram, Shreyashri Biswas, Balamurugan Marimuthu, Bohan Hou, Gabriele Oliaro, Haojie Wang, Liyan Zheng, Xupeng Miao, Jidong Zhai

    Abstract: Kernel orchestration is the task of mapping the computation defined in different operators of a deep neural network (DNN) to the execution of GPU kernels on modern hardware platforms. Prior approaches optimize kernel orchestration by greedily applying operator fusion, which fuses the computation of multiple operators into a single kernel, and miss a variety of optimization opportunities in kernel…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Fix some typos in the ASPLOS version

    Journal ref: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 3 (2024) 755-769

  12. arXiv:2406.07471 [pdf, other]

    cs.CV

    OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    Authors: Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, Zongyuan Ge

    Abstract: Surgical scene perception via videos is critical for advancing robotic surgery, telesurgery, and AI-assisted surgery, particularly in ophthalmology. However, the scarcity of diverse and richly annotated video datasets has hindered the development of intelligent systems for surgical workflow analysis. Existing datasets for surgical workflow analysis, which typically face challenges such as small s…

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Version 1

  13. arXiv:2406.07365 [pdf, other]

    cs.CL cs.AI

    BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction

    Authors: Yinhao Bai, Yalan Xie, Xiaoyi Liu, Yuhua Zhao, Zhixin Han, Mengting Hu, Hang Gao, Renhong Cheng

    Abstract: Aspect sentiment quad prediction (ASQP) aims to predict four aspect-based elements, including aspect term, opinion term, aspect category, and sentiment polarity. In practice, unseen aspects, due to distinct data distribution, impose many challenges for a trained neural model. Motivated by this, this work formulates ASQP into the few-shot scenario, which aims for fast adaptation in real application…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Main Conference

  14. arXiv:2406.07230 [pdf, other]

    cs.CV cs.AI

    Needle In A Multimodal Haystack

    Authors: Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang

    Abstract: With the rapid advancement of multimodal large language models (MLLMs), their evaluation has become increasingly comprehensive. However, understanding long multimodal content, as a foundational ability for real-world applications, remains underexplored. In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capab…

    Submitted 11 June, 2024; originally announced June 2024.

  15. arXiv:2406.06384 [pdf, other]

    cs.CV

    Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations

    Authors: Peng Xia, Ming Hu, Feilong Tang, Wenxue Li, Wenhao Zheng, Lie Ju, Peibo Duan, Huaxiu Yao, Zongyuan Ge

    Abstract: Diabetic Retinopathy (DR), induced by diabetes, poses a significant risk of visual impairment. Accurate and effective grading of DR aids in the treatment of this condition. Yet existing models experience notable performance degradation on unseen domains due to domain shifts. Previous methods address this issue by simulating domain style through simple visual transformation and mitigating domain no…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Early Accepted by MICCAI 2024

  16. arXiv:2406.06089 [pdf, other]

    cs.CV

    Texture Re-scalable Universal Adversarial Perturbation

    Authors: Yihao Huang, Qing Guo, Felix Juefei-Xu, Ming Hu, Xiaojun Jia, Xiaochun Cao, Geguang Pu, Yang Liu

    Abstract: Universal adversarial perturbation (UAP), also known as image-agnostic perturbation, is a fixed perturbation map that can fool the classifier with high probabilities on arbitrary images, making it more practical for attacking deep models in the real world. Previous UAP methods generate a scale-fixed and texture-fixed perturbation map for all images, which ignores the multi-scale objects in images…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 14 pages (accepted by TIFS2024)

  17. arXiv:2405.14918 [pdf, other]

    cs.LG cs.ET

    AnalogCoder: Analog Circuit Design via Training-Free Code Generation

    Authors: Yao Lai, Sungyoung Lee, Guojin Chen, Souradip Poddar, Mengkang Hu, David Z. Pan, Ping Luo

    Abstract: Analog circuit design is a significant task in modern chip technology, focusing on the selection of component types, connectivity, and parameters to ensure proper circuit functionality. Despite advances made by Large Language Models (LLMs) in digital circuit design, the complexity and scarcity of data in analog circuitry pose significant challenges. To mitigate these issues, we introduce AnalogCod…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  18. arXiv:2405.13349 [pdf, other]

    cs.DC

    Building a Verifiable Logical Clock for P2P Networks

    Authors: Guangda Sun, Tianyang Tao, Yanpei Guo, Michael Yiqing Hu, Jialin Li

    Abstract: Logical clocks are a fundamental tool to establish causal ordering of events in a distributed system. They have been applied in weakly consistent storage systems, causally ordered broadcast, distributed snapshots, deadlock detection, and distributed system debugging. However, prior logical clock constructs fail to work in an open network with Byzantine participants. In this work, we present Chrono…

    Submitted 22 May, 2024; originally announced May 2024.

  19. arXiv:2405.11289 [pdf, other]

    eess.IV cs.CV

    Diffusion Model Driven Test-Time Image Adaptation for Robust Skin Lesion Classification

    Authors: Ming Hu, Siyuan Yan, Peng Xia, Feilong Tang, Wenxue Li, Peibo Duan, Lin Zhang, Zongyuan Ge

    Abstract: Deep learning-based diagnostic systems have demonstrated potential in skin disease diagnosis. However, their performance can easily degrade on test domains due to distribution shifts caused by input-level corruptions, such as imaging equipment variability, brightness changes, and image blur. This will reduce the reliability of model deployment in real-world scenarios. Most existing solutions focus…

    Submitted 18 May, 2024; originally announced May 2024.

  20. arXiv:2405.08573 [pdf, other]

    cs.HC

    ViSTooth: A Visualization Framework for Tooth Segmentation on Panoramic Radiograph

    Authors: Shenji Zhu, Miaoxin Hu, Tianya Pan, Yue Hong, Bin Li, Zhiguang Zhou, Ting Xu

    Abstract: Tooth segmentation is a key step for computer aided diagnosis of dental diseases. Numerous machine learning models have been employed for tooth segmentation on dental panoramic radiograph. However, it is a difficult task to achieve accurate tooth segmentation due to complex tooth shapes, diverse tooth categories and incomplete sample set for machine learning. In this paper, we propose ViSTooth, a…

    Submitted 14 May, 2024; originally announced May 2024.

  21. arXiv:2405.08099 [pdf, other]

    cs.CL

    KET-QA: A Dataset for Knowledge Enhanced Table Question Answering

    Authors: Mengkang Hu, Haoyu Dong, Ping Luo, Shi Han, Dongmei Zhang

    Abstract: Due to the concise and structured nature of tables, the knowledge contained therein may be incomplete or missing, posing a significant challenge for table question answering (TableQA) and data analysis systems. Most existing datasets either fail to address the issue of external knowledge in TableQA or only utilize unstructured text as supplementary information for tables. In this paper, we propose…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: LREC-Coling 2024

  22. arXiv:2405.02791 [pdf, other]

    cs.CV cs.AI

    Efficient Text-driven Motion Generation via Latent Consistency Training

    Authors: Mengxian Hu, Minghao Zhu, Xun Zhou, Qingqing Yan, Shu Li, Chengju Liu, Qijun Chen

    Abstract: Motion diffusion models excel at text-driven motion generation but struggle with real-time inference since motion sequences are time-axis redundant and solving reverse diffusion trajectory involves tens or hundreds of sequential iterations. In this paper, we propose a Motion Latent Consistency Training (MLCT) framework, which allows for large-scale skip sampling of compact motion latent representa…

    Submitted 25 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

  23. arXiv:2405.01844 [pdf, other]

    cs.NI cs.CR cs.DC

    A Survey on Privacy-Preserving Caching at Network Edge: Classification, Solutions, and Challenges

    Authors: Xianzhi Zhang, Yipeng Zhou, Di Wu, Shazia Riaz, Quan Z. Sheng, Miao Hu, Linchang Xiao

    Abstract: Caching content at the network edge is a popular and effective technique widely deployed to alleviate the burden of network backhaul, shorten service delay and improve service quality. However, there has been some controversy over privacy violations in caching content at the network edge. On the one hand, the multi-access open edge network provides an ideal surface for external attackers to obtain…

    Submitted 3 May, 2024; originally announced May 2024.

  24. arXiv:2404.18255 [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jian** Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 4 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  25. Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We?

    Authors: Kaixuan Li, Yue Xue, Sen Chen, Han Liu, Kairan Sun, Ming Hu, Haijun Wang, Yang Liu, Yixiang Chen

    Abstract: In recent years, the importance of smart contract security has been heightened by the increasing number of attacks against them. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. However, objectively comparing these tools to determine their effectiveness remains challenging. Existing studies o…

    Submitted 29 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: to appear at FSE 2024

  26. arXiv:2404.15946 [pdf]

    cs.CV cs.AI eess.IV

    Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography

    Authors: Xuxin Chen, Yuheng Li, Mingzhe Hu, Ella Salari, Xiaoqian Chen, Richard L. J. Qiu, Bin Zheng, Xiaofeng Yang

    Abstract: Although fusion of information from multiple views of mammograms plays an important role to increase accuracy of breast cancer detection, developing multi-view mammograms-based computer-aided diagnosis (CAD) schemes still faces challenges and no such CAD schemes have been used in clinical practice. To overcome the challenges, we investigate a new approach based on Contrastive Language-Image Pre-tr…

    Submitted 24 April, 2024; originally announced April 2024.

  27. arXiv:2404.15506 [pdf, other]

    cs.CV

    Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

    Authors: Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

    Abstract: We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recov…

    Submitted 21 March, 2024; originally announced April 2024.

    Comments: Our project page is at https://JUGGHM.github.io/Metric3Dv2. arXiv admin note: substantial text overlap with arXiv:2307.10984

  28. arXiv:2404.14061 [pdf, other]

    cs.LG cs.AI cs.DB cs.SI

    FedTAD: Topology-aware Data-free Knowledge Distillation for Subgraph Federated Learning

    Authors: Yinlin Zhu, Xunkai Li, Zhengyu Wu, Di Wu, Miao Hu, Rong-Hua Li

    Abstract: Subgraph federated learning (subgraph-FL) is a new distributed paradigm that facilitates the collaborative training of graph neural networks (GNNs) by multi-client subgraphs. Unfortunately, a significant challenge of subgraph-FL arises from subgraph heterogeneity, which stems from node and topology variation, causing the impaired performance of the global GNN. Despite various studies, they have no…

    Submitted 25 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  29. C2F-SemiCD: A Coarse-to-Fine Semi-Supervised Change Detection Method Based on Consistency Regularization in High-Resolution Remote Sensing Images

    Authors: Chengxi Han, Chen Wu, Meiqi Hu, Jiepan Li, Hongruixuan Chen

    Abstract: A high-precision feature extraction model is crucial for change detection (CD). In the past, many deep learning-based supervised CD methods learned to recognize change feature patterns from a large number of labelled bi-temporal images, whereas labelling bi-temporal remote sensing images is very expensive and often time-consuming; therefore, we propose a coarse-to-fine semi-supervised CD method ba…

    Submitted 21 April, 2024; originally announced April 2024.

  30. arXiv:2404.12850 [pdf, other]

    cs.LG cs.DC

    CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance

    Authors: Zeke Xia, Ming Hu, Dengke Yan, Xiaofei Xie, Tianlin Li, Anran Li, Junlong Zhou, Mingsong Chen

    Abstract: Federated Learning (FL) as a promising distributed machine learning paradigm has been widely adopted in Artificial Intelligence of Things (AIoT) applications. However, the efficiency and inference capability of FL is seriously limited due to the presence of stragglers and data imbalance across massive AIoT devices, respectively. To address the above challenges, we present a novel asynchronous FL a…

    Submitted 19 April, 2024; originally announced April 2024.

  31. arXiv:2404.12846 [pdf, other]

    cs.LG

    KoReA-SFL: Knowledge Replay-based Split Federated Learning Against Catastrophic Forgetting

    Authors: Zeke Xia, Ming Hu, Dengke Yan, Ruixuan Liu, Anran Li, Xiaofei Xie, Mingsong Chen

    Abstract: Although Split Federated Learning (SFL) is good at enabling knowledge sharing among resource-constrained clients, it suffers from the problem of low training accuracy due to the neglect of data heterogeneity and catastrophic forgetting. To address this issue, we propose a novel SFL approach named KoReA-SFL, which adopts a multi-model aggregation mechanism to alleviate gradient divergence caused by…

    Submitted 19 April, 2024; originally announced April 2024.

  32. arXiv:2404.12020 [pdf, other]

    cs.CV

    Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

    Authors: Jie Ma, Min Hu, Jinhui Wang, Wangchun Sun, Lingyun Song, Hongbin Pei, Jun Liu, Youtian Du

    Abstract: Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task, demanding intelligent systems to accurately respond to natural language queries based on audio-video input pairs. Nevertheless, prevalent AVQA approaches are prone to overlearning dataset biases, resulting in poor robustness. Furthermore, current datasets may not provide a precise diagnostic for these methods. To tackl…

    Submitted 19 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Under Review

    ACM Class: I.2.10

  33. Change Guiding Network: Incorporating Change Prior to Guide Change Detection in Remote Sensing Imagery

    Authors: Chengxi Han, Chen Wu, Haonan Guo, Meiqi Hu, Jiepan Li, Hongruixuan Chen

    Abstract: The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there is still a lot of space to study for precise detection, especially the edge integrity and internal holes phenomenon of change features. In order to solve these problems, we design the Change Guiding Network (CGNet), to tackle the insufficien…

    Submitted 14 April, 2024; originally announced April 2024.

  34. HANet: A Hierarchical Attention Network for Change Detection With Bitemporal Very-High-Resolution Remote Sensing Images

    Authors: Chengxi Han, Chen Wu, Haonan Guo, Meiqi Hu, Hongruixuan Chen

    Abstract: Benefiting from the developments in deep learning technology, deep-learning-based algorithms employing automatic feature extraction have achieved remarkable performance on the change detection (CD) task. However, the performance of existing deep-learning-based CD methods is hindered by the imbalance between changed and unchanged pixels. To tackle this problem, a progressive foreground-balanced sam…

    Submitted 14 April, 2024; originally announced April 2024.

  35. arXiv:2404.08313 [pdf, other]

    cs.CL cs.AI

    The Integration of Semantic and Structural Knowledge in Knowledge Graph Entity Typing

    Authors: Muzhi Li, Minda Hu, Irwin King, Ho-fung Leung

    Abstract: The Knowledge Graph Entity Typing (KGET) task aims to predict missing type annotations for entities in knowledge graphs. Recent works only utilize the structural knowledge in the local neighborhood of entities, disregarding semantic knowledge in the textual representations of entities, relations, and types that are also crucial for type inference. Additionally,…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted in NAACL2024 main

  36. arXiv:2404.06214 [pdf, other]

    cs.CL

    [Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

    Authors: Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

    Abstract: After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-…

    Submitted 9 April, 2024; originally announced April 2024.

  37. arXiv:2404.01892 [pdf, other]

    cs.CV

    Minimize Quantization Output Error with Bias Compensation

    Authors: Cheng Gong, Haoshuai Zheng, Mengting Hu, Zheng Lin, Deng-Ping Fan, Yuzhi Zhang, Tao Li

    Abstract: Quantization is a promising method that reduces memory usage and computational intensity of Deep Neural Networks (DNNs), but it often leads to significant output errors that hinder model deployment. In this paper, we propose Bias Compensation (BC) to minimize the output error, thus realizing ultra-low-precision quantization without model fine-tuning. Instead of optimizing the non-convex quantizatio…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures

    Journal ref: CAAI Artificial Intelligence Research, 2024

  38. Versatile LiDAR-Inertial Odometry With SE (2) Constraints for Ground Vehicles

    Authors: Jiaying Chen, Han Wang, Minghui Hu, Ponnuthurai Nagaratnam Suganthan

    Abstract: LiDAR SLAM has become one of the major localization systems for ground vehicles since LiDAR Odometry And Mapping (LOAM). Many extension works on LOAM mainly leverage one specific constraint to improve the performance, e.g., information from on-board sensors such as loop closure and inertial state; prior conditions such as ground level and motion dynamics. In many robotic applications, these condit…

    Submitted 23 December, 2023; originally announced April 2024.

    Journal ref: IEEE Robotics and Automation Letters 2023

  39. arXiv:2404.01101 [pdf, other]

    cs.CR cs.CV cs.LG

    UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models

    Authors: Zihan Guan, Mengxuan Hu, Sheng Li, Anil Vullikanti

    Abstract: Diffusion Models are vulnerable to backdoor attacks, where malicious attackers inject backdoors by poisoning some parts of the training samples during the training stage. This poses a serious threat to the downstream users, who query the diffusion models through the API or directly download them from the internet. To mitigate the threat of backdoor attacks, there have been a plethora of investigat…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 20 pages, 18 figures

  40. arXiv:2404.01024 [pdf, other]

    cs.CV eess.IV

    AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images

    Authors: Liu Yang, Huiyu Duan, Long Teng, Yucheng Zhu, Xiaohong Liu, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet

    Abstract: In recent years, the rapid advancement of Artificial Intelligence Generated Content (AIGC) has attracted widespread attention. Among the AIGC, AI generated omnidirectional images hold significant potential for Virtual Reality (VR) and Augmented Reality (AR) applications, hence omnidirectional AIGC techniques have also been widely studied. AI-generated omnidirectional images exhibit unique distorti…

    Submitted 1 April, 2024; originally announced April 2024.

  41. arXiv:2403.19584 [pdf, other]

    cs.CV cs.AI

    Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation

    Authors: Zhongliang Zhou, Jielu Zhang, Zihan Guan, Mengxuan Hu, Ni Lao, Lan Mu, Sheng Li, Gengchen Mai

    Abstract: Geolocating precise locations from images presents a challenging problem in computer vision and information retrieval. Traditional methods typically employ either classification, which divides the Earth's surface into grid cells and classifies images accordingly, or retrieval, which identifies locations by matching images with a database of image-location pairs. However, classification-based appro…

    Submitted 28 March, 2024; originally announced March 2024.

  42. arXiv:2403.17881  [pdf, other]

    cs.CV

    Deepfake Generation and Detection: A Benchmark and Survey

    Authors: Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, Dacheng Tao

    Abstract: Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few. With the advancements in deep learning, techniques primarily represented by Variational Autoencoders and Generative Adversarial Networks have achieved…

    Submitted 16 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: We closely follow the latest developments in https://github.com/flyingby/Awesome-Deepfake-Generation-and-Detection

  43. arXiv:2403.17532  [pdf, other]

    cs.AI

    KC-GenRe: A Knowledge-constrained Generative Re-ranking Method Based on Large Language Models for Knowledge Graph Completion

    Authors: Yilin Wang, Minghao Hu, Zhen Huang, Dongsheng Li, Dong Yang, Xicheng Lu

    Abstract: The goal of knowledge graph completion (KGC) is to predict missing facts among entities. Previous methods for KGC re-ranking are mostly built on non-generative language models to obtain the probability of each candidate. Recently, generative large language models (LLMs) have shown outstanding performance on several tasks such as information extraction and dialog systems. Leveraging them for KGC re…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted for publication in the proceedings of LREC-COLING 2024

  44. arXiv:2403.12013  [pdf, other]

    cs.CV

    GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

    Authors: Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long

    Abstract: We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images. While significant research has already been conducted in this area, the progress has been substantially limited by the low diversity and poor quality of publicly available datasets. As a result, the prior works either are constrained to limited scenar…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: https://fuxiao0719.github.io/projects/geowizard/

  45. arXiv:2403.10133  [pdf, other]

    cs.CV

    E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance

    Authors: Tianrui Huang, Pu Cao, Lu Yang, Chun Liu, Mengjie Hu, Zhiwei Liu, Qing Song

    Abstract: Diffusion-based image editing is a composite process of preserving the source image content and generating new content or applying modifications. While current editing approaches have made improvements under text guidance, most of them have only focused on preserving the information of the input image, disregarding the importance of editability and alignment to the target prompt. In this paper, we…

    Submitted 15 March, 2024; originally announced March 2024.

  46. arXiv:2403.07901  [pdf, other]

    cs.CV cs.LG

    MIP: CLIP-based Image Reconstruction from PEFT Gradients

    Authors: Peiheng Zhou, Ming Hu, Xiaofei Xie, Yihao Huang, Kangjie Chen, Mingsong Chen

    Abstract: The Contrastive Language-Image Pre-training (CLIP) model, as an effective pre-trained multimodal neural network, has been widely used in distributed machine learning tasks, especially Federated Learning (FL). Typically, CLIP-based FL adopts Parameter-Efficient Fine-Tuning (PEFT) for model training, which only fine-tunes adapter parameters or soft prompts rather than the full parameters. Although PEFT…

    Submitted 25 February, 2024; originally announced March 2024.

  47. Thought Graph: Generating Thought Process for Biological Reasoning

    Authors: Chi-Yang Hsu, Kyle Cox, Jiawei Xu, Zhen Tan, Tianhua Zhai, Mengzhou Hu, Dexter Pratt, Tianlong Chen, Ziniu Hu, Ying Ding

    Abstract: We present the Thought Graph as a novel framework to support complex reasoning and use gene set analysis as an example to uncover semantic relationships between biological processes. Our framework stands out for its ability to provide a deeper understanding of gene sets, significantly surpassing GSEA by 40.28% and LLM baselines by 5.38% based on cosine similarity to human annotations. Our analysis…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 4 pages. Accepted by Web Conf 2024

  48. arXiv:2403.05478  [pdf, other]

    cs.RO

    HGIC: A Hand Gesture Based Interactive Control System for Efficient and Scalable Multi-UAV Operations

    Authors: Mengsha Hu, Jinzhou Li, Runxiang Jin, Chao Shi, Lei Xu, Rui Liu

    Abstract: As technological advancements continue to expand the capabilities of multi unmanned-aerial-vehicle systems (mUAV), human operators face challenges in scalability and efficiency due to the complex cognitive load and operations associated with motion adjustments and team coordination. Such cognitive demands limit the feasible size of mUAV teams and necessitate extensive operator training, impeding b…

    Submitted 8 March, 2024; originally announced March 2024.

  49. arXiv:2403.05472  [pdf, other]

    cs.RO

    Federated Joint Learning of Robot Networks in Stroke Rehabilitation

    Authors: Xinyu Jiang, Yibei Guo, Mengsha Hu, Ruoming Jin, Hai Phan, Jay Alberts, Rui Liu

    Abstract: Advanced by rich perception and precise execution, robots possess immense potential to provide professional and customized rehabilitation exercises for patients with mobility impairments caused by strokes. Autonomous robotic rehabilitation significantly reduces human workloads in the long and tedious rehabilitation process. However, training a rehabilitation robot is challenging due to the data sc…

    Submitted 8 March, 2024; originally announced March 2024.

  50. arXiv:2403.03535  [pdf, other]

    cs.CV cs.LG

    Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications

    Authors: Minyang Hu, Hong Chang, Zong Guo, Bingpeng Ma, Shiguan Shan, Xilin Chen

    Abstract: Few-shot learning (FSL) aims to learn novel tasks with very few labeled samples by leveraging experience from related training tasks. In this paper, we try to understand FSL by delving into two key questions: (1) How to quantify the relationship between training and novel tasks? (2) How does the relationship affect the adaptation difficulty on novel tasks for different…

    Submitted 6 March, 2024; originally announced March 2024.