Showing 1–50 of 614 results for author: Shen, H

  1. arXiv:2407.00499  [pdf, other]

    cs.CL cs.AI cs.LG

    ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

    Authors: Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

    Abstract: Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures, 6 tables

  2. arXiv:2407.00132  [pdf, other]

    cs.SE cs.AI

    ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

    Authors: Haiyang Shen, Yue Li, Desong Meng, Dongqi Cai, Sheng Qi, Li Zhang, Mengwei Xu, Yun Ma

    Abstract: Recent advancements in integrating large language models (LLMs) with application programming interfaces (APIs) have gained significant interest in both academia and industry. These API-based agents, leveraging the strong autonomy and planning capabilities of LLMs, can efficiently solve problems requiring multi-step actions. However, their ability to handle multi-dimensional difficulty levels, dive…

    Submitted 28 June, 2024; originally announced July 2024.

  3. arXiv:2406.16150  [pdf, other]

    eess.IV cs.CV

    Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

    Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

    Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paid less attention to the issue we term as Intensity Confusion, wherein the intensity values of certain background voxels approach those of the foreground voxels within br…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

  4. arXiv:2406.13375  [pdf, other]

    cs.CL

    ALiiCE: Evaluating Positional Fine-grained Citation Generation

    Authors: Yilong Xu, Jinhua Gao, Xiaoming Yu, Baolong Bi, Huawei Shen, Xueqi Cheng

    Abstract: Large Language Models (LLMs) can enhance the credibility and verifiability by generating text with citations. However, existing tasks and evaluation methods are predominantly limited to sentence-level statement, neglecting the significance of positional fine-grained citations that can appear anywhere within sentences. To facilitate further exploration of the fine-grained citation generation, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.11263  [pdf, other]

    cs.CL cs.AI

    The Fall of ROME: Understanding the Collapse of LLMs in Model Editing

    Authors: Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen

    Abstract: Despite significant progress in model editing methods, their application in real-world scenarios remains challenging as they often cause large language models (LLMs) to collapse. Among them, ROME is particularly concerning, as it could disrupt LLMs with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that con…

    Submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 56 pages

  7. arXiv:2406.07146  [pdf, other]

    cs.CV cs.AI

    Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

    Authors: Che Liu, Zhongwei Wan, Yuqi Wang, Hui Shen, Haozhe Wang, Kangyu Zheng, Mi Zhang, Rossella Arcucci

    Abstract: Automatic radiology report generation can significantly benefit the labor-intensive process of report writing by radiologists, especially for 3D radiographs like CT scans, which are crucial for broad clinical diagnostics yet underexplored compared to 2D radiographs. Existing methods often handle 3D volumes either slice-wise or with aggressive downsampling due to current GPU memory limitations, whi…

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  8. arXiv:2406.06305  [pdf, other]

    cs.CV cs.AI

    NeuroMoCo: A Neuromorphic Momentum Contrast Learning Method for Spiking Neural Networks

    Authors: Yuqi Ma, Huamin Wang, Hangchi Shen, Xuemei Chen, Shukai Duan, Shiping Wen

    Abstract: Recently, brain-inspired spiking neural networks (SNNs) have attracted great research attention owing to their inherent bio-interpretability, event-triggered properties and powerful perception of spatiotemporal information, which is beneficial to handling event-based neuromorphic datasets. In contrast to conventional static image datasets, event-based neuromorphic datasets present heightened compl…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 32 pages, 4 figures, 4 tables

  9. arXiv:2406.05271  [pdf, other]

    cs.CV

    USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

    Authors: Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren

    Abstract: The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in…

    Submitted 7 June, 2024; originally announced June 2024.

  10. arXiv:2406.03888  [pdf, ps, other]

    cs.IT eess.SP

    MSE-Based Training and Transmission Optimization for MIMO ISAC Systems

    Authors: Zhenyao He, Wei Xu, Hong Shen, Yonina C. Eldar, Xiaohu You

    Abstract: In this paper, we investigate a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system under typical block-fading channels. As a non-trivial extension to most existing works on ISAC, both the training and transmission signals sent by the ISAC transmitter are exploited for sensing. Specifically, we develop two training and transmission design schemes to minimize a…

    Submitted 6 June, 2024; originally announced June 2024.

  11. arXiv:2406.00944  [pdf, other]

    cs.CL cs.AI cs.IR

    Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution

    Authors: Shicheng Xu, Liang Pang, Huawei Shen, Xueqi Cheng

    Abstract: Retrieval-augmented generation (RAG) utilizes retrieved texts to enhance large language models (LLMs). However, studies show that RAG is not consistently effective and can even mislead LLMs due to noisy or incorrect retrieved texts. This suggests that RAG possesses a duality including both benefit and detriment. Although many existing methods attempt to address this issue, they lack a theoretical…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 23 pages

  12. arXiv:2405.20071  [pdf]

    physics.med-ph cs.LG

    A Staged Approach using Machine Learning and Uncertainty Quantification to Predict the Risk of Hip Fracture

    Authors: Anjum Shaik, Kristoffer Larsen, Nancy E. Lane, Chen Zhao, Kuan-Jui Su, Joyce H. Keyak, Qing Tian, Qiuying Sha, Hui Shen, Hong-Wen Deng, Weihua Zhou

    Abstract: Despite advancements in medical care, hip fractures impose a significant burden on individuals and healthcare systems. This paper focuses on the prediction of hip fracture risk in older and middle-aged adults, where falls and compromised bone quality are predominant factors. We propose a novel staged model that combines advanced imaging and clinical data to improve predictive performance. By using…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 29 pages, 5 figures, 6 tables

  13. arXiv:2405.19660  [pdf, other]

    cs.CL

    PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals

    Authors: Ruiyi Wang, Stephanie Milani, Jamie C. Chiu, Jiayin Zhi, Shaun M. Eack, Travis Labrum, Samuel M. Murphy, Nev Jones, Kate Hardy, Hong Shen, Fei Fang, Zhiyu Zoey Chen

    Abstract: Mental illness remains one of the most critical public health issues. Despite its importance, many mental health professionals highlight a disconnect between their training and actual real-world patient practice. To help bridge this gap, we propose PATIENT-Ψ, a novel patient simulation framework for cognitive behavior therapy (CBT) training. To build PATIENT-Ψ, we construct diverse patient cogniti…

    Submitted 18 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 9 pages, 5 figures

  14. Negative as Positive: Enhancing Out-of-distribution Generalization for Graph Contrastive Learning

    Authors: Zixu Wang, Bingbing Xu, Yige Yuan, Huawei Shen, Xueqi Cheng

    Abstract: Graph contrastive learning (GCL), standing as the dominant paradigm in the realm of graph pre-training, has yielded considerable progress. Nonetheless, its capacity for out-of-distribution (OOD) generalization has been relatively underexplored. In this work, we point out that the traditional optimization of InfoNCE in GCL restricts the cross-domain pairs only to be negative samples, which inevitab…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 5 pages, 5 figures, In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14-18, 2024, Washington, DC, USA

    ACM Class: I.2

  15. arXiv:2405.16038  [pdf, other]

    cs.CV

    Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection

    Authors: Xue Zhang, Si-Yuan Cao, Fang Wang, Runmin Zhang, Zhe Wu, Xiaohan Zhang, Xiaokai Bai, Hui-Liang Shen

    Abstract: Most recent multispectral object detectors employ a two-branch structure to extract features from RGB and thermal images. While the two-branch structure achieves better performance than a single-branch structure, it overlooks inference efficiency. This conflict is increasingly aggressive, as recent works solely pursue higher performance rather than both performance and efficiency. In this paper, w…

    Submitted 24 May, 2024; originally announced May 2024.

  16. arXiv:2405.15356  [pdf, other]

    cs.CV

    Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

    Authors: Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen

    Abstract: Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropri…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages. arXiv admin note: text overlap with arXiv:2311.16922 by other authors

  17. arXiv:2405.15349  [pdf, other]

    cs.CL

    UnKE: Unstructured Knowledge Editing in Large Language Models

    Authors: Jingcheng Deng, Zihao Wei, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng

    Abstract: Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models, heavily relying on the assumption that structured knowledge is stored as key-value pairs locally in MLP layers or specific neurons. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by l…

    Submitted 24 May, 2024; originally announced May 2024.

  18. arXiv:2405.13675  [pdf, other]

    cs.CV

    Context and Geometry Aware Voxel Transformer for Semantic Scene Completion

    Authors: Zhu Yu, Runming Zhang, Jiacheng Ying, Junchen Yu, Xiaohai Hu, Lun Luo, Siyuan Cao, Huiliang Shen

    Abstract: Vision-based Semantic Scene Completion (SSC) has gained much attention due to its widespread applications in various 3D perception tasks. Existing sparse-to-dense approaches typically employ shared context-independent queries across various input images, which fails to capture distinctions among them as the focal regions of different inputs vary and may result in undirected feature aggregation of…

    Submitted 22 May, 2024; originally announced May 2024.

  19. arXiv:2405.12710  [pdf, other]

    cs.CV

    Text-Video Retrieval with Global-Local Semantic Consistent Learning

    Authors: Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Hengtao Shen

    Abstract: Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, l…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 9 pages

  20. arXiv:2405.12669  [pdf, other]

    cs.CL

    A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

    Authors: Huangjun Shen, Liangying Shao, Wenbo Li, Zhibin Lan, Zhanyu Liu, Jinsong Su

    Abstract: In recent years, multi-modal machine translation has attracted significant interest in both academia and industry due to its superior performance. It takes both textual and visual modalities as inputs, leveraging visual context to tackle the ambiguities in source texts. In this paper, we begin by offering an exhaustive overview of 99 prior works, comprehensively summarizing representative studies…

    Submitted 22 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  21. arXiv:2405.12247  [pdf, other]

    cs.CV

    Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation

    Authors: Zejun Gu, Zhong-Qiu Zhao, Hao Shen, Zhao Zhang

    Abstract: In real-world applications of human pose estimation, low-resolution input images are frequently encountered when the performance of the image acquisition equipment is limited or the shooting distance is too far. However, existing state-of-the-art models for human pose estimation perform poorly on low-resolution images. One key reason is the presence of downsampling layers in these models, e.g., st…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures, conference

  22. arXiv:2405.11448  [pdf, other]

    cs.CV

    Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation

    Authors: Zejun Gu, Zhong-Qiu Zhao, Henghui Ding, Hao Shen, Zhao Zhang, De-Shuang Huang

    Abstract: In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images. This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model. However, we face the challenge of feature size mismatch and class number mismatch when applying knowled…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  23. arXiv:2405.01266  [pdf, other]

    cs.RO cs.AI

    MFTraj: Map-Free, Behavior-Driven Trajectory Prediction for Autonomous Driving

    Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Huanming Shen, Bonan Wang, Dongping Liao, Guofa Li, Chengzhong Xu

    Abstract: This paper introduces a trajectory prediction model tailored for autonomous driving, focusing on capturing complex interactions in dynamic traffic scenarios without reliance on high-definition maps. The model, termed MFTraj, harnesses historical trajectory data combined with a novel dynamic geometric graph-based behavior-aware module. At its core, an adaptive structure-aware interactive graph conv…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  24. arXiv:2405.01202  [pdf, other]

    cs.SE cs.CR

    DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection

    Authors: Yanjing Yang, Xin Zhou, Runfeng Mao, Jinwei Xu, Lanxin Yang, Yu Zhangm, Haifeng Shen, He Zhang

    Abstract: Software vulnerability detection is generally supported by automated static analysis tools, which have recently been reinforced by deep learning (DL) models. However, despite the superior performance of DL-based approaches over rule-based ones in research, applying DL approaches to software vulnerability detection in practice remains a challenge due to the complex structure of source code, the bla…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 15 pages, 8 figures

  25. arXiv:2405.00614  [pdf, other]

    cs.LG

    Multigroup Robustness

    Authors: Lunjia Hu, Charlotte Peale, Judy Hanwen Shen

    Abstract: To address the shortcomings of real-world datasets, robust learning algorithms have been designed to overcome arbitrary and indiscriminate data corruption. However, practical processes of gathering data may lead to patterns of data corruption that are localized to specific partitions of the training dataset. Motivated by critical applications where the learned model is deployed to make predictions…

    Submitted 1 May, 2024; originally announced May 2024.

  26. arXiv:2404.18587  [pdf, ps, other]

    cs.IT

    Unlocking Potentials of Near-Field Propagation: ELAA-Empowered Integrated Sensing and Communication

    Authors: Zhenyao He, Wei Xu, Zhaohui Yang, Hong Shen, Ningning Fu, Yongming Huang, Zhaoyang Zhang, Xiaohu You

    Abstract: The exploration of extremely large antenna arrays (ELAAs) using high-frequency spectrum has led to a paradigm shift in electromagnetic radiation field, transitioning from the common use case of far-field propagation to near-field propagation. This shift necessitates the modification of the conventional planar-wavefront approximation to more accurate spherical waves, exerting a profound impact on w…

    Submitted 29 April, 2024; originally announced April 2024.

  27. arXiv:2404.17287  [pdf, other]

    cs.CL

    When to Trust LLMs: Aligning Confidence with Response Quality

    Authors: Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, Bolin Ding

    Abstract: Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability by confidence level, however, their effectiveness is limited by the lack of objective…

    Submitted 9 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by ACL 2024

  28. arXiv:2404.14042  [pdf, other]

    cs.CV

    CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction

    Authors: Wenhao Lan, Yijun Yang, Haihua Shen, Shan Li

    Abstract: The increasing adoption of 3D point cloud data in various applications, such as autonomous vehicles, robotics, and virtual reality, has brought about significant advancements in object recognition and scene understanding. However, this progress is accompanied by new security challenges, particularly in the form of backdoor attacks. These attacks involve inserting malicious information into the tra…

    Submitted 22 April, 2024; originally announced April 2024.

  29. arXiv:2404.11597  [pdf]

    cs.AI cs.LG

    Explainable Artificial Intelligence Techniques for Accurate Fault Detection and Diagnosis: A Review

    Authors: Ahmed Maged, Salah Haridy, Herman Shen

    Abstract: As the manufacturing industry advances with sensor integration and automation, the opaque nature of deep learning models in machine learning poses a significant challenge for fault detection and diagnosis. And despite the related predictive insights Artificial Intelligence (AI) can deliver, advanced machine learning engines often remain a black box. This paper reviews the eXplainable AI (XAI) tool…

    Submitted 10 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  30. arXiv:2404.09043  [pdf, other]

    cs.CL

    Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation

    Authors: Jia Gu, Liang Pang, Huawei Shen, Xueqi Cheng

    Abstract: With the rapid advancement of large language models (LLMs) for handling complex language tasks, an increasing number of studies are employing LLMs as agents to emulate the sequential decision-making processes of humans often represented as Markov decision-making processes (MDPs). The actions in MDPs adhere to specific probability distributions and require iterative sampling. This arouses curiosity…

    Submitted 18 June, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  31. arXiv:2404.07721  [pdf, other]

    eess.SP cs.IT

    Trainable Joint Channel Estimation, Detection and Decoding for MIMO URLLC Systems

    Authors: Yi Sun, Hong Shen, Bingqing Li, Wei Xu, Pengcheng Zhu, Nan Hu, Chunming Zhao

    Abstract: The receiver design for multi-input multi-output (MIMO) ultra-reliable and low-latency communication (URLLC) systems can be a tough task due to the use of short channel codes and few pilot symbols. Consequently, error propagation can occur in traditional turbo receivers, leading to performance degradation. Moreover, the processing delay induced by information exchange between different modules may…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 17 pages, 12 figures, accepted by IEEE Transactions on Wireless Communications

  32. arXiv:2404.04990  [pdf, other]

    cs.CL

    MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models

    Authors: Zihao Wei, Jingcheng Deng, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng

    Abstract: The extensive utilization of large language models (LLMs) underscores the crucial necessity for precise and contemporary knowledge embedded within their intrinsic parameters. Existing research on knowledge editing primarily concentrates on monolingual scenarios, neglecting the complexities presented by multilingual contexts and multi-hop reasoning. To address these challenges, our study introduces…

    Submitted 7 April, 2024; originally announced April 2024.

  33. arXiv:2404.00349  [pdf, other]

    cs.CV

    SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising

    Authors: Runmin Zhang, Zhu Yu, Zehua Sheng, Jiacheng Ying, Si-Yuan Cao, Shu-Jie Chen, Bailin Yang, Junwei Li, Hui-Liang Shen

    Abstract: Cross-spectral image guided denoising has shown its great potential in recovering clean images with rich details, such as using the near-infrared image to guide the denoising process of the visible one. To obtain such image pairs, a feasible and economical way is to employ a stereo system, which is widely used on mobile devices. Current works attempt to generate an aligned guidance image to handle…

    Submitted 30 March, 2024; originally announced April 2024.

  34. arXiv:2403.19996  [pdf, other]

    cs.LG eess.SP

    DeepHeteroIoT: Deep Local and Global Learning over Heterogeneous IoT Sensor Data

    Authors: Muhammad Sakib Khan Inan, Kewen Liao, Haifeng Shen, Prem Prakash Jayaraman, Dimitrios Georgakopoulos, Ming Jian Tang

    Abstract: Internet of Things (IoT) sensor data or readings evince variations in timestamp range, sampling frequency, geographical location, unit of measurement, etc. Such presented sequence data heterogeneity makes it difficult for traditional time series classification algorithms to perform well. Therefore, addressing the heterogeneity challenge demands learning not only the sub-patterns (local features) b…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted for Publication and Presented in EAI MobiQuitous 2023 - 20th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services

  35. arXiv:2403.19955  [pdf, ps, other]

    cs.IT

    Joint Training and Reflection Pattern Optimization for Non-Ideal RIS-Aided Multiuser Systems

    Authors: Zhenyao He, Jindan Xu, Hong Shen, Wei Xu, Chau Yuen, Marco Di Renzo

    Abstract: Reconfigurable intelligent surface (RIS) is a promising technique to improve the performance of future wireless communication systems at low energy consumption. To reap the potential benefits of RIS-aided beamforming, it is vital to enhance the accuracy of channel estimation. In this paper, we consider an RIS-aided multiuser system with non-ideal reflecting elements, each of which has a phase-depe…

    Submitted 28 March, 2024; originally announced March 2024.

  36. arXiv:2403.19876  [pdf, other]

    cs.HC

    "I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices

    Authors: Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, Hong Shen

    Abstract: Large language models are increasingly applied in real-world scenarios, including research and education. These models, however, come with well-known ethical issues, which may manifest in unexpected ways in human-computer interaction research due to the extensive engagement with human subjects. This paper reports on research practices related to LLM use, drawing on 16 semi-structured interviews an…

    Submitted 28 March, 2024; originally announced March 2024.

  37. arXiv:2403.19275  [pdf, other]

    cs.CL cs.AI

    Knowledge Boundary and Persona Dynamic Shape A Better Social Media Agent

    Authors: Junkai Zhou, Liang Pang, Ya Jing, Jia Gu, Huawei Shen, Xueqi Cheng

    Abstract: Constructing personalized and anthropomorphic agents holds significant importance in the simulation of social networks. However, there are still two key problems in existing works: the agent possesses world knowledge that does not belong to its personas, and it cannot eliminate the interference of diverse persona information on current actions, which reduces the personalization and anthropomorphis…

    Submitted 2 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  38. arXiv:2403.19111  [pdf, other]

    cs.CV

    Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection

    Authors: Hao Shen, Lu Shi, Wanru Xu, Yigang Cen, Linna Zhang, Gaoyun An

    Abstract: Video Anomaly Detection (VAD), aiming to identify abnormalities within a specific context and timeframe, is crucial for intelligent Video Surveillance Systems. While recent deep learning-based VAD models have shown promising results by generating high-resolution frames, they often lack competence in preserving detailed spatial and temporal coherence in video frames. To tackle this issue, we propos…

    Submitted 27 March, 2024; originally announced March 2024.

  39. arXiv:2403.18548  [pdf, other]

    cs.CV

    A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

    Authors: Xiaofeng Cong, Jie Gui, Jing Zhang, Junming Hou, Hao Shen

    Abstract: Existing research based on deep learning has extensively explored the problem of daytime image dehazing. However, few studies have considered the characteristics of nighttime hazy scenes. There are two distinctions between nighttime and daytime haze. First, there may be multiple active colored light sources with lower illumination intensity in nighttime scenes, which may cause haze, glow and noise…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: This paper is accepted by CVPR2024

  40. Compressing and Interpreting Word Embeddings with Latent Space Regularization and Interactive Semantics Probing

    Authors: Haoyu Li, Junpeng Wang, Yan Zheng, Liang Wang, Wei Zhang, Han-Wei Shen

    Abstract: Word embedding, a high-dimensional (HD) numerical representation of words generated by machine learning models, has been used for different natural language processing tasks, e.g., translation between two languages. Recently, there has been an increasing trend of transforming the HD embeddings into a latent space (e.g., via autoencoders) for further tasks, exploiting various merits the latent repr…

    Submitted 25 March, 2024; originally announced March 2024.

    Journal ref: Information Visualization (2023), 22(1), 52-68

  41. A Design Space for Intelligent and Interactive Writing Assistants

    Authors: Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L. C. Guo, Md Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Agnia Sergeyuk, Antonette Shibani, Disha Shrivastava, Lila Shroff, Jessi Stark, Sarah Sterman, et al. (11 additional authors not shown)

    Abstract: In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine and explore the multidimensional space of intelligent and interactive writing assistants. Through a large community collaboration, we explore…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at CHI 2024

  42. arXiv:2403.13433  [pdf, other]

    cs.AI cs.CL cs.CY

    AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior

    Authors: Zhouhong Gu, Xiaoxuan Zhu, Haoran Guo, Lin Zhang, Yin Cai, Hao Shen, Jiangjie Chen, Zheyu Ye, Yifei Dai, Yan Gao, Yao Hu, Hongwei Feng, Yanghua Xiao

    Abstract: Language significantly influences the formation and evolution of Human emergent behavior, which is crucial in understanding collective intelligence within human societies. Considering that the study of how language affects human behavior needs to put it into the dynamic scenarios in which it is used, we introduce AgentGroupChat in this paper, a simulation that delves into the complex role of langu…

    Submitted 4 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  43. arXiv:2403.10252

    cs.CV

    Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning

    Authors: Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen

    Abstract: In this study, we address the intricate challenge of multi-task dense prediction, encompassing tasks such as semantic segmentation, depth estimation, and surface normal estimation, particularly when dealing with partially annotated data (MTPSL). The complexity arises from the absence of complete task labels for each training image. Given the inter-related nature of these pixel-wise dense tasks, ou…

    Submitted 15 March, 2024; originally announced March 2024.

  44. arXiv:2403.08350

    cs.CV

    CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model

    Authors: Cheng Chen, Junchen Zhu, Xu Luo, Hengtao Shen, Lianli Gao, Jingkuan Song

    Abstract: Instruction tuning represents a prevalent strategy employed by Multimodal Large Language Models (MLLMs) to align with human instructions and adapt to new tasks. Nevertheless, MLLMs encounter the challenge of adapting to users' evolving knowledge and demands. Therefore, how to retain existing skills while acquiring new knowledge needs to be investigated. In this paper, we present a comprehensive be…

    Submitted 13 March, 2024; originally announced March 2024.

  45. arXiv:2403.07815

    cs.LG cs.AI

    Chronos: Learning the Language of Time Series

    Authors: Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, Yuyang Wang

    Abstract: We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M…

    Submitted 2 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Code and model checkpoints available at https://github.com/amazon-science/chronos-forecasting

  46. arXiv:2403.04945

    cs.CL cs.LG eess.SP

    MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

    Authors: Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Zhenwu Peng, Jie Fu, Rossella Arcucci, Huaxiu Yao, Mi Zhang

    Abstract: Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the…

    Submitted 18 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Under review

  47. arXiv:2402.19401

    cs.CV

    Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance

    Authors: Huakun Shen, Boyue Caroline Hu, Krzysztof Czarnecki, Lina Marsso, Marsha Chechik

    Abstract: While Neural Networks (NNs) have surpassed human accuracy in image classification on ImageNet, they often lack robustness against image corruption, i.e., corruption robustness. Yet such robustness is seemingly effortless for human perception. In this paper, we propose visually-continuous corruption robustness (VCR) -- an extension of corruption robustness to allow assessing it over the wide and co…

    Submitted 29 February, 2024; originally announced February 2024.

  48. arXiv:2402.18150

    cs.CL cs.AI cs.IR

    Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation

    Authors: Shicheng Xu, Liang Pang, Mo Yu, Fandong Meng, Huawei Shen, Xueqi Cheng, Jie Zhou

    Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating additional information from retrieval. However, studies have shown that LLMs still face challenges in effectively using the retrieved information, even ignoring it or being misled by it. The key reason is that the training of LLMs does not clearly make LLMs learn how to utilize input retrieved texts with va…

    Submitted 11 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Main

  49. arXiv:2402.17176

    cs.LG

    DeepDRK: Deep Dependency Regularized Knockoff for Feature Selection

    Authors: Hongyu Shen, Yici Yan, Zhizhen Zhao

    Abstract: Model-X knockoff, among various feature selection methods, received much attention recently due to its guarantee on false discovery rate (FDR) control. Subsequent to its introduction in parametric design, knockoff is advanced to handle arbitrary data distributions using deep learning-based generative modeling. However, we observed that current implementations of the deep Model-X knockoff framework…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 23 pages, 14 figures, 7 tables

    MSC Class: 68T07; ACM Class: I.5.1

  50. arXiv:2402.15048

    cs.CL cs.AI

    Unlocking the Power of Large Language Models for Entity Alignment

    Authors: Xuhui Jiang, Yinghan Shen, Zhichao Shi, Chengjin Xu, Wei Li, Zixuan Li, Jian Guo, Huawei Shen, Yuanzhuo Wang

    Abstract: Entity Alignment (EA) is vital for integrating diverse knowledge graph (KG) data, playing a crucial role in data-driven AI applications. Traditional EA methods primarily rely on comparing entity embeddings, but their effectiveness is constrained by the limited input KG data and the capabilities of the representation learning techniques. Against this backdrop, we introduce ChatEA, an innovative fra…

    Submitted 22 February, 2024; originally announced February 2024.