Showing 1–50 of 1,701 results for author: Zhang, T

Searching in archive cs.

  1. arXiv:2407.00141  [pdf, other]

    cs.LG cs.AI

    Towards Secure and Efficient Data Scheduling for Vehicular Social Networks

    Authors: Youhua Xia, Tiehua Zhang, Jiong Jin, Ying He, Fei Yu

    Abstract: Efficient data transmission scheduling within vehicular environments poses a significant challenge due to the high mobility of such networks. Contemporary research predominantly centers on crafting cooperative scheduling algorithms tailored for vehicular networks. Notwithstanding, the intricacies of orchestrating scheduling in vehicular social networks both effectively and efficiently remain formi…

    Submitted 28 June, 2024; originally announced July 2024.

  2. arXiv:2406.19976  [pdf, other]

    cs.LG math.OC

    ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting

    Authors: Rui Pan, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Xiaoyu Wang, Tong Zhang

    Abstract: Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up. Only recently, a paradigm of first-order algorithms emerged, capable of effectively addressing bilevel optimization problems. Nevertheless, the practical efficiency of this paradigm remains unverified, particu…

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.19791  [pdf, other]

    cs.RO

    Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding

    Authors: Yifan Tang, Cong Tai, Fangxing Chen, Wanting Zhang, Tao Zhang, Xue** Liu, Yong** Liu, Long Zeng

    Abstract: Most existing robotic datasets capture static scene data and thus are limited in evaluating robots' dynamic performance. To address this, we present a mobile robot oriented large-scale indoor dataset, denoted as THUD (Tsinghua University Dynamic) robotic dataset, for training and evaluating their dynamic scene understanding algorithms. Specifically, the THUD dataset construction is first detailed,…

    Submitted 30 June, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This version has been accepted by ICRA2024 and the dataset has been published, where the link can be found in the paper

    Journal ref: IEEE International Conference on Robotics & Automation,2024

  4. arXiv:2406.19711  [pdf, other]

    cs.LG

    CHASE: A Causal Heterogeneous Graph based Framework for Root Cause Analysis in Multimodal Microservice Systems

    Authors: Ziming Zhao, Tiehua Zhang, Zhishu Shen, Hai Dong, Xingjun Ma, Xianhui Liu, Yun Yang

    Abstract: In recent years, the widespread adoption of distributed microservice architectures within the industry has significantly increased the demand for enhanced system availability and robustness. Due to the complex service invocation paths and dependencies at enterprise-level microservice systems, it is challenging to locate the anomalies promptly during service invocations, thus causing intractable is…

    Submitted 28 June, 2024; originally announced June 2024.

  5. arXiv:2406.19708  [pdf, other]

    cs.NE cs.AI cs.CE q-bio.NC

    A Differentiable Approach to Multi-scale Brain Modeling

    Authors: Chaoming Wang, Muyang Lyu, Tianqiu Zhang, Sichao He, Si Wu

    Abstract: We present a multi-scale differentiable brain modeling workflow utilizing BrainPy, a unique differentiable brain simulator that combines accurate brain simulation with powerful gradient-based optimization. We leverage this capability of BrainPy across different brain scales. At the single-neuron level, we implement differentiable neuron models and employ gradient methods to optimize their fit to e…

    Submitted 1 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: 2nd Differentiable Almost Everything Workshop at ICML 2024

  6. arXiv:2406.19389  [pdf, other]

    cs.CV

    OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

    Authors: Tao Zhang, Xiangtai Li, Hao Fei, Haobo Yuan, Shengqiong Wu, Shunping Ji, Chen Change Loy, Shuicheng Yan

    Abstract: Current universal segmentation methods demonstrate strong capabilities in pixel-level image and video understanding. However, they lack reasoning abilities and cannot be controlled via text instructions. In contrast, large vision-language multimodal models exhibit powerful vision-based conversation and reasoning capabilities but lack pixel-level understanding and have difficulty accepting visual p…

    Submitted 27 June, 2024; originally announced June 2024.

  7. arXiv:2406.19369  [pdf, other]

    cs.CV

    Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

    Authors: Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, Chen Change Loy

    Abstract: Transformer-based segmentation methods face the challenge of efficient inference when dealing with high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted much attention as they can process long sequences efficiently. In this work, we focus on designing an efficient segment-anything model by exploring these different architectures. Specifica…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 16 pages; 8 figures

  8. arXiv:2406.18485  [pdf, other]

    cs.DC

    LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

    Authors: Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, Yingtong Xiong, Guoteng Wang, Qiaoling Chen, Shangchun Zhao, Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu

    Abstract: Efficiently training LLMs with long sequences is important yet challenged by the massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency issues. We propose LoongTrain, a novel system to efficiently train LLMs with long sequences at scale. The core of LoongTrain is the 2D-Attention mecha…

    Submitted 26 June, 2024; originally announced June 2024.

  9. arXiv:2406.17442  [pdf, other]

    cs.CV

    Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model

    Authors: Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang

    Abstract: Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformer makes computation cost high, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies. Drawing inspiration from the great potential of recent state space models (SSM) for long sequence modeling, w…

    Submitted 25 June, 2024; originally announced June 2024.

  10. arXiv:2406.16850  [pdf, other]

    cs.CV cs.RO

    From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking

    Authors: Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang

    Abstract: Embodied agents require robust navigation systems to operate in unstructured environments, making the robustness of Simultaneous Localization and Mapping (SLAM) models critical to embodied agent autonomy. While real-world datasets are invaluable, simulation-based benchmarks offer a scalable approach for robustness evaluations. However, the creation of a challenging and controllable noisy world wit…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 50 pages. arXiv admin note: substantial text overlap with arXiv:2402.08125

  11. arXiv:2406.16500  [pdf, other]

    cs.NE

    A Dual-Channel Particle Swarm Optimization Algorithm Based on Adaptive Balance Search

    Authors: Zhenxing Zhang, Tianxian Zhang, Xiangliang Xu, Lingjiang Kong, Yi Han, Zicheng Wang

    Abstract: The balance between exploration (Er) and exploitation (Ei) determines the generalization performance of the particle swarm optimization (PSO) algorithm on different problems. Although the insufficient balance caused by global best being located near a local minimum has been widely researched, few scholars have systematically paid attention to two behaviors about personal best position (P) and glob…

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  12. arXiv:2406.16422  [pdf, other]

    cs.CV cs.AI

    Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting

    Authors: Tiange Zhang, Qing Cai, Feng Gao, Lin Qi, Junyu Dong

    Abstract: Cross-Domain Few-Shot Learning has witnessed great stride with the development of meta-learning. However, most existing methods pay more attention to learning domain-adaptive inductive bias (meta-knowledge) through feature-wise manipulation or task diversity improvement while neglecting the phenomenon that deep networks tend to rely more on high-frequency cues to make the classification decision,…

    Submitted 24 June, 2024; originally announced June 2024.

  13. arXiv:2406.16374  [pdf, other]

    cs.CL

    KEHRL: Learning Knowledge-Enhanced Language Representations with Hierarchical Reinforcement Learning

    Authors: Dongyang Li, Taolin Zhang, Longtao Huang, Chengyu Wang, Xiaofeng He, Hui Xue

    Abstract: Knowledge-enhanced pre-trained language models (KEPLMs) leverage relation triples from knowledge graphs (KGs) and integrate these external data sources into language models via self-supervised learning. Previous works treat knowledge enhancement as two independent operations, i.e., knowledge injection and knowledge integration. In this paper, we propose to learn Knowledge-Enhanced language represe…

    Submitted 24 June, 2024; originally announced June 2024.

  14. arXiv:2406.16372  [pdf, other]

    cs.CL

    UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding

    Authors: Dongyang Li, Taolin Zhang, Jiali Deng, Longtao Huang, Chengyu Wang, Xiaofeng He, Hui Xue

    Abstract: Cross-lingual representation learning transfers knowledge from resource-rich data to resource-scarce ones to improve the semantic understanding abilities of different languages. However, previous works rely on shallow unsupervised data generated by token surface matching, regardless of the global context-aware semantics of the surrounding text tokens. In this paper, we propose an Unsupervised Pseu…

    Submitted 24 June, 2024; originally announced June 2024.

  15. arXiv:2406.16367  [pdf, other]

    cs.IR

    On the Role of Long-tail Knowledge in Retrieval Augmented Large Language Models

    Authors: Dongyang Li, Junbing Yan, Taolin Zhang, Chengyu Wang, Xiaofeng He, Longtao Huang, Hui Xue, Jun Huang

    Abstract: Retrieval augmented generation (RAG) exhibits outstanding performance in promoting the knowledge capabilities of large language models (LLMs) with retrieved documents related to user queries. However, RAG only focuses on improving the response quality of LLMs via enhancing queries indiscriminately with retrieved information, paying little attention to what type of knowledge LLMs really need to ans…

    Submitted 24 June, 2024; originally announced June 2024.

  16. arXiv:2406.16012  [pdf]

    eess.IV cs.CV

    Wound Tissue Segmentation in Diabetic Foot Ulcer Images Using Deep Learning: A Pilot Study

    Authors: Mrinal Kanti Dhar, Chuanbo Wang, Yash Patel, Taiyu Zhang, Jeffrey Niezgoda, Sandeep Gopalakrishnan, Keke Chen, Zeyun Yu

    Abstract: Identifying individual tissues, so-called tissue segmentation, in diabetic foot ulcer (DFU) images is a challenging task and little work has been published, largely due to the limited availability of a clinical image dataset. To address this gap, we have created a DFUTissue dataset for the research community to evaluate wound tissue segmentation algorithms. The dataset contains 110 images with tis…

    Submitted 23 June, 2024; originally announced June 2024.

  17. arXiv:2406.15707  [pdf, other]

    cs.CV

    psPRF:Pansharpening Planar Neural Radiance Field for Generalized 3D Reconstruction Satellite Imagery

    Authors: Tongtong Zhang, Yuanxiang Li

    Abstract: Most current NeRF variants for satellites are designed for one specific scene and fall short of generalization to new geometry. Additionally, the RGB images require pan-sharpening as an independent preprocessing step. This paper introduces psPRF, a Planar Neural Radiance Field designed for paired low-resolution RGB (LR-RGB) and high-resolution panchromatic (HR-PAN) images from satellite sensors wi…

    Submitted 21 June, 2024; originally announced June 2024.

  18. arXiv:2406.15501  [pdf]

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), deteriorates the performance of time synchronization system seriously. Multiple paths scheme is thought as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still…

    Submitted 19 June, 2024; originally announced June 2024.

  19. arXiv:2406.15244  [pdf, other]

    cs.LG math.OC

    Large Batch Analysis for Adagrad Under Anisotropic Smoothness

    Authors: Yuxing Liu, Rui Pan, Tong Zhang

    Abstract: Adaptive gradient algorithms have been widely adopted in training large-scale deep neural networks, especially large foundation models. Despite their huge success in practice, their theoretical advantages over stochastic gradient descent (SGD) have not been fully understood, especially in the large batch-size setting commonly used in practice. This is because the only theoretical result that can d…

    Submitted 21 June, 2024; originally announced June 2024.

  20. Embracing Federated Learning: Enabling Weak Client Participation via Partial Model Training

    Authors: Sunwoo Lee, Tuo Zhang, Saurav Prakash, Yue Niu, Salman Avestimehr

    Abstract: In Federated Learning (FL), clients may have weak devices that cannot train the full model or even hold it in their memory space. To implement large-scale FL applications, thus, it is crucial to develop a distributed learning method that enables the participation of such weak clients. We propose EmbracingFL, a general FL framework that allows all available clients to join the distributed training…

    Submitted 21 June, 2024; originally announced June 2024.

    Journal ref: IEEE Transactions on Mobile Computing, Early Access, (2024)

  21. arXiv:2406.15045  [pdf, other]

    cs.CL

    Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction

    Authors: Jinge Wu, Zhaolong Wu, Abul Hasan, Yunsoo Kim, Jason P. Y. Cheung, Teng Zhang, Honghan Wu

    Abstract: This study proposes an approach for error correction in clinical radiology reports, leveraging large language models (LLMs) and retrieval-augmented generation (RAG) techniques. The proposed framework employs internal and external retrieval mechanisms to extract relevant medical entities and relations from the report and external knowledge sources. A three-stage inference process is introduced, dec…

    Submitted 21 June, 2024; originally announced June 2024.

  22. arXiv:2406.14901  [pdf, other]

    cs.IR

    IDentity with Locality: An ideal hash for gene sequence search

    Authors: Aditya Desai, Gaurav Gupta, Tianyi Zhang, Anshumali Shrivastava

    Abstract: Gene sequence search is a fundamental operation in computational genomics. Due to the petabyte scale of genome archives, most gene search systems now use hashing-based data structures such as Bloom Filters (BF). The state-of-the-art systems such as Compact bit-slicing signature index (COBS) and Repeated And Merged Bloom filters (RAMBO) use BF with Random Hash (RH) functions for gene representation…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 13 pages

  23. arXiv:2406.14662  [pdf, other]

    cs.LG

    Advantage Alignment Algorithms

    Authors: Juan Agustin Duque, Milad Aghajohari, Tim Cooijmans, Tianyu Zhang, Aaron Courville

    Abstract: The growing presence of artificially intelligent agents in everyday decision-making, from LLM assistants to autonomous vehicles, hints at a future in which conflicts may arise from each agent optimizing individual interests. In general-sum games these conflicts are apparent, where naive Reinforcement Learning agents get stuck in Pareto-suboptimal Nash equilibria. Consequently, opponent shaping has…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 20 Pages, 6 figures

  24. arXiv:2406.14180  [pdf, other]

    cs.NE

    RTFormer: Re-parameter TSBN Spiking Transformer

    Authors: Hongzhi Wang, Xiubo Liang, Mengjian Li, Tao Zhang

    Abstract: The Spiking Neural Networks (SNNs), renowned for their bio-inspired operational mechanism and energy efficiency, mirror the human brain's neural activity. Yet, SNNs face challenges in balancing energy efficiency with the computational demands of advanced tasks. Our research introduces the RTFormer, a novel architecture that embeds Re-parameterized Temporal Sliding Batch Normalization (TSBN) within…

    Submitted 20 June, 2024; originally announced June 2024.

  25. arXiv:2406.13925  [pdf, other]

    cs.CL cs.AI

    GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models

    Authors: Tao Zhang, Ziqian Zeng, Yuxiang Xiao, Huiping Zhuang, Cen Chen, James Foulds, Shimei Pan

    Abstract: Large Language Models (LLMs) are prone to generating content that exhibits gender biases, raising significant ethical concerns. Alignment, the process of fine-tuning LLMs to better align with desired behaviors, is recognized as an effective approach to mitigate gender biases. Although proprietary LLMs have made significant strides in mitigating gender bias, their alignment datasets are not publicl…

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.12845  [pdf, other]

    cs.LG cs.CL

    Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

    Authors: Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. The RLHF process typically starts by training a reward model (RM) using human preference data. Conventional RMs are trained on pairwise responses to the same user request, with relative ratings indicating which response humans prefer. The trained RM…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Technical report v1. Code and model are released at https://github.com/RLHFlow/RLHF-Reward-Modeling/

  27. arXiv:2406.11230  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

    Authors: Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang

    Abstract: Multimodal Large Language Models (MLLMs) have shown significant promise in various applications, leading to broad interest from researchers and practitioners alike. However, a comprehensive evaluation of their long-context capabilities remains underexplored. To address these gaps, we introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, specifically designed to assess the long-contex…

    Submitted 17 June, 2024; originally announced June 2024.

  28. arXiv:2406.10991  [pdf, other]

    cs.CL

    Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers

    Authors: Tianhua Zhang, Kun Li, Hongyin Luo, Xixin Wu, James Glass, Helen Meng

    Abstract: Query rewriting is a crucial technique for passage retrieval in open-domain conversational question answering (CQA). It decontexualizes conversational queries into self-contained questions suitable for off-the-shelf retrievers. Existing methods attempt to incorporate retriever's preference during the training of rewriting models. However, these approaches typically rely on extensive annotations su…

    Submitted 16 June, 2024; originally announced June 2024.

  29. arXiv:2406.10960  [pdf, other]

    cs.CL

    ESCoT: Towards Interpretable Emotional Support Dialogue Systems

    Authors: Tenggan Zhang, Xinjie Zhang, Jinming Zhao, Li Zhou, Qin Jin

    Abstract: Understanding the reason for emotional support response is crucial for establishing connections between users and emotional support dialogue systems. Previous works mostly focus on generating better responses but ignore interpretability, which is extremely important for constructing reliable dialogue systems. To empower the system with better interpretability, we propose an emotional support respo…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (Long Paper)

  30. arXiv:2406.10857  [pdf, other]

    cs.SE

    An LLM-enhanced Multi-objective Evolutionary Search for Autonomous Driving Test Scenario Generation

    Authors: Haoxiang Tian, Xingshuo Han, Guoquan Wu, Yuan Zhou, Shuo Li, Jun Wei, Dan Ye, Wei Wang, Tianwei Zhang

    Abstract: The safety of Autonomous Driving Systems (ADSs) is significantly important for the implementation of autonomous vehicles (AVs). Therefore, ADSs must be evaluated thoroughly before their release and deployment to the public. How to generate diverse safety-critical test scenarios is a key task for ADS testing. This paper proposes LEADE, an LLM-enhanced scenario generation approach for ADS testing, w…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages

  31. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  32. arXiv:2406.10615  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation

    Authors: Tong Zhang, Yingdong Hu, Jiacheng You, Yang Gao

    Abstract: Given the high cost of collecting robotic data in the real world, sample efficiency is a consistently compelling pursuit in robotics. In this paper, we introduce SGRv2, an imitation learning framework that enhances sample efficiency through improved visual and action representations. Central to the design of SGRv2 is the incorporation of a critical inductive bias-action locality, which posits that…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Project website: http://sgrv2-robot.github.io

  33. arXiv:2406.10318  [pdf, other]

    cs.CV cs.AI

    Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding

    Authors: Tuo Zhang, Tiantian Feng, Yibin Ni, Mengqin Cao, Ruying Liu, Katharine Butler, Yanjun Weng, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr

    Abstract: Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains less explored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we offer the Pun Rebus Art Dataset, a multimodal dataset for…

    Submitted 14 June, 2024; originally announced June 2024.

  34. arXiv:2406.10289  [pdf, other]

    cs.CL cs.AI cs.IR

    VeraCT Scan: Retrieval-Augmented Fake News Detection with Justifiable Reasoning

    Authors: Cheng Niu, Yang Guan, Yuanhao Wu, Juno Zhu, Juntong Song, Randy Zhong, Kaihua Zhu, Siliang Xu, Shizhe Diao, Tong Zhang

    Abstract: The proliferation of fake news poses a significant threat not only by disseminating misleading information but also by undermining the very foundations of democracy. The recent advance of generative artificial intelligence has further exacerbated the challenge of distinguishing genuine news from fabricated stories. In response to this challenge, we introduce VeraCT Scan, a novel retrieval-augmente…

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  35. arXiv:2406.10216  [pdf, other]

    cs.CL cs.AI

    Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs

    Authors: Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang

    Abstract: Reward models trained on human preference data have been proven to be effective for aligning Large Language Models (LLMs) with human intent within the reinforcement learning from human feedback (RLHF) framework. However, the generalization capabilities of current reward models to unseen prompts and responses are limited. This limitation can lead to an unexpected phenomenon known as reward over-opt…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 21 pages

  36. arXiv:2406.10212  [pdf, other]

    cs.CV cs.GR

    NeST: Neural Stress Tensor Tomography by leveraging 3D Photoelasticity

    Authors: Akshat Dave, Tianyi Zhang, Aaron Young, Ramesh Raskar, Wolfgang Heidrich, Ashok Veeraraghavan

    Abstract: Photoelasticity enables full-field stress analysis in transparent objects through stress-induced birefringence. Existing techniques are limited to 2D slices and require destructively slicing the object. Recovering the internal 3D stress distribution of the entire object is challenging as it involves solving a tensor tomography problem and handling phase wrapping ambiguities. We introduce NeST, an…

    Submitted 24 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Project webpage: https://akshatdave.github.io/nest

  37. arXiv:2406.09103  [pdf, other]

    cs.CL

    Chain-of-Though (CoT) prompting strategies for medical error detection and correction

    Authors: Zhaolong Wu, Abul Hasan, Jinge Wu, Yunsoo Kim, Jason P. Y. Cheung, Teng Zhang, Honghan Wu

    Abstract: This paper describes our submission to the MEDIQA-CORR 2024 shared task for automatically detecting and correcting medical errors in clinical notes. We report results for three methods of few-shot In-Context Learning (ICL) augmented with Chain-of-Thought (CoT) and reason prompts using a large language model (LLM). In the first method, we manually analyse a subset of train and validation dataset to…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: accepted as NAACL workshop

  38. arXiv:2406.08959  [pdf, other]

    cs.HC cs.AI

    Beyond Recommendations: From Backward to Forward AI Support of Pilots' Decision-Making Process

    Authors: Zelun Tony Zhang, Sebastian S. Feger, Lucas Dullenkopf, Rulu Liao, Lukas Süsslin, Yuanting Liu, Andreas Butz

    Abstract: AI is anticipated to enhance human decision-making in high-stakes domains like aviation, but adoption is often hindered by challenges such as inappropriate reliance and poor alignment with users' decision-making. Recent research suggests that a core underlying issue is the recommendation-centric design of many AI systems, i.e., they give end-to-end recommendations and ignore the rest of the decisi…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to CSCW 2024, to be published in PACM HCI Vol. 8, No. CSCW2

  39. arXiv:2406.08845  [pdf, other]

    cs.CV

    Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality

    Authors: Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang

    Abstract: Recent text-to-video (T2V) technology advancements, as demonstrated by models such as Gen2, Pika, and Sora, have significantly broadened its applicability and popularity. Despite these strides, evaluating these models poses substantial challenges. Primarily, due to the limitations inherent in automatic metrics, manual evaluation is often considered a superior method for assessing T2V generation. H…

    Submitted 13 June, 2024; originally announced June 2024.

  40. arXiv:2406.08759  [pdf, other]

    cs.CV cs.MM

    Gaussian-Forest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling

    Authors: Fengyi Zhang, Tianjun Zhang, Lin Zhang, Helen Huang, Yadan Luo

    Abstract: The field of novel-view synthesis has recently witnessed the emergence of 3D Gaussian Splatting, which represents scenes in a point-based manner and renders through rasterization. This methodology, in contrast to Radiance Fields that rely on ray tracing, demonstrates superior rendering quality and speed. However, the explicit and unstructured nature of 3D Gaussians poses a significant storage chal…

    Submitted 12 June, 2024; originally announced June 2024.

  41. arXiv:2406.08731  [pdf, other]

    cs.SE

    Where Do Large Language Models Fail When Generating Code?

    Authors: Zhijie Wang, Zijie Zhou, Da Song, Yuheng Huang, Shengmai Chen, Lei Ma, Tianyi Zhang

    Abstract: Large Language Models (LLMs) have shown great potential in code generation. However, current LLMs still cannot reliably generate correct code. Moreover, it is unclear what kinds of code generation errors LLMs can make. To address this, we conducted an empirical study to analyze incorrect code snippets generated by six popular LLMs on the HumanEval dataset. We analyzed these errors alongside two di…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Extended from our MAPS 2023 paper. Our data is available at https://llm-code-errors.cs.purdue.edu

  42. arXiv:2406.08298  [pdf, other]

    cs.CV

    AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer

    Authors: Yitao Xu, Tong Zhang, Sabine Süsstrunk

    Abstract: Vision Transformers (ViTs) have demonstrated remarkable performance in image classification tasks, particularly when equipped with local information via region attention or convolutions. While such architectures improve the feature aggregation from different granularities, they often fail to contribute to the robustness of the networks. Neural Cellular Automata (NCA) enables the modeling of global…

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 26 pages, 11 figures

  43. arXiv:2406.07529  [pdf, other]

    cs.LG

    MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

    Authors: Lu Li, Tianyu Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio

    Abstract: Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the ob…

    Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  44. arXiv:2406.07502  [pdf, other]

    cs.CV cs.CL

    Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

    Authors: Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang

    Abstract: Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, image description datasets primarily originate from two sources. One source is the scraping of image-text pairs from the web. Despite their abundance, these descriptions are often of low quality and noisy. Another is t…

    Submitted 11 June, 2024; originally announced June 2024.

  45. arXiv:2406.06462  [pdf, other]

    cs.CV cs.LG

    VCR: Visual Caption Restoration

    Authors: Tianyu Zhang, Suyuchen Wang, Lu Li, Ge Zhang, Perouz Taslakian, Sai Rajeswar, Jie Fu, Bang Liu, Yoshua Bengio

    Abstract: We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedde…

    Submitted 24 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 17 pages, 2 figures

  46. arXiv:2406.05931  [pdf, other]

    cs.RO

    Differentiable Discrete Elastic Rods for Real-Time Modeling of Deformable Linear Objects

    Authors: Yizhou Chen, Yiting Zhang, Zachary Brei, Tiancheng Zhang, Yuzhen Chen, Julie Wu, Ram Vasudevan

    Abstract: This paper addresses the task of modeling Deformable Linear Objects (DLOs), such as ropes and cables, during dynamic motion over long time horizons. This task presents significant challenges due to the complex dynamics of DLOs. To address these challenges, this paper proposes differentiable Discrete Elastic Rods For deformable linear Objects with Real-time Modeling (DEFORM), a novel framework that…

    Submitted 14 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  47. arXiv:2406.04573  [pdf]

    cs.CV

    Attention Fusion Reverse Distillation for Multi-Lighting Image Anomaly Detection

    Authors: Yiheng Zhang, Yunkang Cao, Tianhang Zhang, Weiming Shen

    Abstract: This study targets Multi-Lighting Image Anomaly Detection (MLIAD), where multiple lighting conditions are utilized to enhance imaging quality and anomaly detection performance. While numerous image anomaly detection methods have been proposed, they lack the capacity to handle multiple inputs for a single sample, like multi-lighting images in MLIAD. Hence, this study proposes Attention Fusion Rever…

    Submitted 6 June, 2024; originally announced June 2024.

  48. arXiv:2406.04558  [pdf, other]

    cs.LG math.OC

    On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

    Authors: Motahareh Sohrabi, Juan Ramirez, Tianyue H. Zhang, Simon Lacoste-Julien, Jose Gallego-Posada

    Abstract: Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problems are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Published at ICML 2024. Code available at https://github.com/motahareh-sohrabi/nuPI

  49. arXiv:2406.04478  [pdf, other]

    cs.CL cs.LG

    PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning

    Authors: Tianrong Zhang, Zhaohan Xi, Ting Wang, Prasenjit Mitra, Jinghui Chen

    Abstract: Pre-trained language models (PLMs) have attracted enormous attention over the past few years with their unparalleled performances. Meanwhile, the soaring cost to train PLMs as well as their amazing generalizability have jointly contributed to few-shot fine-tuning and prompting as the most popular training paradigms for natural language processing (NLP) models. Nevertheless, existing studies have s…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: NAACL 2024

  50. arXiv:2406.04427  [pdf, other]

    cs.SE

    reAnalyst: Scalable Analysis of Reverse Engineering Activities

    Authors: Tab Zhang, Claire Taylor, Bart Coppens, Waleed Mebane, Christian Collberg, Bjorn De Sutter

    Abstract: This paper introduces reAnalyst, a scalable analysis framework designed to facilitate the study of reverse engineering (RE) practices through the semi-automated annotation of RE activities across various RE tools. By integrating tool-agnostic data collection of screenshots, keystrokes, active processes, and other types of data during RE experiments with semi-automated data analysis and annotation,…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Submitted to Computers & Security