Skip to main content

Showing 1–50 of 1,193 results for author: He, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01330  [pdf, other

    cs.CV

    Learning Unsigned Distance Fields from Local Shape Functions for 3D Surface Reconstruction

    Authors: Jiangbei Hu, Yanggeng Li, Fei Hou, Junhui Hou, Zhebin Zhang, Shengfa Wang, Na Lei, Ying He

    Abstract: Unsigned distance fields (UDFs) provide a versatile framework for representing a diverse array of 3D shapes, encompassing both watertight and non-watertight geometries. Traditional UDF learning methods typically require extensive training on large datasets of 3D shapes, which is costly and often necessitates hyperparameter adjustments for new datasets. This paper presents a novel neural framework,… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 14 pages, 11 figures

    ACM Class: I.3.5

  2. arXiv:2407.01155  [pdf, other

    cs.LG

    CPT: Consistent Proxy Tuning for Black-box Optimization

    Authors: Yuanyang He, Zitong Huang, Xinxing Xu, Rick Siow Mong Goh, Salman Khan, Wangmeng Zuo, Yong Liu, Chun-Mei Feng

    Abstract: Black-box tuning has attracted recent attention due to that the structure or inner parameters of advanced proprietary models are not accessible. Proxy-tuning provides a test-time output adjustment for tuning black-box language models. It applies the difference of the output logits before and after tuning a smaller white-box "proxy" model to improve the black-box model. However, this technique serv… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 10 pages,2 figures plus supplementary materials

  3. arXiv:2407.00141  [pdf, other

    cs.LG cs.AI

    Towards Secure and Efficient Data Scheduling for Vehicular Social Networks

    Authors: Youhua Xia, Tiehua Zhang, Jiong **, Ying He, Fei Yu

    Abstract: Efficient data transmission scheduling within vehicular environments poses a significant challenge due to the high mobility of such networks. Contemporary research predominantly centers on crafting cooperative scheduling algorithms tailored for vehicular networks. Notwithstanding, the intricacies of orchestrating scheduling in vehicular social networks both effectively and efficiently remain formi… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  4. arXiv:2406.19949  [pdf, other

    cs.CL

    Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

    Authors: Jiazheng Li, Hainiu Xu, Zhaoyue Sun, Yuxiang Zhou, David West, Cesare Aloisi, Yulan He

    Abstract: Generating rationales that justify scoring decisions has been a promising way to facilitate explainability in automated scoring systems. However, existing methods do not match the accuracy of classifier-based methods. Plus, the generated rationales often contain hallucinated information. To address these issues, we propose a novel framework capable of generating more faithful rationales and, more… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  5. arXiv:2406.19277  [pdf, other

    cs.SI

    The Emergence of Threads: The Birth of a New Social Network

    Authors: Peixian Zhang, Yupeng He, Ehsan-Ul Haq, Jiahui He, Gareth Tyson

    Abstract: Threads, a new microblogging platform from Meta, was launched in July 2023. In contrast to prior new platforms, Threads was borne out of an existing parent platform, Instagram, for which all users must already possess an account. This offers a unique opportunity to study platform evolution, to understand how one existing platform can support the "birth" of another. With this in mind, this paper pr… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2406.18449  [pdf, other

    cs.CL cs.AI

    Cascading Large Language Models for Salient Event Graph Generation

    Authors: Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He

    Abstract: Generating event graphs from long documents is challenging due to the inherent complexity of multiple tasks involved such as detecting events, identifying their relationships, and reconciling unstructured input with structured graphs. Recent studies typically consider all events with equal importance, failing to distinguish salient events crucial for understanding narratives. This paper presents C… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 9 + 12 pages

  7. arXiv:2406.18245  [pdf, other

    cs.CL

    Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems

    Authors: Italo Luis da Silva, Hanqi Yan, Lin Gui, Yulan He

    Abstract: The inherent ambiguity of cause and effect boundaries poses a challenge in evaluating causal event extraction tasks. Traditional metrics like Exact Match and BertScore poorly reflect model performance, so we trained evaluation models to approximate human evaluation, achieving high agreement. We used them to perform Reinforcement Learning with extraction models to align them with human preference,… ▽ More

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures, 6 tables

  8. arXiv:2406.18199  [pdf, other

    cs.CV

    GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting

    Authors: Jiaze Li, Zhengyu Wen, Luo Zhang, Jiangbei Hu, Fei Hou, Zhebin Zhang, Ying He

    Abstract: The 3D Gaussian Splatting technique has significantly advanced the construction of radiance fields from multi-view images, enabling real-time rendering. While point-based rasterization effectively reduces computational demands for rendering, it often struggles to accurately reconstruct the geometry of the target object, especially under strong lighting. To address this challenge, we introduce a no… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  9. arXiv:2406.18094  [pdf, other

    cs.CL

    Shimo Lab at "Discharge Me!": Discharge Summarization by Prompt-Driven Concatenation of Electronic Health Record Sections

    Authors: Yunzhen He, Hiroaki Yamagiwa, Hidetoshi Shimodaira

    Abstract: In this paper, we present our approach to the shared task "Discharge Me!" at the BioNLP Workshop 2024. The primary goal of this task is to reduce the time and effort clinicians spend on writing detailed notes in the electronic health record (EHR). Participants develop a pipeline to generate the "Brief Hospital Course" and "Discharge Instructions" sections from the EHR. Our approach involves a firs… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: BioNLP @ ACL2024

  10. arXiv:2406.18070  [pdf, other

    cs.CV

    EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation

    Authors: Baoqi Pei, Guo Chen, Jilan Xu, Yu** He, Yicheng Liu, Kanghua Pan, Yifei Huang, Yali Wang, Tong Lu, Limin Wang, Yu Qiao

    Abstract: In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building upon the video-language two-tower model and leveraging our meticulously organized egocentric video data, we introduce a novel foundation model called EgoVideo. This model is specifically designed to cater to the uniqu… ▽ More

    Submitted 30 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Champion solutions in the EgoVis CVPR 2024 workshop

  11. arXiv:2406.18054  [pdf, other

    eess.IV cs.CV

    Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation

    Authors: Qilai Zhang, Jiawen Li, Peiran Liao, Jiali Hu, Tian Guan, Anjia Han, Yonghong He

    Abstract: The two primary types of Hematoxylin and Eosin (H&E) slides in histopathology are Formalin-Fixed Paraffin-Embedded (FFPE) and Fresh Frozen (FF). FFPE slides offer high quality histopathological images but require a labor-intensive acquisition process. In contrast, FF slides can be prepared quickly, but the image quality is relatively poor. Our task is to translate FF images into FFPE style, thereb… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  12. arXiv:2406.17969  [pdf, other

    cs.CL cs.AI

    Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective

    Authors: Hanqi Yan, Yanzheng Xiang, Guangyi Chen, Yifei Wang, Lin Gui, Yulan He

    Abstract: To better interpret the intrinsic mechanism of large language models (LLMs), recent studies focus on monosemanticity on its basic units. A monosemantic neuron is dedicated to a single and specific concept, which forms a one-to-one correlation between neurons and concepts. Despite extensive research in monosemanticity probing, it remains unclear whether monosemanticity is beneficial or harmful to m… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  13. arXiv:2406.17680  [pdf, other

    cs.CV

    End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation

    Authors: Mingzhe Guo, Zhipeng Zhang, Yuan He, Ke Wang, Li** **g

    Abstract: We propose UAD, a method for vision-based end-to-end autonomous driving (E2EAD), achieving the best open-loop evaluation performance in nuScenes, meanwhile showing robust closed-loop driving quality in CARLA. Our motivation stems from the observation that current E2EAD models still mimic the modular architecture in typical driving stacks, with carefully designed supervised perception and predictio… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 17 pages, 10 figures and 15 tables

  14. arXiv:2406.17319  [pdf, other

    cs.CV

    DMF-Net: Image-Guided Point Cloud Completion with Dual-Channel Modality Fusion and Shape-Aware Upsampling Transformer

    Authors: Aihua Mao, Yuxuan Tang, Jiangtao Huang, Ying He

    Abstract: In this paper we study the task of a single-view image-guided point cloud completion. Existing methods have got promising results by fusing the information of image into point cloud explicitly or implicitly. However, given that the image has global shape information and the partial point cloud has rich local details, We believe that both modalities need to be given equal attention when performing… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  15. arXiv:2406.16863  [pdf, other

    cs.CV

    FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models

    Authors: Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, Ziwei Liu

    Abstract: Diffusion model has demonstrated remarkable capability in video generation, which further sparks interest in introducing trajectory control into the generation process. While existing works mainly focus on training-based methods (e.g., conditional adapter), we argue that diffusion model itself allows decent control over the generated content without requiring any training. In this study, we introd… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Project Page: http://haonanqiu.com/projects/FreeTraj.html, Code Repo: https://github.com/arthur-qiu/FreeTraj

  16. arXiv:2406.16360  [pdf, other

    cs.CV cs.GR

    MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling

    Authors: Yuxin Dai, Qi Wang, **gsen Zhu, Dianbing Xi, Yuchi Huo, Chen Qian, Ying He

    Abstract: We present MIRReS, a novel two-stage inverse rendering framework that jointly reconstructs and optimizes the explicit geometry, material, and lighting from multi-view images. Unlike previous methods that rely on implicit irradiance fields or simplified path tracing algorithms, our method extracts an explicit geometry (triangular mesh) in stage one, and introduces a more realistic physically-based… ▽ More

    Submitted 24 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 16 pages, 14 figures

  17. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  18. arXiv:2406.14927  [pdf, other

    cs.CV cs.RO

    Gaussian-Informed Continuum for Physical Property Identification and Simulation

    Authors: Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen

    Abstract: This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  19. arXiv:2406.14863  [pdf, other

    cs.CR cs.AR

    Older and Wiser: The Marriage of Device Aging and Intellectual Property Protection of Deep Neural Networks

    Authors: Ning Lin, Shaocong Wang, Yue Zhang, Yangu He, Kwunhang Wong, Arindam Basu, Dashan Shang, Xiaoming Chen, Zhongrui Wang

    Abstract: Deep neural networks (DNNs), such as the widely-used GPT-3 with billions of parameters, are often kept secret due to high training costs and privacy concerns surrounding the data used to train them. Previous approaches to securing DNNs typically require expensive circuit redesign, resulting in additional overheads such as increased area, energy consumption, and latency. To address these issues, we… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Design Automation Conference 2024

  20. arXiv:2406.14550  [pdf, other

    cs.CL cs.AI

    GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

    Authors: Shilong Li, Yancheng He, Hangyu Guo, Xingyuan Bu, Ge Bai, Jie Liu, Jiaheng Liu, Xingwei Qu, Yangguang Li, Wanli Ouyang, Wenbo Su, Bo Zheng

    Abstract: Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The first four authors contributed equally, 27 pages

  21. arXiv:2406.14318  [pdf, other

    cs.CR cs.AI cs.CL

    The Fire Thief Is Also the Keeper: Balancing Usability and Privacy in Prompts

    Authors: Zhili Shen, Zihang Xi, Ying He, Wei Tong, **gyu Hua, Sheng Zhong

    Abstract: The rapid adoption of online chatbots represents a significant advancement in artificial intelligence. However, this convenience brings considerable privacy concerns, as prompts can inadvertently contain sensitive information exposed to large language models (LLMs). Limited by high computational costs, reduced task usability, and excessive system modifications, previous works based on local deploy… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  22. Causal Inference with Latent Variables: Recent Advances and Future Prospectives

    Authors: Yaochen Zhu, Yinhan He, **g Ma, Mengxuan Hu, Sheng Li, Jundong Li

    Abstract: Causality lays the foundation for the trajectory of our world. Causal inference (CI), which aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial research topic. Nevertheless, the lack of observation of important variables (e.g., confounders, mediators, exogenous variables, etc.) severely compromises the reliability of CI methods. The issue may arise from t… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD'24 Survey Track

  23. arXiv:2406.13768  [pdf, other

    cs.DC cs.AI cs.LG cs.PF

    FastPersist: Accelerating Model Checkpointing in Deep Learning

    Authors: Guanhua Wang, Olatunji Ruwase, Bing Xie, Yuxiong He

    Abstract: Model checkpoints are critical Deep Learning (DL) artifacts that enable fault tolerance for training and downstream applications, such as inference. However, writing checkpoints to persistent storage, and other I/O aspects of DL training, are mostly ignored by compute-focused optimization efforts for faster training of rapidly growing models and datasets. Towards addressing this imbalance, we prop… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 11 pages

  24. arXiv:2406.13618  [pdf, other

    cs.CL

    In-Context Former: Lightning-fast Compressing Context for Large Language Model

    Authors: Xiangfeng Wang, Zaiyi Chen, Zheyong Xie, Tong Xu, Yongyi He, Enhong Chen

    Abstract: With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods typically leverage the self-attention mechanism of the LLM itself for context compression. While these methods have achieved notable results, the compression process… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  25. arXiv:2406.13223  [pdf, other

    cs.RO

    Act Better by Timing: A timing-Aware Reinforcement Learning for Autonomous Driving

    Authors: Guanzhou Li, Jian** Wu, Yu**g He

    Abstract: Co** with intensively interactive scenarios is one of the significant challenges in the development of autonomous driving. Reinforcement learning (RL) offers an ideal solution for such scenarios through its self-evolution mechanism via interaction with the environment. However, the lack of sufficient safety mechanisms in common RL leads to the fact that agent often find it difficult to interact… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.12947  [pdf, other

    cs.CR cs.SE

    AutoFirm: Automatically Identifying Reused Libraries inside IoT Firmware at Large-Scale

    Authors: YongLe Chen, Feng Ma, Ying Zhang, YongZhong He, Haining Wang, Qiang Li

    Abstract: The Internet of Things (IoT) has become indispensable to our daily lives and work. Unfortunately, developers often reuse software libraries in the IoT firmware, leading to a major security concern. If vulnerabilities or insecure versions of these libraries go unpatched, a massive number of IoT devices can be impacted. In this paper, we propose the AutoFirm, an automated tool for detecting reused l… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 20 figures

  27. arXiv:2406.12839  [pdf, other

    cs.LG math.DS math.OC math.PR stat.ML

    Evaluating the design space of diffusion-based generative models

    Authors: Yuqing Wang, Ye He, Molei Tao

    Abstract: Most existing theoretical investigations of the accuracy of diffusion models, albeit significant, assume the score function has been approximated to a certain accuracy, and then use this a priori bound to control the error of generation. This article instead provides a first quantitative understanding of the whole generation process, i.e., both training and sampling. More precisely, it conducts a… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Comments are welcome

  28. arXiv:2406.12754  [pdf, other

    cs.CL cs.AI

    Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

    Authors: Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng

    Abstract: Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evalua… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  29. arXiv:2406.12119  [pdf

    cs.LG cs.AI cs.SI

    Deploying scalable traffic prediction models for efficient management in real-world large transportation networks during hurricane evacuations

    Authors: Qinhua Jiang, Brian Yueshuai He, Changju Lee, Jiaqi Ma

    Abstract: Accurate traffic prediction is vital for effective traffic management during hurricane evacuation. This paper proposes a predictive modeling system that integrates Multilayer Perceptron (MLP) and Long-Short Term Memory (LSTM) models to capture both long-term congestion patterns and short-term speed patterns. Leveraging various input variables, including archived traffic data, spatial-temporal road… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE ITS Magazine and currently under review

  30. Multi-Layer Ranking with Large Language Models for News Source Recommendation

    Authors: Wenjia Zhang, Lin Gui, Rob Procter, Yulan He

    Abstract: To seek reliable information sources for news events, we introduce a novel task of expert recommendation, which aims to identify trustworthy sources based on their previously quoted statements. To achieve this, we built a novel dataset, called NewsQuote, consisting of 23,571 quote-speaker pairs sourced from a collection of news articles. We formulate the recommendation task as the retrieval of exp… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by the SIGIR 2024. arXiv admin note: text overlap with arXiv:2305.04825

  31. arXiv:2406.11006  [pdf, other

    cs.SD cs.AI eess.AS

    SPEAR: Receiver-to-Receiver Acoustic Neural War** Field

    Authors: Yuhang He, Shitong Xu, Jia-Xing Zhong, Sangyun Shin, Niki Trigoni, Andrew Markham

    Abstract: We present SPEAR, a continuous receiver-to-receiver acoustic neural war** field for spatial acoustic effects prediction in an acoustic 3D space with a single stationary audio source. Unlike traditional source-to-receiver modelling methods that require prior space acoustic properties knowledge to rigorously model audio propagation from source to receiver, we propose to predict by war** the spat… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures in main paper

  32. arXiv:2406.10964  [pdf, other

    cs.AI

    Ontology Embedding: A Survey of Methods, Applications and Resources

    Authors: Jiaoyan Chen, Olga Mashkova, Fernando Zhapa-Camacho, Robert Hoehndorf, Yuan He, Ian Horrocks

    Abstract: Ontologies are widely used for representing domain knowledge and meta data, playing an increasingly important role in Information Systems, the Semantic Web, Bioinformatics and many other domains. However, logical reasoning that ontologies can directly support are quite limited in learning, approximation and prediction. One straightforward solution is to integrate statistical analysis and machine l… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: A comprehensive survey on ontology embedding

  33. arXiv:2406.10957  [pdf, other

    cs.CL

    Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence

    Authors: Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun

    Abstract: Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a more straightforward alternative to the complex Reinforcement Learning from Human Feedback (RLHF). Despite its promising efficacy, DPO faces a notable drawback: "verbosity", a common over-optimization phenomenon also observ… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  34. arXiv:2406.10932  [pdf, other

    cs.SD cs.AI eess.AS

    Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition

    Authors: Wenhan Yao, Jiangkun Yang, Yongqiang He, Jia Liu, Wei** Wen

    Abstract: Speech recognition is an essential start ring of human-computer interaction, and recently, deep learning models have achieved excellent success in this task. However, when the model training and private data provider are always separated, some security threats that make deep neural networks (DNNs) abnormal deserve to be researched. In recent years, the typical backdoor attacks have been researched… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  35. arXiv:2406.10085  [pdf, other

    cs.CL

    Enhancing Question Answering on Charts Through Effective Pre-training Tasks

    Authors: Ashim Gupta, Vivek Gupta, Shuo Zhang, Yujie He, Ning Zhang, Shalin Shah

    Abstract: To completely understand a document, the use of textual information is not enough. Understanding visual cues, such as layouts and charts, is also required. While the current state-of-the-art approaches for document understanding (both OCR-based and OCR-free) work well, a thorough analysis of their capabilities and limitations has not yet been performed. Therefore, in this work, we addresses the li… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  36. arXiv:2406.09357  [pdf, other

    cs.LG stat.ML

    Advancing Graph Generation through Beta Diffusion

    Authors: Yilin He, Xinyang Liu, Bo Chen, Mingyuan Zhou

    Abstract: Diffusion models have demonstrated effectiveness in generating natural images and have been extended to generate diverse data types, including graphs. This new generation of diffusion-based graph generative models has demonstrated significant performance improvements over methods that rely on variational autoencoders or generative adversarial networks. It's important to recognize, however, that mo… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  37. arXiv:2406.09041  [pdf, other

    cs.CL cs.AI cs.LG

    ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models

    Authors: **g Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang

    Abstract: The typical process for develo** LLMs involves pre-training a general foundation model on massive data, followed by fine-tuning on task-specific data to create specialized experts. Serving these experts poses challenges, as loading all experts onto devices is impractical, and frequent switching between experts in response to user requests incurs substantial I/O costs, increasing latency and expe… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Tech report

  38. arXiv:2406.08689  [pdf, other

    cs.CR cs.AI

    Security of AI Agents

    Authors: Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, Hao Chen

    Abstract: The study and development of AI agents have been boosted by large language models. AI agents can function as intelligent assistants and complete tasks on behalf of their users with access to tools and the ability to execute commands in their environments, Through studying and experiencing the workflow of typical AI agents, we have raised several concerns regarding their security. These potential v… ▽ More

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  39. arXiv:2406.08665  [pdf, other

    cs.SE cs.AI

    Exploring Fuzzing as Data Augmentation for Neural Test Generation

    Authors: Yifeng He, Jicheng Wang, Yuyang Rong, Hao Chen

    Abstract: Testing is an essential part of modern software engineering to build reliable programs. As testing the software is important but expensive, automatic test case generation methods have become popular in software development. Unlike traditional search-based coverage-guided test generation like fuzzing, neural test generation backed by large language models can write tests that are semantically meani… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  40. arXiv:2406.08418  [pdf, other

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang **, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang , et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  41. arXiv:2406.08343  [pdf, other

    cs.AR cs.AI cs.ET cs.NE

    Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver

    Authors: Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for develo** digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underl… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  42. arXiv:2406.07714  [pdf, other

    cs.CR cs.AI cs.SE

    LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing

    Authors: Hongxiang Zhang, Yuyang Rong, Yifeng He, Hao Chen

    Abstract: Greybox fuzzing has achieved success in revealing bugs and vulnerabilities in programs. However, randomized mutation strategies have limited the fuzzer's performance on structured data. Specialized fuzzers can handle complex structured data, but require additional efforts in grammar and suffer from low throughput. In this paper, we explore the potential of utilizing the Large Language Model to e… ▽ More

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  43. arXiv:2406.05852  [pdf, other

    cs.CV cs.GR

    RefGaussian: Disentangling Reflections from 3D Gaussian Splatting for Realistic Rendering

    Authors: Rui Zhang, Tianyue Luo, Weidong Yang, Ben Fei, **gyi Xu, Qingyuan Zhou, Keyi Liu, Ying He

    Abstract: 3D Gaussian Splatting (3D-GS) has made a notable advancement in the field of neural rendering, 3D scene reconstruction, and novel view synthesis. Nevertheless, 3D-GS encounters the main challenge when it comes to accurately representing physical reflections, especially in the case of total reflection and semi-reflection that are commonly found in real-world scenes. This limitation causes reflectio… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  44. arXiv:2406.05285  [pdf, other

    cs.CV

    VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography

    Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu, Wenqi Li

    Abstract: Segmentation foundation models have attracted great interest, however, none of them are adequate enough for the use cases in 3D computed tomography scans (CT) images. Existing works finetune on medical images with 2D foundation models trained on natural images, but interactive segmentation, especially in 2D, is too time-consuming for 3D scans and less useful for large cohort analysis. Models that… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  45. arXiv:2406.04609  [pdf, other

    cs.LG cs.AI

    Diverse Intra- and Inter-Domain Activity Style Fusion for Cross-Person Generalization in Activity Recognition

    Authors: Junru Zhang, Lang Feng, Zhidan Liu, Yuhan Wu, Yang He, Yabo Dong, Duanqing Xu

    Abstract: Existing domain generalization (DG) methods for cross-person generalization tasks often face challenges in capturing intra- and inter-domain style diversity, resulting in domain gaps with the target domain. In this study, we explore a novel perspective to tackle this problem, a process conceptualized as domain padding. This proposal aims to enrich the domain diversity by synthesizing intra- and in… ▽ More

    Submitted 28 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024)

  46. arXiv:2406.01900  [pdf, other

    cs.CV

    Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

    Authors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

    Abstract: We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences. The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to this portrait while maintaining temporal consistency and fidelity. To address these challenges, Follow-Your-Emoji equ… ▽ More

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: https://follow-your-emoji.github.io/

  47. arXiv:2406.00672  [pdf, other

    cs.CV

    Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification

    Authors: Xuenian Wang, Shanshan Shi, Renao Yan, Qiehe Sun, Lianghui Zhu, Tian Guan, Yonghong He

    Abstract: In the field of whole slide image (WSI) classification, multiple instance learning (MIL) serves as a promising approach, commonly decoupled into feature extraction and aggregation. In this paradigm, our observation reveals that discriminative embeddings are crucial for aggregation to the final prediction. Among all feature updating strategies, task-oriented ones can capture characteristics specifi… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  48. arXiv:2406.00346  [pdf, other

    cs.CV

    Details Enhancement in Unsigned Distance Field Learning for High-fidelity 3D Surface Reconstruction

    Authors: Cheng Xu, Fei Hou, Wencheng Wang, Hong Qin, Zhebin Zhang, Ying He

    Abstract: While Signed Distance Fields (SDF) are well-established for modeling watertight surfaces, Unsigned Distance Fields (UDF) broaden the scope to include open surfaces and models with complex inner structures. Despite their flexibility, UDFs encounter significant challenges in high-fidelity 3D reconstruction, such as non-differentiability at the zero level set, difficulty in achieving the exact zero v… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  49. arXiv:2406.00034  [pdf, other

    cs.CL cs.AI

    Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories

    Authors: Tianlong Wang, Xianfeng Jiao, Yifan He, Zhongzhi Chen, Yinghao Zhu, Xu Chu, Junyi Gao, Yasha Wang, Liantao Ma

    Abstract: Recent studies have indicated that Large Language Models (LLMs) harbor an inherent understanding of truthfulness, yet often fail to express fully and generate false statements. This gap between "knowing" and "telling" poses a challenge for ensuring the truthfulness of generated content. To address this, we introduce Adaptive Activation Steering (ACT), a tuning-free method that adaptively shift LLM… ▽ More

    Submitted 26 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.17811

  50. arXiv:2405.20791  [pdf, other

    cs.CV cs.LG

    GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis

    Authors: Yumeng He, Yunbo Wang, Xiaokang Yang

    Abstract: Decoupling the illumination in 3D scenes is crucial for novel view synthesis and relighting. In this paper, we propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points. Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components, enabling the synthesis of realistic lighting eff… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.