Showing 1–50 of 332 results for author: Wei, H

Searching in archive cs.

  1. arXiv:2407.00994  [pdf, other]

    cs.CL

    LLM Uncertainty Quantification through Directional Entailment Graph and Claim Level Response Augmentation

    Authors: Longchao Da, Tiejin Chen, Lu Cheng, Hua Wei

    Abstract: Large language models (LLMs) have showcased superior capabilities in sophisticated tasks across various domains, stemming from basic question-answer (QA), they are nowadays used as decision assistants or explainers for unfamiliar content. However, they are not always correct due to the data sparsity in specific domain corpus, or the model's hallucination problems. Given this, how much should w…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 11 pages main content, 5 pages appendix

    ACM Class: I.2.7

  2. arXiv:2407.00608  [pdf, other]

    cs.AI cs.CL cs.CV

    Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace

    Authors: Shian Du, Xiaotian Cheng, Qi Qian, Henglu Wei, Yi Xu, Xiangyang Ji

    Abstract: Personalized text-to-image generation has attracted unprecedented attention in the recent few years due to its unique capability of generating highly-personalized images via using the input concept dataset and novel textual prompt. However, previous methods solely focus on the performance of the reconstruction task, degrading its ability to combine with different textual prompt. Besides, optimizin…

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2406.18848  [pdf, other]

    cs.LG

    Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation

    Authors: Hui Wei, Maxwell A. Xu, Colin Samplawski, James M. Rehg, Santosh Kumar, Benjamin M. Marlin

    Abstract: Wearable sensors enable health researchers to continuously collect data pertaining to the physiological state of individuals in real-world settings. However, such data can be subject to extensive missingness due to a complex combination of factors. In this work, we study the problem of imputation of missing step count data, one of the most ubiquitous forms of wearable sensor data. We construct a n…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by Conference on Health, Inference, and Learning (CHIL) 2024

  4. arXiv:2406.14264  [pdf, other]

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  5. arXiv:2406.12147  [pdf, other]

    cs.AI

    Metacognitive AI: Framework and the Case for a Neurosymbolic Approach

    Authors: Hua Wei, Paulo Shakarian, Christian Lebiere, Bruce Draper, Nikhil Krishnaswamy, Sergei Nirenburg

    Abstract: Metacognition is the concept of reasoning about an agent's own internal processes and was originally introduced in the field of developmental psychology. In this position paper, we examine the concept of applying metacognition to artificial intelligence. We introduce a framework for understanding metacognitive artificial intelligence (AI) that we call TRAP: transparency, reasoning, adaptation, and…

    Submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.10671  [pdf]

    cs.CL

    Augmenting Biomedical Named Entity Recognition with General-domain Resources

    Authors: Yu Yin, Hyunjae Kim, Xiao Xiao, Chih Hsuan Wei, Jaewoo Kang, Zhiyong Lu, Hua Xu, Meng Fang, Qingyu Chen

    Abstract: Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle…

    Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: We make data, codes, and models publicly available via https://github.com/qingyu-qc/bioner_gerbera

  7. arXiv:2406.09262  [pdf, other]

    cs.LG

    Flexible Heteroscedastic Count Regression with Deep Double Poisson Networks

    Authors: Spencer Young, Porter Jenkins, Lonchao Da, Jeff Dotson, Hua Wei

    Abstract: Neural networks that can produce accurate, input-conditional uncertainty representations are critical for real-world applications. Recent progress on heteroscedastic continuous regression has shown great promise for calibrated uncertainty quantification on complex tasks, like image regression. However, when these methods are applied to discrete regression tasks, such as crowd counting, ratings pre…

    Submitted 13 June, 2024; originally announced June 2024.

  8. arXiv:2406.07455  [pdf, other]

    cs.LG stat.ML

    Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis

    Authors: Qining Zhang, Honghao Wei, Lei Ying

    Abstract: In this paper, we study reinforcement learning from human feedback (RLHF) under an episodic Markov decision process with a general trajectory-wise reward model. We developed a model-free RLHF best policy identification algorithm, called $\mathsf{BSAD}$, without explicit reward model inference, which is a critical intermediate step in the contemporary RLHF paradigms for training large language mode…

    Submitted 11 June, 2024; originally announced June 2024.

  9. arXiv:2406.03702  [pdf, other]

    cs.CV

    DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation

    Authors: Zilu Guo, Liuyang Bian, Xuan Huang, Hu Wei, Jingyu Li, Huasheng Ni

    Abstract: Atrous convolutions are employed as a method to increase the receptive field in semantic segmentation tasks. However, in previous works of semantic segmentation, it was rarely employed in the shallow layers of the model. We revisit the design of atrous convolutions in modern convolutional neural networks (CNNs), and demonstrate that the concept of using large kernels to apply atrous convolutions c…

    Submitted 5 June, 2024; originally announced June 2024.

  10. arXiv:2406.03511  [pdf, other]

    cs.LG cs.AI

    MagiNet: Mask-Aware Graph Imputation Network for Incomplete Traffic Data

    Authors: Jianping Zhou, Bin Lu, Zhanyu Liu, Siyu Pan, Xuejun Feng, Hua Wei, Guanjie Zheng, Xinbing Wang, Chenghu Zhou

    Abstract: Due to detector malfunctions and communication failures, missing data is ubiquitous during the collection of traffic data. Therefore, it is of vital importance to impute the missing values to facilitate data analysis and decision-making for Intelligent Transportation System (ITS). However, existing imputation methods generally perform zero pre-filling techniques to initialize missing values, intro…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 19 pages, 7 figures

  11. arXiv:2405.20787  [pdf, other]

    cs.CL

    PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction

    Authors: Yang Zhou, Shimin Shan, Hongkui Wei, Zhehuan Zhao, Wenshuo Feng

    Abstract: Relation Extraction (RE) aims at recognizing the relation between pairs of entities mentioned in a text. Advances in LLMs have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of models for RE in the scientific domain. The framework introduces two ways of data augmentation, utilizing a LLM to obtain pseudo-sampl…

    Submitted 30 May, 2024; originally announced May 2024.

  12. arXiv:2405.19609  [pdf, other]

    cs.CV cs.GR

    SMPLX-Lite: A Realistic and Drivable Avatar Benchmark with Rich Geometry and Texture Annotations

    Authors: Yujiao Jiang, Qingmin Liao, Zhaolong Wang, Xiangru Lin, Zongqing Lu, Yuxi Zhao, Hanqing Wei, Jingrui Ye, Yu Zhang, Zhijing Shao

    Abstract: Recovering photorealistic and drivable full-body avatars is crucial for numerous applications, including virtual reality, 3D games, and tele-presence. Most methods, whether reconstruction or generation, require large numbers of human motion sequences and corresponding textured meshes. To easily learn a drivable avatar, a reasonable parametric body model with unified topology is paramount. However,…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICME 2024; Project page: https://alex-jyj.github.io/SMPLX-Lite/

  13. arXiv:2405.17264  [pdf, other]

    cs.CL cs.LG

    On the Noise Robustness of In-Context Learning for Text Generation

    Authors: Hongfu Gao, Feipeng Zhang, Wenyu Jiang, Jun Shu, Feng Zheng, Hongxin Wei

    Abstract: Large language models (LLMs) have shown impressive performance on downstream tasks by in-context learning (ICL), which heavily relies on the quality of demonstrations selected from a large set of annotated examples. Recent works claim that in-context learning is robust to noisy demonstrations in text classification. In this work, we show that, on text generation tasks, noisy annotations significan…

    Submitted 27 May, 2024; originally announced May 2024.

  14. arXiv:2405.17152  [pdf, other]

    cs.MA cs.AI

    CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control

    Authors: Jingqing Ruan, Ziyue Li, Hua Wei, Haoyuan Jiang, Jiaming Lu, Xuantang Xiong, Hangyu Mao, Rui Zhao

    Abstract: Effective multi-intersection collaboration is pivotal for reinforcement-learning-based traffic signal control to alleviate congestion. Existing work mainly chooses neighboring intersections as collaborators. However, quite an amount of congestion, even some wide-range congestion, is caused by non-neighbors failing to collaborate. To address these issues, we propose to separate the collaborator sel…

    Submitted 19 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  15. arXiv:2405.14295  [pdf, other]

    cs.CV

    Focus Anywhere for Fine-grained Multi-page Document Understanding

    Authors: Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Modern LVLMs still struggle to achieve fine-grained document understanding, such as OCR/translation/caption for regions of interest to the user, tasks that require the context of the entire page, or even multiple pages. Accordingly, this paper proposes Fox, an effective pipeline, hybrid data, and tuning strategy, that catalyzes LVLMs to focus anywhere on single/multi-page documents. We introduce a…

    Submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.10357  [pdf, other]

    cs.CV

    RGB Guided ToF Imaging System: A Survey of Deep Learning-based Methods

    Authors: Xin Qiao, Matteo Poggi, Pengchao Deng, Hao Wei, Chenyang Ge, Stefano Mattoccia

    Abstract: Integrating an RGB camera into a ToF imaging system has become a significant technique for perceiving the real world. The RGB guided ToF imaging system is crucial to several applications, including face anti-spoofing, saliency detection, and trajectory prediction. Depending on the distance of the working range, the implementation schemes of the RGB guided ToF imaging systems are different. Specifi…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: To appear in the International Journal of Computer Vision (IJCV)

  17. arXiv:2405.07524  [pdf, other]

    cs.CV

    HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image Retrieval

    Authors: Chao He, Hongxi Wei

    Abstract: Deep image hashing aims to map input images into simple binary hash codes via deep neural networks and thus enable effective large-scale image retrieval. Recently, hybrid networks that combine convolution and Transformer have achieved superior performance on various computer tasks and have attracted extensive attention from researchers. Nevertheless, the potential benefits of such hybrid networks…

    Submitted 14 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by ICMR 2024

  18. arXiv:2405.07047  [pdf, other]

    cs.CV

    Unsupervised Density Neural Representation for CT Metal Artifact Reduction

    Authors: Qing Wu, Xu Guo, Lixuan Chen, Dongming He, Hongjiang Wei, Xudong Wang, S. Kevin Zhou, Yifeng Zhang, Jingyi Yu, Yuyao Zhang

    Abstract: Emerging unsupervised reconstruction techniques based on implicit neural representation (INR), such as NeRP, CoIL, and SCOPE, have shown unique capabilities in CT linear inverse imaging. In this work, we propose a novel unsupervised density neural representation (Diner) to tackle the challenging problem of CT metal artifacts when scanned objects contain metals. The drastic variation of linear atte…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 13 pages

  19. arXiv:2405.05594  [pdf, other]

    cs.AI

    Expected Work Search: Combining Win Rate and Proof Size Estimation

    Authors: Owen Randall, Martin Müller, Ting Han Wei, Ryan Hayward

    Abstract: We propose Expected Work Search (EWS), a new game solving algorithm. EWS combines win rate estimation, as used in Monte Carlo Tree Search, with proof size estimation, as used in Proof Number Search. The search efficiency of EWS stems from minimizing a novel notion of Expected Work, which predicts the expected computation required to solve a position. EWS outperforms traditional solving algorithms…

    Submitted 9 May, 2024; originally announced May 2024.

  20. arXiv:2405.00393  [pdf, other]

    cs.CR

    Inferring State Machine from the Protocol Implementation via Large Language Model

    Authors: Haiyang Wei, Zhengjie Du, Haohui Huang, Yue Liu, Guang Cheng, Linzhang Wang, Bing Mao

    Abstract: State machines play a pivotal role in augmenting the efficacy of protocol analyzing to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex c…

    Submitted 14 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  21. arXiv:2404.18820  [pdf, other]

    eess.IV cs.CV

    Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior

    Authors: Zhiyuan Li, Yanhui Zhou, Hao Wei, Chenyang Ge, Jingwen Jiang

    Abstract: Image compression at extremely low bitrates (below 0.1 bits per pixel (bpp)) is a significant challenge due to substantial information loss. In this work, we propose a novel two-stage extreme image compression framework that exploits the powerful generative capability of pre-trained diffusion models to achieve realistic image reconstruction at extremely low bitrates. In the first stage, we treat t…

    Submitted 13 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE TCSVT

  22. arXiv:2404.17890  [pdf, other]

    eess.IV cs.AI cs.CV

    DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction

    Authors: Chenhe Du, Xiyue Lin, Qing Wu, Xuanyu Tian, Ying Su, Zhe Luo, Hongjiang Wei, S. Kevin Zhou, Jingyi Yu, Yuyao Zhang

    Abstract: Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging recon…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 15 pages, 10 figures

    ACM Class: I.2.10; I.4.5

  23. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  24. arXiv:2404.13842  [pdf, other]

    cs.CV cs.CG

    On Support Relations Inference and Scene Hierarchy Graph Construction from Point Cloud in Clustered Environments

    Authors: Gang Ma, Hui Wei

    Abstract: Over the years, scene understanding has attracted a growing interest in computer vision, providing the semantic and physical scene information necessary for robots to complete some particular tasks autonomously. In 3D scenes, rich spatial geometric and topological information are often ignored by RGB-based approaches for scene understanding. In this study, we develop a bottom-up approach for scene…

    Submitted 21 April, 2024; originally announced April 2024.

  25. arXiv:2404.13600  [pdf, other]

    cs.RO

    Are We Ready for Planetary Exploration Robots? The TAIL-Plus Dataset for SLAM in Granular Environments

    Authors: Zirui Wang, Chen Yao, Yangtao Ge, Guowei Shi, Ningbo Yang, Zheng Zhu, Kewei Dong, Hexiang Wei, Zhenzhong Jia, Jing Wu

    Abstract: So far, planetary surface exploration depends on various mobile robot platforms. The autonomous navigation and decision-making of these mobile robots in complex terrains largely rely on their terrain-aware perception, localization and mapping capabilities. In this paper we release the TAIL-Plus dataset, a new challenging dataset in deformable granular environments for planetary exploration robots,…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

  26. arXiv:2404.12675  [pdf, other]

    cs.CR

    ESPM-D: Efficient Sparse Polynomial Multiplication for Dilithium on ARM Cortex-M4 and Apple M2

    Authors: Jieyu Zheng, Hong Zhang, Le Tian, Zhuo Zhang, Hanyu Wei, Zhiwei Chu, Yafang Yang, Yunlei Zhao

    Abstract: Dilithium is a lattice-based digital signature scheme standardized by the NIST post-quantum cryptography (PQC) project. In this study, we focus on developing efficient sparse polynomial multiplication implementations of Dilithium for ARM Cortex-M4 and Apple M2, which are both based on the ARM architecture. The ARM Cortex-M4 is commonly utilized in resource-constrained devices such as sensors. Conv…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 19 pages, 1 figure

  27. arXiv:2404.12090  [pdf, other]

    cs.AI

    X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner

    Authors: Haoyuan Jiang, Ziyue Li, Hua Wei, Xuantang Xiong, Jingqing Ruan, Jiaming Lu, Hangyu Mao, Rui Zhao

    Abstract: The effectiveness of traffic light control has been significantly improved by current reinforcement learning-based approaches via better cooperation among multiple traffic lights. However, a persisting issue remains: how to obtain a multi-agent traffic signal control algorithm with remarkable transferability across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model…

    Submitted 17 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  28. arXiv:2404.10540  [pdf, other]

    cs.CV cs.LG

    SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception

    Authors: Manideep Reddy Aliminati, Bharatesh Chakravarthi, Aayush Atul Verma, Arpitsinh Vaghela, Hua Wei, Xuesong Zhou, Yezhou Yang

    Abstract: Recently, event-based vision sensors have gained attention for autonomous driving applications, as conventional RGB cameras face limitations in handling challenging dynamic conditions. However, the availability of real-world and synthetic event-based vision datasets remains limited. In response to this gap, we present SEVD, a first-of-its-kind multi-view ego, and fixed perception synthetic event-b…

    Submitted 19 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  29. arXiv:2404.09987  [pdf, other]

    cs.CV

    OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

    Authors: Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information. Similar to popular LVLMs, OneChart incorpo…

    Submitted 25 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures and 6 tables

  30. arXiv:2404.08563  [pdf, other]

    cs.RO

    FusionPortableV2: A Unified Multi-Sensor Dataset for Generalized SLAM Across Diverse Platforms and Scalable Environments

    Authors: Hexiang Wei, Jianhao Jiao, Xiangcheng Hu, Jingwen Yu, Xupeng Xie, Jin Wu, Yilong Zhu, Yuxuan Liu, Lujia Wang, Ming Liu

    Abstract: Simultaneous Localization and Mapping (SLAM) technology has been widely applied in various robotic scenarios, from rescue operations to autonomous driving. However, the generalization of SLAM algorithms remains a significant challenge, as current datasets often lack scalability in terms of platforms and environments. To address this limitation, we present FusionPortableV2, a multi-sensor SLAM data…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 20 pages, 17 figures, 7 tables. Submitted for IJRR dataset paper

  31. arXiv:2404.07821  [pdf, other]

    cs.CV

    Sparse Laneformer

    Authors: Ji Liu, Zifeng Zhang, Mingjie Lu, Hongyang Wei, Dong Li, Yile Xie, Jinzhang Peng, Lu Tian, Ashish Sirasao, Emad Barsoum

    Abstract: Lane detection is a fundamental task in autonomous driving, and has achieved great progress as deep learning emerges. Previous anchor-based methods often design dense anchors, which highly depend on the training dataset and remain fixed during inference. We analyze that dense anchors are not necessary for lane detection, and propose a transformer-based lane detection framework based on a sparse an…

    Submitted 11 April, 2024; originally announced April 2024.

  32. arXiv:2403.19976  [pdf, other]

    cs.CV

    eTraM: Event-based Traffic Monitoring Dataset

    Authors: Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela, Hua Wei, Yezhou Yang

    Abstract: Event cameras, with their high temporal and dynamic range and minimal memory usage, have found applications in various fields. However, their potential in static traffic monitoring remains largely unexplored. To facilitate this exploration, we present eTraM - a first-of-its-kind, fully event-based traffic monitoring dataset. eTraM offers 10 hr of data from different traffic scenarios in various li…

    Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

  33. arXiv:2403.17694  [pdf, other]

    cs.CV cs.GR eess.IV

    AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

    Authors: Huawei Wei, Zejun Yang, Zhisheng Wang

    Abstract: In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image. Our methodology is divided into two stages. Initially, we extract 3D intermediate representations from audio and project them into a sequence of 2D facial landmarks. Subsequently, we employ a robust diffusion model, coupled with a motion module, to convert…

    Submitted 26 March, 2024; originally announced March 2024.

  34. arXiv:2403.17094  [pdf, other]

    cs.CV cs.LG

    SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving

    Authors: Yiming Xie, Henglu Wei, Zhenyi Liu, Xiaoyu Wang, Xiangyang Ji

    Abstract: To advance research in learning-based defogging algorithms, various synthetic fog datasets have been developed. However, existing datasets created using the Atmospheric Scattering Model (ASM) or real-time rendering engines often struggle to produce photo-realistic foggy images that accurately mimic the actual imaging process. This limitation hinders the effective generalization of models from synt…

    Submitted 25 March, 2024; originally announced March 2024.

  35. arXiv:2403.16875  [pdf, other]

    cs.RO

    TAIL: A Terrain-Aware Multi-Modal SLAM Dataset for Robot Locomotion in Deformable Granular Environments

    Authors: Chen Yao, Yangtao Ge, Guowei Shi, Zirui Wang, Ningbo Yang, Zheng Zhu, Hexiang Wei, Yuntian Zhao, Jing Wu, Zhenzhong Jia

    Abstract: Terrain-aware perception holds the potential to improve the robustness and accuracy of autonomous robot navigation in the wilds, thereby facilitating effective off-road traversals. However, the lack of multi-modal perception across various motion patterns hinders the solutions of Simultaneous Localization And Mapping (SLAM), especially when confronting non-geometric hazards in demanding landscapes…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE Robotics and Automation Letters

  36. arXiv:2403.15706  [pdf, other]

    cs.LG cs.CV

    G-ACIL: Analytic Learning for Exemplar-Free Generalized Class Incremental Learning

    Authors: Huiping Zhuang, Yizhu Chen, Di Fang, Run He, Kai Tong, Hongxin Wei, Ziqian Zeng, Cen Chen

    Abstract: Class incremental learning (CIL) trains a network on sequential tasks with separated categories but suffers from catastrophic forgetting, where models quickly lose previously learned knowledge when acquiring new tasks. The generalized CIL (GCIL) aims to address the CIL problem in a more real-world scenario, where incoming data have mixed data categories and unknown sample size distribution, leadin…

    Submitted 13 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  37. arXiv:2403.11639  [pdf, other]

    cs.RO cs.CV

    An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation

    Authors: Zewen Xu, Yijia He, Hao Wei, Bo Xu, BinJian Xie, Yihong Wu

    Abstract: Line features are valid complements for point features in man-made environments. 3D-2D constraints provided by line features have been widely used in Visual Odometry (VO) and Structure-from-Motion (SfM) systems. However, how to accurately solve three-view relative motion only with 2D observations of points and lines in real time has not been fully explored. In this paper, we propose a novel three-…

    Submitted 18 March, 2024; originally announced March 2024.

  38. arXiv:2403.10823  [pdf, other]

    cs.CV cs.AI

    VisionCLIP: An Med-AIGC based Ethical Language-Image Foundation Model for Generalizable Retina Image Analysis

    Authors: Hao Wei, Bowen Liu, Minqing Zhang, Peilun Shi, Wu Yuan

    Abstract: Generalist foundation model has ushered in newfound capabilities in medical domain. However, the contradiction between the growing demand for high-quality annotated data with patient privacy continues to intensify. The utilization of medical artificial intelligence generated content (Med-AIGC) as an inexhaustible resource repository arises as a potential solution to address the aforementioned chal…

    Submitted 16 March, 2024; originally announced March 2024.

  39. arXiv:2403.09129  [pdf, other]

    cs.GT

    All-pay Auction Based Profit Maximization in End-to-End Computation Offloading System

    Authors: Hai Xue, Yun Xia, Di Zhang, Honghua Wei, Xiaolong Xu

    Abstract: Pricing is an important issue in mobile edge computing. How to appropriately determine the bid of end user (EU) is an incentive factor for edge cloud (EC) to offer service. In this letter, we propose an equilibrium pricing scheme based on the all-pay auction model in end-to-end collaboration environment, wherein all EUs can acquire the service at a lower price than the own value of the required re…

    Submitted 14 March, 2024; originally announced March 2024.

  40. arXiv:2403.07323  [pdf, other]

    eess.SP cs.NI

    Discrete-Time Modeling and Handover Analysis of Intelligent Reflecting Surface-Assisted Networks

    Authors: Hongtao Zhang, Haoyan Wei

    Abstract: Owing to the reflection gain and double path loss featured by intelligent reflecting surface (IRS) channels, handover (HO) locations become irregular and the signal strength fluctuates sharply with variations in IRS connections during HO, the risk of HO failures (HOFs) is exacerbated and thus HO parameters require reconfiguration. However, existing HO models only assume monotonic negative exponen…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 13 pages, 12 figures, submitted to IEEE

  41. arXiv:2403.06869  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Learning with Noisy Foundation Models

    Authors: Hao Chen, Jindong Wang, Zihan Wang, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj

    Abstract: Foundation models are usually pre-trained on large-scale datasets and then adapted to downstream tasks through tuning. However, the large-scale pre-training datasets, often inaccessible or too expensive to handle, can contain label noise that may adversely affect the generalization of the model and pose unexpected risks. This paper stands out as the first work to comprehensively understand and ana…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 18 pages, 10 figures, 6 tables, preprint. arXiv admin note: substantial text overlap with arXiv:2309.17002

  42. arXiv:2403.06013  [pdf, other]

    cs.LG cs.CV

    Are Classification Robustness and Explanation Robustness Really Strongly Correlated? An Analysis Through Input Loss Landscape

    Authors: Tiejin Chen, Wenwang Huang, Linsey Pang, Dongsheng Luo, Hua Wei

    Abstract: This paper delves into the critical area of deep learning robustness, challenging the conventional belief that classification robustness and explanation robustness in image classification systems are inherently correlated. Through a novel evaluation approach leveraging clustering for efficient assessment of explanation robustness, we demonstrate that enhancing explanation robustness does not neces…

    Submitted 9 March, 2024; originally announced March 2024.

  43. arXiv:2403.04124  [pdf, other]

    cs.AI

    Privacy-preserving Fine-tuning of Large Language Models through Flatness

    Authors: Tie** Chen, Longchao Da, Huixue Zhou, **zhi Li, Kaixiong Zhou, Tianlong Chen, Hua Wei

    Abstract: The privacy concerns associated with the use of Large Language Models (LLMs) have grown recently with the development of LLMs such as ChatGPT. Differential Privacy (DP) techniques are explored in existing work to mitigate their privacy risks at the cost of generalization degradation. Our paper reveals that the flatness of DP-trained models' loss landscape plays an essential role in the trade-off b…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to ICLR 2024 SeT LLM Workshop

    ACM Class: I.2

  44. arXiv:2403.01079  [pdf, other]

    cs.LG cs.AI

    Teaching MLP More Graph Information: A Three-stage Multitask Knowledge Distillation Framework

    Authors: Junxian Li, Bin Shi, Erfei Cui, Hua Wei, Qinghua Zheng

    Abstract: We study the challenging problem for inference tasks on large-scale graph datasets of Graph Neural Networks: huge time and memory consumption, and try to overcome it by reducing reliance on graph structure. Even though distilling graph knowledge to student MLP is an excellent idea, it faces two major problems of positional information loss and low generalization. To solve the problems, we propose…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 20 pages, with Appendix

  45. arXiv:2402.17143  [pdf, other]

    cs.DS cs.LG

    Energy-Efficient Scheduling with Predictions

    Authors: Eric Balkanski, Noemie Perivier, Clifford Stein, Hao-Ting Wei

    Abstract: An important goal of modern scheduling systems is to efficiently manage power usage. In energy-efficient scheduling, the operating system controls the speed at which a machine is processing jobs with the dual objective of minimizing energy consumption and optimizing the quality of service cost of the resulting schedule. Since machine-learned predictions about future requests can often be learned f…

    Submitted 26 February, 2024; originally announced February 2024.

  46. arXiv:2402.15627  [pdf, other]

    cs.LG cs.DC

    MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

    Authors: Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, et al. (7 additional authors not shown)

    Abstract: We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPUs. Training LLMs at this scale brings unprecedented challenges to training efficiency and stability. We take a full-stack approach that co-designs the algorithmic and system components across model bl…

    Submitted 23 February, 2024; originally announced February 2024.

  47. arXiv:2402.13503  [pdf, ps, other]

    cs.IT math.CO

    Multiple-Error-Correcting Codes for Analog Computing on Resistive Crossbars

    Authors: Hengjia Wei, Ron M. Roth

    Abstract: Error-correcting codes over the real field are studied which can locate outlying computational errors when performing approximate computing of real vector--matrix multiplication on resistive crossbars. Prior work has concentrated on locating a single outlying error and, in this work, several classes of codes are presented which can handle multiple errors. It is first shown that one of the known co…

    Submitted 20 February, 2024; originally announced February 2024.

  48. arXiv:2402.13435  [pdf, other]

    cs.IR cs.LG

    Learning to Retrieve for Job Matching

    Authors: Jianqiang Shen, Yuchin Juan, Shaobo Zhang, ** Liu, Wen Pu, Sriram Vasudevan, Qingquan Song, Fedor Borisyuk, Kay Qianqi Shen, Haichao Wei, Yunxiang Ren, Yeou S. Chiou, Sicong Kuang, Yuan Yin, Ben Zheng, Muchen Wu, Shaghayegh Gharghabi, Xiaoqing Wang, Huichao Xue, Qi Guo, Daniel Hewlett, Luke Simon, Liangjie Hong, Wen**g Zhang

    Abstract: Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we d…

    Submitted 20 February, 2024; originally announced February 2024.

  49. arXiv:2402.13430  [pdf, other]

    cs.LG cs.AI cs.SI

    LinkSAGE: Optimizing Job Matching Using Graph Neural Networks

    Authors: ** Liu, Haichao Wei, Xiaochen Hou, Jianqiang Shen, Shihai He, Kay Qianqi Shen, Zhujun Chen, Fedor Borisyuk, Daniel Hewlett, Liang Wu, Srikant Veeraraghavan, Alex Tsun, Chengming Jiang, Wen**g Zhang

    Abstract: We present LinkSAGE, an innovative framework that integrates Graph Neural Networks (GNNs) into large-scale personalized job matching systems, designed to address the complex dynamics of LinkedIn's extensive professional network. Our approach capitalizes on a novel job marketplace graph, the largest and most intricate of its kind in industry, with billions of nodes and edges. This graph is not merel…

    Submitted 20 February, 2024; originally announced February 2024.

  50. arXiv:2402.12683  [pdf, other]

    cs.LG cs.CV math.ST

    TorchCP: A Library for Conformal Prediction based on PyTorch

    Authors: Hongxin Wei, Jianguo Huang

    Abstract: TorchCP is a Python toolbox for conformal prediction research on deep learning models. It contains various implementations for posthoc and training methods for classification and regression tasks (including multi-dimension output). TorchCP is built on PyTorch (Paszke et al., 2019) and leverages the advantages of matrix computation to provide concise and efficient inference implementations. The cod…

    Submitted 19 February, 2024; originally announced February 2024.