Skip to main content

Showing 1–50 of 635 results for author: Yan, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.19195  [pdf, other

    cs.LG cs.AI

    Estimating Long-term Heterogeneous Dose-response Curve: Generalization Bound Leveraging Optimal Transport Weights

    Authors: Zeqin Yang, Weilin Chen, Ruichu Cai, Yuguang Yan, Zhifeng Hao, Zhipeng Yu, Zhichao Zou, Zhen Peng, Jiecheng Guo

    Abstract: Long-term causal effect estimation is a significant but challenging problem in many applications. Existing methods rely on ideal assumptions to estimate long-term average effects, e.g., no unobserved confounders or a binary treatment,while in numerous real-world applications, these assumptions could be violated and average effects are unable to provide individual-level suggestions.In this paper,we… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.18546  [pdf

    cs.CV cs.AI

    Application of Multimodal Fusion Deep Learning Model in Disease Recognition

    Authors: Xiaoyi Liu, Hongjie Qiu, Muqing Li, Zhou Yu, Yutian Yang, Yafeng Yan

    Abstract: This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. These drawbacks include incomplete information and limited diagnostic accuracy. During the feature extraction stage, cutting-edge deep learning models including convolutional neural networks (CNN), recurrent neural networks (RNN), and transform… ▽ More

    Submitted 22 May, 2024; originally announced June 2024.

  3. arXiv:2406.17287  [pdf, other

    cs.CL cs.AI

    Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models

    Authors: Yang Yan, Lizhi Ma, Anqi Li, **gsong Ma, Zhenzhong Lan

    Abstract: Accurate assessment of personality traits is crucial for effective psycho-counseling, yet traditional methods like self-report questionnaires are time-consuming and biased. This study exams whether Large Language Models (LLMs) can predict the Big Five personality traits directly from counseling dialogues and introduces an innovative framework to perform the task. Our framework applies role-play an… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.16170  [pdf, other

    cs.IR cs.AI

    SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering

    Authors: Xiaodong Yang, Huiyuan Chen, Yuchen Yan, Yuxin Tang, Yuying Zhao, Eric Xu, Yiwei Cai, Hanghang Tong

    Abstract: The learning objective is integral to collaborative filtering systems, where the Bayesian Personalized Ranking (BPR) loss is widely used for learning informative backbones. However, BPR often experiences slow convergence and suboptimal local optima, partially because it only considers one negative item for each positive item, neglecting the potential impacts of other unobserved items. To address t… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  5. arXiv:2406.16069  [pdf, other

    cs.CL cs.AI

    FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models

    Authors: Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko

    Abstract: Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before in… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  6. arXiv:2406.15000  [pdf, other

    cs.CL cs.AI

    Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

    Authors: Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, **g Li, Renjun Xu, Zhenzhong Lan

    Abstract: Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We cond… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.11193  [pdf, other

    cs.CL

    MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

    Authors: Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu

    Abstract: Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechan… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  8. arXiv:2406.09923  [pdf, other

    cs.CL cs.AI cs.LG

    CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

    Authors: Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei **, Timothy S Chang, Wei Wang

    Abstract: The integration of Artificial Intelligence (AI), especially Large Language Models (LLMs), into the clinical diagnosis process offers significant potential to improve the efficiency and accessibility of medical care. While LLMs have shown some promise in the medical domain, their application in clinical diagnosis remains underexplored, especially in real-world clinical practice, where highly sophis… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project page: https://clibench.github.io

  9. arXiv:2406.09682  [pdf, other

    cs.CR

    Privacy-preserving Quantification of Non-IID Degree in Federated Learning

    Authors: Yu** Yan, Yizhi Wang, Yingchao Yu, Yaochu **

    Abstract: Federated learning (FL) offers a privacy-preserving approach to machine learning for multiple collaborators without sharing raw data. However, the existence of non-independent and non-identically distributed (non-IID) datasets across different clients presents a significant challenge to FL, leading to a sharp drop in accuracy, reduced efficiency, and hindered implementation. To address the non-IID… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 8 pages, 8 figures, FL@FM-IJCAI'24

  10. arXiv:2406.09680  [pdf, other

    cs.LG cs.DC

    Heterogeneous Federated Learning with Convolutional and Spiking Neural Networks

    Authors: Yingchao Yu, Yu** Yan, Jisong Cai, Yaochu **

    Abstract: Federated learning (FL) has emerged as a promising paradigm for training models on decentralized data while safeguarding data privacy. Most existing FL systems, however, assume that all machine learning models are of the same type, although it becomes more likely that different edge devices adopt different types of AI models, including both conventional analogue artificial neural networks (ANNs) a… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures, FL@FM-IJCAI'24

  11. arXiv:2406.08903  [pdf, other

    cs.CL

    Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

    Authors: Bowen **, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun

    Abstract: Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 12 pages

  12. arXiv:2406.08845  [pdf, other

    cs.CV

    Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality

    Authors: Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, ** Luo, Kaipeng Zhang

    Abstract: Recent text-to-video (T2V) technology advancements, as demonstrated by models such as Gen2, Pika, and Sora, have significantly broadened its applicability and popularity. Despite these strides, evaluating these models poses substantial challenges. Primarily, due to the limitations inherent in automatic metrics, manual evaluation is often considered a superior method for assessing T2V generation. H… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  13. GPT4Rec: Graph Prompt Tuning for Streaming Recommendation

    Authors: Peiyan Zhang, Yuchen Yan, Xi Zhang, Liying Kang, Chaozhuo Li, Feiran Huang, Senzhang Wang, Sunghun Kim

    Abstract: In the realm of personalized recommender systems, the challenge of adapting to evolving user preferences and the continuous influx of new users and items is paramount. Conventional models, typically reliant on a static training-test approach, struggle to keep pace with these dynamic demands. Streaming recommendation, particularly through continual graph learning, has emerged as a novel solution. H… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by SIGIR 2024. arXiv admin note: text overlap with arXiv:2303.11700 by other authors

    ACM Class: H.3.3

  14. arXiv:2406.08196  [pdf, other

    cs.SD eess.AS

    FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

    Authors: Yuanjun Lv, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie

    Abstract: Vocoders reconstruct speech waveforms from acoustic features and play a pivotal role in modern TTS systems. Frequent-domain GAN vocoders like Vocos and APNet2 have recently seen rapid advancements, outperforming time-domain models in inference speed while achieving comparable audio quality. However, these frequency-domain vocoders suffer from large parameter sizes, thus introducing extra memory bu… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 5 figures

  15. arXiv:2406.07327  [pdf, other

    cs.AI cs.CL cs.LG

    3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

    Authors: Yuzi Yan, Yibo Miao, Jialian Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan

    Abstract: Aligning large language models (LLMs) with human preference has recently gained tremendous attention, with the canonical yet costly RLHF-PPO and the simple and straightforward Direct Preference Optimization (DPO) as two examples. Despite the efficiency, DPO has rarely be used in the state-of-the-art production-level LLMs, implying its potential pathologies. In this work, we revisit DPO with a comp… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  16. arXiv:2406.06818  [pdf, other

    cs.LG

    Conformal Prediction for Class-wise Coverage via Augmented Label Rank Calibration

    Authors: Yuanjie Shi, Subhankar Ghosh, Taha Belkhouja, Janardhan Rao Doppa, Yan Yan

    Abstract: Conformal prediction (CP) is an emerging uncertainty quantification framework that allows us to construct a prediction set to cover the true label with a pre-specified marginal or conditional probability. Although the valid coverage guarantee has been extensively studied for classification problems, CP often produces large prediction sets which may not be practically useful. This issue is exacerba… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  17. arXiv:2406.04601  [pdf, other

    cs.LG

    Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning

    Authors: Zheng Huang, Qihui Yang, Dawei Zhou, Yujun Yan

    Abstract: Although most graph neural networks (GNNs) can operate on graphs of any size, their classification performance often declines on graphs larger than those encountered during training. Existing methods insufficiently address the removal of size information from graph representations, resulting in sub-optimal performance and reliance on backbone models. In response, we propose DISGEN, a novel and mod… ▽ More

    Submitted 11 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  18. arXiv:2406.04025  [pdf

    cs.CL

    The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses

    Authors: Caimei Yang, Qihang Yang, Xingzhi Su, Chenxi Fu, Xiaoyi Wang, Ying Yan, Zaijiang Man

    Abstract: There have been apparently conflicting claims over the syntax-semantics relationship in child acquisition. However, few of them have assessed the child's path toward the acquisition of recursive relative clauses (RRCs). The authors of the current paper did experiments to investigate 3- to 11-year-olds' most-structured elicited production of eight Mandarin RRCs in a 4 (syntactic types)*2 (semantic… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  19. arXiv:2406.02856  [pdf, other

    cs.CL cs.AI

    Xmodel-LM Technical Report

    Authors: Yichuan Wang, Yang Liu, Yu Yan, Qun Wang, Xucheng Huang, Ling Jiang

    Abstract: We introduce Xmodel-LM, a compact and efficient 1.1B language model pre-trained on around 2 trillion tokens. Trained on our self-built dataset (Xdata), which balances Chinese and English corpora based on downstream task optimization, Xmodel-LM exhibits remarkable performance despite its smaller size. It notably surpasses existing open-source language models of similar scale. Our model checkpoints… ▽ More

    Submitted 26 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2406.01388  [pdf, other

    cs.CV

    AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation

    Authors: Junhao Cheng, Xi Lu, Hanhui Li, Khun Loun Zai, Baiqiao Yin, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: As cutting-edge Text-to-Image (T2I) generation models already excel at producing remarkable single images, an even more challenging task, i.e., multi-turn interactive image generation begins to attract the attention of related research communities. This task requires models to interact with users over multiple turns to generate a coherent sequence of images. However, since users may switch subject… ▽ More

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Multi-turn interactive image generation

  21. arXiv:2406.00616  [pdf, other

    cs.DB

    EMIT: Micro-Invasive Database Configuration Tuning

    Authors: Jian Geng, Hongzhi Wang, Yu Yan

    Abstract: The process of database knob tuning has always been a challenging task. Recently, database knob tuning methods has emerged as a promising solution to mitigate these issues. However, these methods still face certain limitations.On one hand, when applying knob tuning algorithms to optimize databases in practice, it either requires frequent updates to the database or necessitates acquiring database w… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  22. arXiv:2406.00440  [pdf, other

    cs.CV

    Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

    Authors: Xuanchen Li, Yuhao Cheng, Xingyu Ren, Haozhe Jia, Di Xu, Wenhan Zhu, Yichao Yan

    Abstract: 4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant… ▽ More

    Submitted 1 July, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  23. arXiv:2406.00133  [pdf, other

    cs.LG cs.AI

    Streamflow Prediction with Uncertainty Quantification for Water Management: A Constrained Reasoning and Learning Approach

    Authors: Mohammed Amine Gharsallaoui, Bhupinderjeet Singh, Supriya Savalkar, Aryan Deshwal, Yan Yan, Ananth Kalyanaraman, Kirti Rajagopalan, Janardhan Rao Doppa

    Abstract: Predicting the spatiotemporal variation in streamflow along with uncertainty quantification enables decision-making for sustainable management of scarce water resources. Process-based hydrological models (aka physics-based models) are based on physical laws, but using simplifying assumptions which can lead to poor accuracy. Data-driven approaches offer a powerful alternative, but they require larg… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

  24. arXiv:2405.19590  [pdf, other

    cs.LG

    Weights Augmentation: it has never ever ever ever let her model down

    Authors: Junbin Zhuang, Guiguang Din, Yunyi Yan

    Abstract: Weight play an essential role in deep learning network models. Unlike network structure design, this article proposes the concept of weight augmentation, focusing on weight exploration. The core of Weight Augmentation Strategy (WAS) is to adopt random transformed weight coefficients training and transformed coefficients, named Shadow Weight(SW), for networks that can be used to calculate loss func… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  25. arXiv:2405.19363  [pdf, other

    eess.SP cs.AI cs.LG

    Medformer: A Multi-Granularity Patching Transformer for Medical Time-Series Classification

    Authors: Yihe Wang, Nan Huang, Taida Li, Yujun Yan, Xiang Zhang

    Abstract: Medical time series data, such as Electroencephalography (EEG) and Electrocardiography (ECG), play a crucial role in healthcare, such as diagnosing brain and heart diseases. Existing methods for medical time series classification primarily rely on handcrafted biomarkers extraction and CNN-based models, with limited exploration of transformers tailored for medical time series. In this paper, we int… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 20pages (14 pages main paper + 6 pages supplementary materials)

  26. arXiv:2405.19203  [pdf, other

    cs.CV

    $E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation

    Authors: Weitian Zhang, Yichao Yan, Yunhui Liu, Xingdong Sheng, Xiaokang Yang

    Abstract: This paper aims to introduce 3D Gaussian for efficient, expressive, and editable digital avatar generation. This task faces two major challenges: (1) The unstructured nature of 3D Gaussian makes it incompatible with current generation pipelines; (2) the expressive animation of 3D Gaussian in a generative setting that involves training with multiple subjects remains unexplored. In this paper, we pr… ▽ More

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Project Page: https://olivia23333.github.io/E3Gen

  27. arXiv:2405.18776  [pdf, other

    cs.CR cs.CL cs.LG

    LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models

    Authors: Qin Yang, Meisam Mohammad, Han Wang, Ali Payani, Ashish Kundu, Kai Shu, Yan Yan, Yuan Hong

    Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) and its variants have been proposed to ensure rigorous privacy for fine-tuning large-scale pre-trained language models. However, they rely heavily on the Gaussian mechanism, which may overly perturb the gradients and degrade the accuracy, especially in stronger privacy regimes (e.g., the privacy budget $ε< 3$). To address such limitations… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 18 pages, 15 figures

  28. arXiv:2405.18435  [pdf, other

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de… ▽ More

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  29. arXiv:2405.18295  [pdf, other

    cs.CV

    Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention

    Authors: Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan

    Abstract: In real-life scenarios, humans seek out objects in the 3D world to fulfill their daily needs or intentions. This inspires us to introduce 3D intention grounding, a new task in 3D object detection employing RGB-D, based on human intention, such as "I want something to support my back". Closely related, 3D visual grounding focuses on understanding human reference. To achieve detection based on human… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  30. arXiv:2405.17790  [pdf, other

    cs.CV

    Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

    Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

    Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

  31. arXiv:2405.17460  [pdf

    cs.LG cs.AI cs.CV

    Investigation of Customized Medical Decision Algorithms Utilizing Graph Neural Networks

    Authors: Yafeng Yan, Shuyao He, Zhou Yu, Jiajie Yuan, Ziang Liu, Yan Chen

    Abstract: Aiming at the limitations of traditional medical decision system in processing large-scale heterogeneous medical data and realizing highly personalized recommendation, this paper introduces a personalized medical decision algorithm utilizing graph neural network (GNN). This research innovatively integrates graph neural network technology into the medical and health field, aiming to build a high-pr… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  32. arXiv:2405.17138  [pdf, other

    cs.DB

    CMOSS: A Reliable, Motif-based Columnar Molecular Storage System

    Authors: Eugenio Marinelli, Yiqing Yan, Virginie Magnone, Pascal Barbry, Raja Appuswamy

    Abstract: The surge in demand for cost-effective, durable long-term archival media, coupled with density limitations of contemporary magnetic media, has resulted in synthetic DNA emerging as a promising new alternative. Despite its benefits, storing data on DNA poses several challenges as the technology used for reading/writing data and achieving random access on DNA are highly error prone. In order to deal… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  33. arXiv:2405.16964  [pdf, other

    cs.CL cs.AI

    Exploring the LLM Journey from Cognition to Expression with Linear Representations

    Authors: Yuzi Yan, Jialian Li, Yipin Zhang, Dong Yan

    Abstract: This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), with a specific focus on Baichuan-7B and Baichuan-33B, an advanced bilingual (Chinese and English) LLM series. We define and explore the model's cognitive and expressive capabilities through linear representations across three critical phases: Pretrai… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Published in ICML 2024

  34. arXiv:2405.16005  [pdf, other

    cs.CV

    PTQ4DiT: Post-training Quantization for Diffusion Transformers

    Authors: Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan

    Abstract: The recent introduction of Diffusion Transformers (DiTs) has demonstrated exceptional capabilities in image generation by using a different backbone architecture, departing from traditional U-Nets and embracing the scalable nature of transformers. Despite their advanced capabilities, the wide deployment of DiTs, particularly for real-time applications, is currently hampered by considerable computa… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures

  35. arXiv:2405.14861  [pdf, ps, other

    cs.LG cs.AI math.ST stat.ML

    Adapting to Unknown Low-Dimensional Structures in Score-Based Diffusion Models

    Authors: Gen Li, Yuling Yan

    Abstract: This paper investigates score-based diffusion models when the underlying target distribution is concentrated on or near low-dimensional manifolds within the higher-dimensional space in which they formally reside, a common characteristic of natural image distributions. Despite previous efforts to understand the data generation process of diffusion models, existing theoretical support remains highly… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  36. arXiv:2405.14136  [pdf, other

    cs.CV

    Efficient Multitask Dense Predictor via Binarization

    Authors: Yuzhang Shang, Dan Xu, Gaowen Liu, Ramana Rao Kompella, Yan Yan

    Abstract: Multi-task learning for dense prediction has emerged as a pivotal area in computer vision, enabling simultaneous processing of diverse yet interrelated pixel-wise prediction tasks. However, the substantial computational demands of state-of-the-art (SoTA) models often limit their widespread deployment. This paper addresses this challenge by introducing network binarization to compress resource-inte… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR'2024

  37. arXiv:2405.14135  [pdf, other

    cs.LG cs.AI

    Learning Geospatial Region Embedding with Heterogeneous Graph

    Authors: Xingchen Zou, Jiani Huang, Xixuan Hao, Yuhao Yang, Haomin Wen, Yibo Yan, Chao Huang, Yuxuan Liang

    Abstract: Learning effective geospatial embeddings is crucial for a series of geospatial applications such as city analytics and earth monitoring. However, learning comprehensive region representations presents two significant challenges: first, the deficiency of effective intra-region feature representation; and second, the difficulty of learning from intricate inter-region dependencies. In this paper, we… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  38. arXiv:2405.12512  [pdf, other

    cs.CV cs.AI cs.MM

    Rethink Predicting the Optical Flow with the Kinetics Perspective

    Authors: Yuhao Cheng, Siru Zhang, Yiqiang Yan

    Abstract: Optical flow estimation is one of the fundamental tasks in low-level computer vision, which describes the pixel-wise displacement and can be used in many other tasks. From the apparent aspect, the optical flow can be viewed as the correlation between the pixels in consecutive frames, so continuously refining the correlation volume can achieve an outstanding performance. However, it will make the m… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  39. arXiv:2405.11205  [pdf, other

    cs.CV

    Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation

    Authors: Yichen Yan, Xingjian He, Sihan Chen, Shichen Lu, **g Liu

    Abstract: Referring Image Segmentation (RIS) aims to segment an object described in natural language from an image, with the main challenge being a text-to-pixel correlation. Previous methods typically rely on single-modality features, such as vision or language features, to guide the multi-modal fusion process. However, this approach limits the interaction between vision and language, leading to a lack of… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures ICIC2024

  40. arXiv:2405.10640  [pdf, other

    cs.SI

    COMET: NFT Price Prediction with Wallet Profiling

    Authors: Tianfu Wang, Liwei Deng, Chao Wang, Jianxun Lian, Yue Yan, Nicholas **g Yuan, Qi Zhang, Hui Xiong

    Abstract: As the non-fungible token (NFT) market flourishes, price prediction emerges as a pivotal direction for investors gaining valuable insight to maximize returns. However, existing works suffer from a lack of practical definitions and standardized evaluations, limiting their practical application. Moreover, the influence of users' multi-behaviour transactions that are publicly accessible on NFT price… ▽ More

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024 (ADS Track)

  41. arXiv:2405.08886  [pdf, other

    cs.LG stat.ML

    The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

    Authors: Ziquan Liu, Yufei Cui, Yan Yan, Yi Xu, Xiangyang Ji, Xue Liu, Antoni B. Chan

    Abstract: In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness th… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: ICML2024

  42. arXiv:2405.08816  [pdf, other

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang , et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c… ▽ More

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  43. arXiv:2405.08726  [pdf, other

    cs.RO cs.AI

    I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning

    Authors: Yashuai Yan, Esteve Valls Mascaro, Tobias Egle, Dongheui Lee

    Abstract: This paper addresses the critical need for refining robot motions that, despite achieving a high visual similarity through human-to-humanoid retargeting methods, fall short of practical execution in the physical realm. Existing techniques in the graphics community often prioritize visual fidelity over physics-based feasibility, posing a significant challenge for deploying bipedal systems in practi… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  44. arXiv:2405.04859  [pdf, other

    cs.RO

    Guarding Force: Safety-Critical Compliant Control for Robot-Environment Interaction

    Authors: Xinming Wang, Jun Yang, Jianliang Mao, **zhuo Liang, Shihua Li, Yunda Yan

    Abstract: In this study, we propose a safety-critical compliant control strategy designed to strictly enforce interaction force constraints during the physical interaction of robots with unknown environments. The interaction force constraint is interpreted as a new force-constrained control barrier function (FC-CBF) by exploiting the generalized contact model and the prior information of the environment, i.… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  45. arXiv:2405.04434  [pdf, other

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference… ▽ More

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  46. arXiv:2405.04111  [pdf, other

    cs.LG eess.SP

    Adaptive Least Mean pth Power Graph Neural Networks

    Authors: Changran Peng, Yi Yan, Ercan E. Kuruoglu

    Abstract: In the presence of impulsive noise, and missing observations, accurate online prediction of time-varying graph signals poses a crucial challenge in numerous application domains. We propose the Adaptive Least Mean $p^{th}$ Power Graph Neural Networks (LMP-GNN), a universal framework combining adaptive filter and graph neural network for online graph signal estimation. LMP-GNN retains the advantage… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  47. arXiv:2405.04098  [pdf, other

    cs.LG eess.SP

    Binarized Simplicial Convolutional Neural Networks

    Authors: Yi Yan, Ercan E. Kuruoglu

    Abstract: Graph Neural Networks have a limitation of solely processing features on graph nodes, neglecting data on high-dimensional structures such as edges and triangles. Simplicial Convolutional Neural Networks (SCNN) represent higher-order structures using simplicial complexes to break this limitation albeit still lacking time efficiency. In this paper, we propose a novel neural network architecture on s… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  48. arXiv:2405.03342  [pdf, other

    cs.LG

    Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning

    Authors: Weilin Chen, Ruichu Cai, Zeqin Yang, Jie Qiao, Yuguang Yan, Zijian Li, Zhifeng Hao

    Abstract: Causal effect estimation under networked interference is an important but challenging problem. Available parametric methods are limited in their model space, while previous semiparametric methods, e.g., leveraging neural networks to fit only one single nuisance function, may still encounter misspecification problems under networked interference without appropriate assumptions on the data generatio… ▽ More

    Submitted 17 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  49. arXiv:2405.02191  [pdf

    cs.CV cs.LG eess.IV

    Non-Destructive Peat Analysis using Hyperspectral Imaging and Machine Learning

    Authors: Yijun Yan, **chang Ren, Barry Harrison, Oliver Lewis, Yinhe Li, ** Ma

    Abstract: Peat, a crucial component in whisky production, imparts distinctive and irreplaceable flavours to the final product. However, the extraction of peat disrupts ancient ecosystems and releases significant amounts of carbon, contributing to climate change. This paper aims to address this issue by conducting a feasibility study on enhancing peat use efficiency in whisky manufacturing through non-destru… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 4 pages,4 figures

  50. arXiv:2404.18919  [pdf, other

    cs.CV

    TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

    Authors: Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: Recent advances in diffusion models can generate high-quality and stunning images from text. However, multi-turn image generation, which is of high demand in real-world scenarios, still faces challenges in maintaining semantic consistency between images and texts, as well as contextual consistency of the same subject across multiple interactive turns. To address this issue, we introduce TheaterGen… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.