Showing 1–50 of 742 results for author: Xiao, J

Searching in archive cs.
  1. arXiv:2407.00033  [pdf, other]

    q-bio.NC cs.AI

    Uncovering cognitive taskonomy through transfer learning in masked autoencoder-based fMRI reconstruction

    Authors: Youzhi Qu, Junfeng Xia, Xinyao Jian, Wendu Li, Kaining Peng, Zhichao Liang, Haiyan Wu, Quanying Liu

    Abstract: Data reconstruction is a widely used pre-training task to learn the generalized features for many downstream tasks. Although reconstruction tasks have been applied to neural signal completion and denoising, neural signal reconstruction is less studied. Here, we employ the masked autoencoder (MAE) model to reconstruct functional magnetic resonance imaging (fMRI) data, and utilize a transfer learnin…

    Submitted 24 May, 2024; originally announced July 2024.

  2. arXiv:2406.18833  [pdf, other]

    cs.CE math.NA quant-ph

    Quantum annealing-based structural optimization with a multiplicative design update

    Authors: Naruethep Sukulthanasorn, Junsen Xiao, Koya Wagatsuma, Shuji Moriguchi, Kenjiro Terada

    Abstract: This paper presents a new structural design framework, developed based on iterative optimization via quantum annealing (QA). The novelty lies in its successful design update using an unknown design multiplier obtained by iteratively solving the optimization problems with QA. In addition, to align with density-based approaches in structural optimization, multipliers are multiplicative to represent…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.18540  [pdf, other]

    cs.CV cs.CR

    Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing

    Authors: Yunlong Zhao, Xiaoheng Deng, Yi**g Liu, Xinjun Pei, Jiazhi Xia, Wei Chen

    Abstract: Model stealing (MS) involves querying and observing the output of a machine learning model to steal its capabilities. The quality of queried data is crucial, yet obtaining a large amount of real data for MS is often challenging. Recent works have reduced reliance on real data by using generative models. However, when high-dimensional query data is required, these methods are impractical due to the…

    Submitted 18 May, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  4. arXiv:2406.18151  [pdf, other]

    cs.CV

    SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery

    Authors: Jian Song, Hongruixuan Chen, Weihao Xuan, Junshi Xia, Naoto Yokoya

    Abstract: Global semantic 3D understanding from single-view high-resolution remote sensing (RS) imagery is crucial for Earth Observation (EO). However, this task faces significant challenges due to the high costs of annotations and data collection, as well as geographically restricted data availability. To address these challenges, synthetic data offer a promising solution by being easily accessible and thu…

    Submitted 26 June, 2024; originally announced June 2024.

  5. Deep Learning Segmentation of Ascites on Abdominal CT Scans for Automatic Volume Quantification

    Authors: Benjamin Hou, Sung-Won Lee, Jung-Min Lee, Christopher Koh, **g Xiao, Perry J. Pickhardt, Ronald M. Summers

    Abstract: Purpose: To evaluate the performance of an automated deep learning method in detecting ascites and subsequently quantifying its volume in patients with liver cirrhosis and ovarian cancer. Materials and Methods: This retrospective study included contrast-enhanced and non-contrast abdominal-pelvic CT scans of patients with cirrhotic ascites and patients with ovarian cancer from two institutions, N…

    Submitted 22 June, 2024; originally announced June 2024.

  6. arXiv:2406.15962  [pdf]

    cs.LG cs.CR cs.ET

    Privacy Preserving Machine Learning for Electronic Health Records using Federated Learning and Differential Privacy

    Authors: Naif A. Ganadily, Han J. Xia

    Abstract: An Electronic Health Record (EHR) is an electronic database used by healthcare providers to store patients' medical records which may include diagnoses, treatments, costs, and other personal information. Machine learning (ML) algorithms can be used to extract and analyze patient data to improve patient care. Patient records contain highly sensitive information, such as social security numbers (SSN…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 5 pages, 12 figures

  7. arXiv:2406.13873  [pdf, other]

    cs.AI

    A Pure Transformer Pretraining Framework on Text-attributed Graphs

    Authors: Yu Song, Haitao Mao, Jiachen Xiao, **gzhe Liu, Zhikai Chen, Wei **, Carl Yang, Jiliang Tang, Hui Liu

    Abstract: Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges such as feature heterogeneity and structural heterogeneity. Recently, increasing efforts have been made to enhance node feature quality with Large Lan…

    Submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.13007  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, **gyuan Xiao, et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  9. arXiv:2406.12238  [pdf, other]

    cs.CL

    PFID: Privacy First Inference Delegation Framework for LLMs

    Authors: Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, **g Xiao

    Abstract: This paper introduces a novel privacy-preservation framework named PFID for LLMs that addresses critical privacy concerns by localizing user data through model sharding and singular value decomposition. When users are interacting with LLM systems, their prompts could be subject to being exposed to eavesdroppers within or outside LLM system providers who are interested in collecting users' input. I…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Submitted to EMNLP2024

  10. arXiv:2406.11906  [pdf, other]

    q-bio.QM cs.AI

    NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics

    Authors: **gbo Zhou, Shaorong Chen, Jun Xia, Sizhe Liu, Tianze Ling, Wenjie Du, Yue Liu, Jianwei Yin, Stan Z. Li

    Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the high-throughput analysis of protein composition in biological tissues. Many deep learning methods have been developed for the de novo peptide sequencing task, i.e., predicting the peptide sequence for the observed mass spectrum. However, two key challenges seriously hinder the further advancement of this im…

    Submitted 16 June, 2024; originally announced June 2024.

  11. arXiv:2406.11189  [pdf, other]

    cs.CV

    Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation

    Authors: Bingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao

    Abstract: Weakly supervised semantic segmentation has witnessed great achievements with image-level labels. Several recent approaches use the CLIP model to generate pseudo labels for training an individual segmentation model, while there is no attempt to apply the CLIP model as the backbone to directly segment objects with image-level labels. In this paper, we propose WeCLIP, a CLIP-based single-stage pipel…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Highlight

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 3796-3806) 2024

  12. arXiv:2406.10981  [pdf, other]

    cs.CV

    ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

    Authors: Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chun** Wang, Jun Xiao

    Abstract: With the advance of diffusion models, today's video generation has achieved impressive quality. But generating temporally consistent long videos is still challenging. A majority of video diffusion models (VDMs) generate long videos in an autoregressive manner, i.e., generating subsequent clips conditioned on the last frames of the previous clip. However, existing approaches all involve bidirectional computa…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Code will be available at https://github.com/Dawn-LX/Causal-VideoGen

  13. arXiv:2406.10928  [pdf, other]

    cs.CR cs.AI cs.NI

    Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask

    Authors: **gyu Xiao, Zhiyao Xu, Qingsong Zou, Qing Li, Dan Zhao, Dong Fang, Ruoyu Li, Wenxin Tang, Kang Li, Xudong Zuo, Penghui Hu, Yong Jiang, Zixuan Weng, Michael R. Lyv

    Abstract: Smart homes, powered by the Internet of Things, offer great convenience but also pose security concerns due to abnormal behaviors, such as improper operations of users and potential attacks from malicious attackers. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effec…

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  14. arXiv:2406.10869  [pdf, other]

    eess.IV cs.CV

    Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution

    Authors: Cuixin Yang, Rongkang Dong, Jun Xiao, Cong Zhang, Kin-Man Lam, Fei Zhou, Guo** Qiu

    Abstract: As virtual and augmented reality applications gain popularity, omnidirectional image (ODI) super-resolution has become increasingly important. Unlike 2D plain images that are formed on a plane, ODIs are projected onto spherical surfaces. Applying established image super-resolution methods to ODIs, therefore, requires performing equirectangular projection (ERP) to map the ODIs onto a plane. ODI sup…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 13 pages, 12 figures, journal

  15. arXiv:2406.09229  [pdf, other]

    cs.CV

    MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

    Authors: Lianwei Yang, Zhikai Li, Junrui Xiao, Haisong Gong, Qingyi Gu

    Abstract: Post-training quantization (PTQ) efficiently compresses vision models, but, unfortunately, it is accompanied by a certain degree of accuracy degradation. Reconstruction methods aim to enhance model performance by narrowing the gap between the quantized model and the full-precision model, often yielding promising results. However, efforts to significantly improve the performance of PTQ through reconstruct…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 IEEE International Conference on Image Processing

  16. Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation

    Authors: **gyuan Xia, Zhixiong Yang, Shengxi Li, Shuanghui Zhang, Yaowen Fu, Deniz Gündüz, Xiang Li

    Abstract: Learning-based approaches have witnessed great successes in blind single image super-resolution (SISR) tasks; however, handcrafted kernel priors and learning-based kernel priors are typically required. In this paper, we propose a Meta-learning and Markov Chain Monte Carlo (MCMC) based SISR approach to learn kernel priors from organized randomness. Concretely, a lightweight network is adopted as k…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

  17. arXiv:2406.08835  [pdf, other]

    cs.SD eess.AS

    A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, **g Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, ca…

    Submitted 13 June, 2024; originally announced June 2024.

  18. arXiv:2406.08478  [pdf, other]

    cs.CV cs.CL

    What If We Recaption Billions of Web Images with LLaMA-3?

    Authors: Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie

    Abstract: Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training across various vision-language tasks, particularly text-to-image generation. However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community eff…

    Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: First five authors contributed equally

  19. arXiv:2406.07413  [pdf, other]

    cs.LG

    Holistic Memory Diversification for Incremental Learning in Growing Graphs

    Authors: Ziyue Qiao, Junren Xiao, Qingqiang Sun, Meng Xiao, Hui Xiong

    Abstract: This paper addresses the challenge of incremental learning in growing graphs with increasingly complex tasks. The goal is to continually train a graph model to handle new tasks while retaining its inference ability on previous tasks. Existing methods usually neglect the importance of memory diversity, which limits their ability to effectively select high-quality memory from previous tasks and remember broad p…

    Submitted 11 June, 2024; originally announced June 2024.

  20. arXiv:2406.05615  [pdf, other]

    cs.CL

    Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

    Authors: Thong Nguyen, Yi Bin, Junbin Xiao, Leigang Qu, Yicong Li, Jay Zhangjie Wu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

    Abstract: Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital senses since they allow us to easily communicate our thoughts and perceive the world around us. There has been a lot of interest in creating video-language understanding systems with human-like senses since a video-language pair can mimic both our linguistic medium and visual environment with te…

    Submitted 1 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 (Findings)

  21. arXiv:2406.05565  [pdf, other]

    cs.CV

    Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

    Authors: Sucheng Ren, Xiaoke Huang, Xianhang Li, Junfei Xiao, Jieru Mei, Zeyu Wang, Alan Yuille, Yuyin Zhou

    Abstract: This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks (such as cross-modal synthesis, image segmentation, denoising, and inpainting) within a unified image-to-image generation framework. Specifically, MVG employs an in-context generation strategy that standardizes the handling of inputs and outputs as images. By treati…

    Submitted 8 June, 2024; originally announced June 2024.

  22. arXiv:2406.05372  [pdf, ps, other]

    stat.ML cs.LG

    Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization

    Authors: Jiancong Xiao, Ruoyu Sun, Qi Long, Weijie J. Su

    Abstract: Training Deep Neural Networks (DNNs) with adversarial examples often results in poor generalization to test-time adversarial data. This paper investigates this issue, known as adversarially robust generalization, through the lens of Rademacher complexity. Building upon the studies by Khim and Loh (2018) and Yin et al. (2019), numerous works have been dedicated to this problem, yet achieving a satisfa…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: COLT 2024

  23. arXiv:2406.02976  [pdf, other]

    cs.CV cs.AI

    DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection

    Authors: Ruituo Wu, Yang Chen, Jian Xiao, Bing Li, Jicong Fan, Frédéric Dufaux, Ce Zhu, Yipeng Liu

    Abstract: Cooperation between temporal convolutional networks (TCN) and graph convolutional networks (GCN) as a processing module has shown promising results in skeleton-based video anomaly detection (SVAD). However, to maintain a lightweight model with low computational and storage complexity, shallow GCN and TCN blocks are constrained by small receptive fields and a lack of cross-dimension interaction cap…

    Submitted 5 June, 2024; originally announced June 2024.

  24. arXiv:2405.20603  [pdf]

    cs.LG cs.AI

    Advancing Financial Risk Prediction Through Optimized LSTM Model Performance and Comparative Analysis

    Authors: Ke Xu, Yu Cheng, Shiqing Long, Junjie Guo, Jue Xiao, Mengfang Sun

    Abstract: This paper focuses on the application and optimization of the LSTM model in financial risk prediction. The study starts with an overview of the architecture and algorithm foundation of LSTM, and then details the model training process and hyperparameter tuning strategy, and adjusts network parameters through experiments to improve performance. Comparative experiments show that the optimized LSTM model…

    Submitted 30 May, 2024; originally announced May 2024.

  25. arXiv:2405.20561  [pdf, other]

    cs.CR cs.SE

    All Your Tokens are Belong to Us: Demystifying Address Verification Vulnerabilities in Solidity Smart Contracts

    Authors: Tianle Sun, Ningyu He, Jiang Xiao, Yinliang Yue, Xiapu Luo, Haoyu Wang

    Abstract: In Ethereum, verifying the validity of passed addresses is a common practice and a crucial step to ensure the secure execution of smart contracts. Vulnerabilities in the process of address verification can lead to great security issues, and anecdotal evidence has been reported by our community. However, this type of vulnerability has not been well studied. To fill the voi…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by USENIX Security 2024

  26. arXiv:2405.20470  [pdf, other]

    cs.RO cs.CV

    STHN: Deep Homography Estimation for UAV Thermal Geo-localization with Satellite Imagery

    Authors: Jiuhong Xiao, Ning Zhang, Daniel Tortei, Giuseppe Loianno

    Abstract: Accurate geo-localization of Unmanned Aerial Vehicles (UAVs) is crucial for a variety of outdoor applications including search and rescue operations, power line inspections, and environmental monitoring. The vulnerability of Global Navigation Satellite Systems (GNSS) signals to interference and spoofing necessitates the development of additional robust localization methods for autonomous navigatio…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 8 pages, 7 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2405.17900  [pdf, other]

    cs.CL

    Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

    Authors: Haoxiang Shi, Xulong Zhang, Ning Cheng, Yong Zhang, Jun Yu, **g Xiao, Jianzong Wang

    Abstract: The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differences between modalities, resulting in the model being unable to focus on modality-specific emotional information. At the same time, the shared informa…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by the 20th International Conference on Intelligent Computing (ICIC 2024)

  28. arXiv:2405.17777  [pdf, other]

    cs.IR

    RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval

    Authors: Jianzong Wang, Haoxiang Shi, Kaiyi Luo, Xulong Zhang, Ning Cheng, **g Xiao

    Abstract: Known for efficient computation and easy storage, hashing has been extensively explored in cross-modal retrieval. The majority of current hashing models are predicated on the premise of a direct one-to-one mapping between data points. However, in real practice, data correspondence across modalities may be partially provided. In this research, we introduce an innovative unsupervised hashing techniq…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by the 20th International Conference on Intelligent Computing (ICIC 2024)

  29. arXiv:2405.17028  [pdf, other]

    cs.SD eess.AS

    RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

    Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, **g Xiao

    Abstract: Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing emotion intensity controllable TTS. Most existing TTS models achieve emotion intensity control by extracting intensity information from reference speeches. Unfortunately, limited by the lack of modeling for intra-class emotion intensity and the model's information…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

  30. arXiv:2405.17016  [pdf, other]

    cs.CV

    Di²Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

    Authors: Weiquan Wang, Jun Xiao, Chun** Wang, Wei Liu, Zhao Wang, Long Chen

    Abstract: Continuous diffusion models have demonstrated their effectiveness in addressing the inherent uncertainty and indeterminacy in monocular 3D human pose estimation (HPE). Despite their strengths, the need for large search spaces and the corresponding demand for substantial training data make these models prone to generating biomechanically unrealistic poses. This challenge is particularly noticeable…

    Submitted 27 May, 2024; originally announced May 2024.

  31. arXiv:2405.16867  [pdf, other]

    cs.RO cs.AI

    Clustering-based Learning for UAV Tracking and Pose Estimation

    Authors: Phumrapee Pisutsin, Cheng Wen Tsao, Mir Feroskhan

    Abstract: UAV tracking and pose estimation plays an imperative role in various UAV-related missions, such as formation control and anti-UAV measures. Accurately detecting and tracking UAVs in a 3D space remains a particularly challenging problem, as it requires extracting sparse features of micro UAVs from different flight environments and continuously matching correspondences, especially during agile fligh…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Submitted Report of CVPR 2024 UG2+ Challenge Track 5

  32. arXiv:2405.16455  [pdf, other]

    stat.ML cs.LG stat.ME

    On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

    Authors: Jiancong Xiao, Ziniu Li, Xingyu Xie, Emily Getzen, Cong Fang, Qi Long, Weijie J. Su

    Abstract: Accurately aligning large language models (LLMs) with human preferences is crucial for informing fair, economically sound, and statistically efficient decision-making processes. However, we argue that reinforcement learning from human feedback (RLHF), the predominant approach for aligning LLMs with human preferences through a reward model, suffers from an inherent algorithmic bias due to its K…

    Submitted 26 May, 2024; originally announced May 2024.

  33. arXiv:2405.16113  [pdf, other]

    cs.LG

    Enabling On-Device Learning via Experience Replay with Efficient Dataset Condensation

    Authors: Gelei Xu, Ningzhi Tang, Jun Xia, Wei **, Yiyu Shi

    Abstract: Upon deployment to edge devices, it is often desirable for a model to further learn from streaming data to improve accuracy. However, extracting representative features from such data is challenging because it is typically unlabeled, non-independent and identically distributed (non-i.i.d), and is seen only once. To mitigate this issue, a common strategy is to maintain a small data buffer on the ed…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 pages, 10 figures

  34. arXiv:2405.15388  [pdf, other]

    cs.AI cs.RO

    Language-Driven Interactive Traffic Trajectory Generation

    Authors: Junkai Xia, Chenxin Xu, Qingyao Xu, Chen Xie, Yanfeng Wang, Siheng Chen

    Abstract: Realistic trajectory generation with natural language control is pivotal for advancing autonomous vehicle technology. However, previous methods focus on individual traffic participant trajectory generation, thus failing to account for the complexity of interactive traffic dynamics. In this work, we propose InteractTraj, the first language-driven traffic trajectory generator that can generate inter…

    Submitted 24 May, 2024; originally announced May 2024.

  35. arXiv:2405.15188  [pdf, other]

    cs.CV

    PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction

    Authors: Bingchen Yang, Haiyong Jiang, Hao Pan, Peter Wonka, Jun Xiao, Guosheng Lin

    Abstract: Reverse engineering CAD models from raw geometry is a classic but challenging research problem. In particular, reconstructing the CAD modeling sequence from point clouds provides great interpretability and convenience for editing. To improve upon this problem, we introduce geometric guidance into the reconstruction network. Our proposed model, PS-CAD, reconstructs the CAD modeling sequence one ste…

    Submitted 23 May, 2024; originally announced May 2024.

  36. arXiv:2405.14201  [pdf, other]

    cs.CV

    FreeTuner: Any Subject in Any Style with Training-free Diffusion

    Authors: Youcan Xu, Zhen Wang, Jun Xiao, Wei Liu, Long Chen

    Abstract: With the advance of diffusion models, various personalized image generation methods have been proposed. However, almost all existing work only focuses on either subject-driven or style-driven personalization. Meanwhile, state-of-the-art methods face several challenges in realizing compositional personalization, i.e., composing different subject and style concepts, such as concept disentanglement,…

    Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  37. arXiv:2405.13445  [pdf, other]

    cs.LG cs.AI

    Task-agnostic Decision Transformer for Multi-type Agent Control with Federated Split Training

    Authors: Zhiyuan Wang, Bokui Chen, Xiaoyang Qu, Zhenhou Hong, **g Xiao, Jianzong Wang

    Abstract: With the rapid advancements in artificial intelligence, the development of knowledgeable and personalized agents has become increasingly prevalent. However, the inherent variability in state variables and action spaces among personalized agents poses significant aggregation challenges for traditional federated learning algorithms. To tackle these challenges, we introduce the Federated Split Decisi…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  38. arXiv:2405.08183  [pdf, other]

    cs.LG cs.AI

    Towards Energy-Aware Federated Learning via MARL: A Dual-Selection Approach for Model and Client

    Authors: Jun Xia, Yiyu Shi

    Abstract: Although Federated Learning (FL) is promising in knowledge sharing for heterogeneous Artificial Intelligence of Things (AIoT) devices, their training performance and energy efficacy are severely restricted in practical battery-driven scenarios due to the "wooden barrel effect" caused by the mismatch between homogeneous model paradigms and heterogeneous device capability. As a result, due to vario…

    Submitted 13 May, 2024; originally announced May 2024.

  39. arXiv:2405.07257  [pdf, other]

    cs.CV

    Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation

    Authors: Changpeng Cai, Guinan Guo, Jiao Li, Junhao Su, Chenghao He, **g Xiao, Yuanxu Chen, Lei Dai, Feiyu Zhu

    Abstract: Most earlier investigations on talking face generation have focused on the synchronization of lip motion and speech content. However, human head pose and facial emotions are equally important characteristics of natural human faces. While audio-driven talking face generation has seen notable advancements, existing methods either overlook facial emotions or are limited to specific individuals and ca…

    Submitted 12 May, 2024; originally announced May 2024.

    ACM Class: I.4.5; I.4.9

  40. A Multi-Channel Spatial-Temporal Transformer Model for Traffic Flow Forecasting

    Authors: Jianli Xiao, Baichao Long

    Abstract: Traffic flow forecasting is a crucial task in transportation management and planning. The main challenges for traffic flow forecasting are that (1) as the length of prediction time increases, the accuracy of prediction will decrease; (2) the predicted results greatly rely on the extraction of temporal and spatial dependencies from the road networks. To overcome the challenges mentioned above, we p…

    Submitted 10 May, 2024; originally announced May 2024.

    Journal ref: Xiao J, Long B. A Multi-Channel Spatial-Temporal Transformer Model for Traffic Flow Forecasting[J]. Information Sciences, 2024: 120648

  41. arXiv:2405.03228  [pdf, other]

    cs.LG

    TED: Accelerate Model Training by Internal Generalization

    Authors: **ying Xiao, ** Li, Jie Nie

    Abstract: Large language models have demonstrated strong performance in recent years, but the high cost of training drives the need for efficient methods to compress dataset sizes. We propose TED pruning, a method that addresses the challenge of overfitting under high pruning ratios by quantifying the model's ability to improve performance on pruned data while fitting retained data, known as Internal Genera…

    Submitted 6 May, 2024; originally announced May 2024.

  42. arXiv:2405.03110  [pdf, other]

    cs.IR

    Vector Quantization for Recommender Systems: A Review and Outlook

    Authors: Qijiong Liu, Xiaoyu Dong, Jiaren Xiao, Nuo Chen, Hengchang Hu, Jieming Zhu, Chenxu Zhu, Tetsuya Sakai, Xiao-Ming Wu

    Abstract: Vector quantization, renowned for its unparalleled feature compression capabilities, has been a prominent topic in signal processing and machine learning research for several decades and remains widely utilized today. With the emergence of large models and generative AI, vector quantization has gained popularity in recommender systems, establishing itself as a preferred solution. This paper starts…

    Submitted 5 May, 2024; originally announced May 2024.

  43. arXiv:2405.02935  [pdf, other]

    cs.CL

    Enabling Patient-side Disease Prediction via the Integration of Patient Narratives

    Authors: Zhixiang Su, Yinan Zhang, Jiazheng **g, Jie Xiao, Zhiqi Shen

    Abstract: Disease prediction holds considerable significance in modern healthcare because of its crucial role in facilitating early intervention and implementing effective prevention measures. However, most recent disease prediction approaches rely heavily on laboratory test outcomes (e.g., blood tests and medical imaging from X-rays). Gaining access to such data for precise disease prediction is often a c…

    Submitted 5 May, 2024; originally announced May 2024.

  44. arXiv:2405.01817  [pdf, other]

    cs.LG

    Uniformly Stable Algorithms for Adversarial Training and Beyond

    Authors: Jiancong Xiao, Jiawei Zhang, Zhi-Quan Luo, Asuman Ozdaglar

    Abstract: In adversarial machine learning, neural networks suffer from a significant issue known as robust overfitting, where the robust test accuracy decreases over epochs (Rice et al., 2020). Recent research (Xing et al., 2021; Xiao et al., 2022) has focused on studying the uniform stability of adversarial training. Their investigations revealed that SGD-based adversarial training fails to exhib…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  45. arXiv:2405.01242  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

    Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

    Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt on mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art m…

    Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  46. arXiv:2405.00930  [pdf, other]

    cs.SD eess.AS

    MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion

    Authors: Pengcheng Li, Jianzong Wang, Xulong Zhang, Yong Zhang, **g Xiao, Ning Cheng

    Abstract: One-shot voice conversion aims to change the timbre of any source speech to match that of an unseen target speaker using only one speech sample. Existing methods struggle to achieve satisfactory speech representation disentanglement and suffer from large network sizes, as some of them rely on numerous complex modules for disentanglement. In this paper, we propose a model named MAIN-VC to effectively…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  47. arXiv:2405.00603  [pdf, other]

    cs.SD eess.AS

    Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

    Authors: Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, **g Xiao

    Abstract: Voice conversion is the task of transforming the voice characteristics of source speech while preserving its content information. Self-supervised representation learning models are increasingly used for content extraction. However, these representations retain a lot of hidden speaker information, which leads to timbre leakage, while the prosodic information in the hidden units goes unused. To address these issue…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  48. arXiv:2405.00577  [pdf]

    cs.LG eess.SP q-bio.NC

    Discovering robust biomarkers of neurological disorders from functional MRI using graph neural networks: A Review

    Authors: Yi Hao Chan, Deepank Girish, Sukrit Gupta, **g Xia, Chockalingam Kasi, Yinan He, Conghao Wang, Jagath C. Rajapakse

    Abstract: Graph neural networks (GNNs) have emerged as a popular tool for modelling functional magnetic resonance imaging (fMRI) datasets. Many recent studies have reported significant improvements in disorder classification performance via more sophisticated GNN designs and highlighted salient features that could be potential biomarkers of the disorder. In this review, we provide an overview of how GNN and…

    Submitted 1 May, 2024; originally announced May 2024.

  49. arXiv:2404.19316  [pdf, other]

    cs.CL

    QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering

    Authors: Sheng Ouyang, Jianzong Wang, Yong Zhang, Zhitao Li, Ziqi Liang, Xulong Zhang, Ning Cheng, **g Xiao

    Abstract: Extractive Question Answering (EQA) in Machine Reading Comprehension (MRC) often faces the challenge of dealing with semantically identical but format-variant inputs. Our work introduces a novel approach, called the "Query Latent Semantic Calibrator (QLSC)", designed as an auxiliary module for existing MRC models. We propose a unique scaling strategy to capture latent semantic center features of…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  50. arXiv:2404.19214  [pdf, other]

    cs.SD eess.AS

    EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

    Authors: Jianzong Wang, Ziqi Liang, Xulong Zhang, Ning Cheng, **g Xiao

    Abstract: In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses challenges due to high computational and storage resource requirements. To address this issue, a lightweight model called EfficientASR is proposed in this paper, aiming to enhance the versatility of Transformer models. EfficientASR employs two primary modules: Shared…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)