Showing 1–50 of 526 results for author: Shen, L

Searching in archive cs.
  1. arXiv:2406.20078  [pdf, other]

    cs.CV

    GM-DF: Generalized Multi-Scenario Deepfake Detection

    Authors: Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

    Abstract: Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first find a rapid degradation of de…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.18146  [pdf, other]

    cs.CV

    A Refer-and-Ground Multimodal Large Language Model for Biomedicine

    Authors: Xiaoshuang Huang, Haifeng Huang, Lingdong Shen, Yehui Yang, Fangxin Shang, Junwei Liu, Jia Liu

    Abstract: With the rapid development of multimodal large language models (MLLMs), especially their capabilities in visual chat through refer and ground functionalities, their significance is increasingly recognized. However, the biomedical field currently exhibits a substantial gap in this area, primarily due to the absence of a dedicated refer and ground dataset for biomedical images. To address this chall…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI2024

  3. Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning

    Authors: Tianfu Wang, Li Shen, Qilin Fan, Tong Xu, Tongliang Liu, Hui Xiong

    Abstract: As an essential resource management problem in network virtualization, virtual network embedding (VNE) aims to allocate the finite resources of physical network to sequentially arriving virtual network requests (VNRs) with different resource demands. Since this is an NP-hard combinatorial optimization problem, many efforts have been made to provide viable solutions. However, most existing approach…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Services Computing (TSC)

    Journal ref: IEEE Transactions on Services Computing (Volume: 17, Issue: 3, May-June 2024)

  4. arXiv:2406.16518  [pdf]

    cs.CV

    Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces

    Authors: Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa

    Abstract: Convolutional neural networks (CNNs) and Transformers have shown advanced accuracy in crack detection under certain conditions. Yet, the fixed local attention can compromise the generalisation of CNNs, and the quadratic complexity of the global self-attention restricts the practical deployment of Transformers. Given the emergence of the new-generation architecture of Mamba, this paper proposes a V…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures

  5. arXiv:2406.14794  [pdf, other]

    eess.IV cs.CV cs.LG

    ImageFlowNet: Forecasting Multiscale Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images

    Authors: Chen Liu, Ke Xu, Liangbo L. Shen, Guillaume Huguet, Zilong Wang, Alexander Tong, Danilo Bzdok, Jay Stewart, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy

    Abstract: The forecasting of disease progression from images is a holy grail for clinical decision making. However, this task is complicated by the inherent high dimensionality, temporal sparsity and sampling irregularity in longitudinal image acquisitions. Existing methods often rely on extracting hand-crafted features and performing time-series analysis in this vector space, leading to a loss of rich spat…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.13975  [pdf, other]

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.13963  [pdf, ps, other]

    cs.CV

    SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis

    Authors: Zijian Cai, Xinquan Yang, Xuguang Li, Xiaoling Luo, Xuechen Li, Linlin Shen, He Meng, Yongqiang Deng

    Abstract: Panoramic X-ray is a simple and effective tool for diagnosing dental diseases in clinical practice. When deep learning models are developed to assist dentist in interpreting panoramic X-rays, most of their performance suffers from the limited annotated data, which requires dentist's expertise and a lot of time cost. Although self-supervised learning (SSL) has been proposed to address this challeng…

    Submitted 19 June, 2024; originally announced June 2024.

  8. MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representation

    Authors: Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, Shaohua Wang

    Abstract: We constructed a newly large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projects. Specifically, we collected all crawlable descriptive information of the vulnerabilities from the CVE database and extracted all vulnerability-related code changes from 28 Git-based websites. We adopt…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  9. arXiv:2406.11637  [pdf, other]

    cs.HC

    PyGWalker: On-the-fly Assistant for Exploratory Visual Data Analysis

    Authors: Yue Yu, Leixian Shen, Fei Long, Huamin Qu, Hao Chen

    Abstract: Exploratory visual data analysis tools empower data analysts to efficiently and intuitively explore data insights throughout the entire analysis cycle. However, the gap between common programmatic analysis (e.g., within computational notebooks) and exploratory visual analysis leads to a disjointed and inefficient data analysis experience. To bridge this gap, we developed PyGWalker, a Python librar…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: To appear at the IEEE VIS Conference 2024

  10. arXiv:2406.10225  [pdf, other]

    cs.CV

    SatDiffMoE: A Mixture of Estimation Method for Satellite Image Super-resolution with Latent Diffusion Models

    Authors: Zhaoxu Luo, Bowen Song, Liyue Shen

    Abstract: During the acquisition of satellite images, there is generally a trade-off between spatial resolution and temporal resolution (acquisition frequency) due to the onboard sensors of satellite imaging systems. High-resolution satellite images are very important for land crop monitoring, urban planning, wildfire management and a variety of applications. It is a significant yet challenging task to achi…

    Submitted 14 June, 2024; originally announced June 2024.

  11. arXiv:2406.10211  [pdf, other]

    cs.CV

    DiffusionBlend: Learning 3D Image Prior through Position-aware Diffusion Score Blending for 3D Computed Tomography Reconstruction

    Authors: Bowen Song, Jason Hu, Zhaoxu Luo, Jeffrey A. Fessler, Liyue Shen

    Abstract: Diffusion models face significant challenges when employed for large-scale medical image reconstruction in real practice such as 3D Computed Tomography (CT). Due to the demanding memory, time, and data requirements, it is difficult to train a diffusion model directly on the entire volume of high-dimensional data to obtain an efficient 3D diffusion prior. Existing works utilizing diffusion priors o…

    Submitted 14 June, 2024; originally announced June 2024.

  12. arXiv:2406.09900  [pdf, other]

    cs.CL

    GEB-1.3B: Open Lightweight Large Language Model

    Authors: Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu

    Abstract: Recently developed large language models (LLMs) such as ChatGPT, Claude, and Llama have demonstrated impressive abilities, and even surpass human-level performance in several tasks. Despite their success, the resource-intensive demands of these models, requiring significant computational power for both training and inference, limit their deployment to high-performance servers. Additionally, the ex…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: GEB-1.3B technical report

  13. arXiv:2406.09770  [pdf, other]

    cs.LG cs.AI

    Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du

    Abstract: Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost of training and evaluating models. Efficient Pareto front approximation of large models enables multi-objective optimization for various tasks such as multi-task learning and trade-off analysis. Existing algorithms for l…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: code is available at https://github.com/tanganke/pareto_set_learning

  14. arXiv:2406.09643  [pdf, other]

    cs.LG

    Reinforced Decoder: Towards Training Recurrent Neural Networks for Time Series Forecasting

    Authors: Qi Sima, Xinze Zhang, Yukun Bao, Siyue Yang, Liang Shen

    Abstract: Recurrent neural network-based sequence-to-sequence models have been extensively applied for multi-step-ahead time series forecasting. These models typically involve a decoder trained using either its previous forecasts or the actual observed values as the decoder inputs. However, relying on self-generated predictions can lead to the rapid accumulation of errors over multiple steps, while using th…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

  15. arXiv:2406.08372  [pdf, other]

    cs.CV

    APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

    Authors: Weizhao He, Yang Zhang, Wei Zhuo, Linlin Shen, Jiaqi Yang, Songhe Deng, Liang Sun

    Abstract: Few-shot semantic segmentation (FSS) endeavors to segment unseen classes with only a few labeled samples. Current FSS methods are commonly built on the assumption that their training and application scenarios share similar domains, and their performances degrade significantly while applied to a distinct domain. To this end, we propose to leverage the cutting-edge foundation model, the Segment Anyt…

    Submitted 12 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 9 figures

  16. arXiv:2406.07971  [pdf, other]

    cs.CL cs.AI cs.LG

    It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF

    Authors: Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao

    Abstract: Reinforcement Learning from Human Feedback (RLHF) involves training policy models (PMs) and reward models (RMs) to align language models with human preferences. Instead of focusing solely on PMs and RMs independently, we propose to examine their interactions during fine-tuning, introducing the concept of seamlessness. Our study starts with observing the saturation phenomenon, where continual impro…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  17. arXiv:2406.07890  [pdf, other]

    eess.AS cs.CL cs.LG

    Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions

    Authors: Anfeng Xu, Kevin Huang, Tiantian Feng, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding, such as child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  18. arXiv:2406.04603  [pdf, ps, other]

    cs.CV

    Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network

    Authors: Xinquan Yang, Xuguang Li, Xiaoling Luo, Leilei Zeng, Yudi Zhang, Linlin Shen, Yongqiang Deng

    Abstract: Surgical guide plate is an important tool for the dental implant surgery. However, the design process heavily relies on the dentist to manually simulate the implant angle and depth. When deep neural networks have been applied to assist the dentist quickly locates the implant position, most of them are not able to determine the implant depth. Inspired by the video grounding task which localizes the…

    Submitted 6 June, 2024; originally announced June 2024.

    Journal ref: MICCAI'2024

  19. arXiv:2406.03280  [pdf, other]

    cs.LG cs.AI cs.CL

    FusionBench: A Comprehensive Benchmark of Deep Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao

    Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single model in a cost-effective and data-efficient manner. This enables the unified model to take advantage of the original models' strengths, potentially exceeding their performance. Although a variety of deep model fusion techniques have been introduced, their evaluations…

    Submitted 14 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Project homepage: https://github.com/tanganke/fusion_bench

  20. arXiv:2406.02462  [pdf, other]

    cs.CV cs.AI

    Learning Image Priors through Patch-based Diffusion Models for Solving Inverse Problems

    Authors: Jason Hu, Bowen Song, Xiaojian Xu, Liyue Shen, Jeffrey A. Fessler

    Abstract: Diffusion models can learn strong image priors from underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data. Such bottlenecks prevent most existing works from being feasible for high-dimensional and high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for th…

    Submitted 4 June, 2024; originally announced June 2024.

  21. arXiv:2406.01417  [pdf, other]

    cs.LG cs.CV

    Mixup Augmentation with Multiple Interpolations

    Authors: Lifeng Shen, Jincheng Yu, Hansi Yang, James T. Kwok

    Abstract: Mixup and its variants form a popular class of data augmentation techniques. Using a random sample pair, it generates a new sample by linear interpolation of the inputs and labels. However, generating only one single interpolation may limit its augmentation ability. In this paper, we propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pai…

    Submitted 3 June, 2024; originally announced June 2024.

  22. arXiv:2405.19346  [pdf, other]

    eess.SP cs.AI cs.LG

    Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification

    Authors: Sion An, Myeongkyun Kang, Soopil Kim, Philip Chikontwe, Li Shen, Sang Hyun Park

    Abstract: Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted at MICCAI 2024

  23. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  24. arXiv:2405.18187  [pdf, other]

    cs.LG

    AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization

    Authors: Longxiang He, Li Shen, Junbo Tan, Xueqian Wang

    Abstract: Implicit Q-learning (IQL) serves as a strong baseline for offline RL, which learns the value function using only dataset actions through quantile regression. However, it is unclear how to recover the implicit policy from the learned implicit Q-function and why IQL can utilize weighted regression for policy extraction. IDQL reinterprets IQL as an actor-critic method and gets weights of implicit pol…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 19 pages, 3 figures, 4 tables

  25. arXiv:2405.18080  [pdf, other]

    cs.LG

    HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

    Authors: Shengchao Hu, Ziqing Fan, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

    Abstract: The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction. Recent advancements approach this through sequence modeling, leveraging the Transformer architecture's scalability and the benefits of parameter sharing to exploit task similarities. However, variations in task content and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Published at ICML 2024

  26. arXiv:2405.17876  [pdf, other]

    cs.LG cs.DC math.OC

    Decentralized Directed Collaboration for Personalized Federated Learning

    Authors: Yingqi Liu, Yifan Shi, Qinglun Li, Baoyuan Wu, Xueqian Wang, Li Shen

    Abstract: Personalized Federated Learning (PFL) is proposed to find the greatest personalized models for each client. To avoid the central failure and communication bottleneck in the server-based FL, we concentrate on the Decentralized Personalized Federated Learning (DPFL) that performs distributed model training in a Peer-to-Peer (P2P) manner. Most personalized works in DPFL are based on undirected and sy…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: CVPR 2024. arXiv admin note: text overlap with arXiv:2305.15157

  27. arXiv:2405.17098  [pdf, other]

    cs.LG

    Q-value Regularized Transformer for Offline Reinforcement Learning

    Authors: Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

    Abstract: Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action distribution based on history trajectory and target returns for each state. However, these methods often struggle with stitching together optimal trajectories from sub-optimal ones due to the inconsistency between the sampled returns…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Published at ICML 2024

  28. arXiv:2405.17079  [pdf, other]

    stat.ML cs.LG

    Learning with User-Level Local Differential Privacy

    Authors: Puning Zhao, Li Shen, Rongfei Fan, Qingming Li, Huiwen Wu, Jiafei Wu, Zhe Liu

    Abstract: User-level privacy is important in distributed systems. Previous research primarily focuses on the central model, while the local models have received much less attention. Under the central model, user-level DP is strictly stronger than the item-level one. However, under the local model, the relationship between user-level and item-level LDP becomes more complex, thus the analysis is crucially dif…

    Submitted 27 May, 2024; originally announced May 2024.

  29. arXiv:2405.16560  [pdf, other]

    cs.LG

    Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

    Authors: Yongxian Wei, Zixuan Hu, Li Shen, Zhenyi Wang, Yu Li, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data, enabling the rapid adaptation to new unseen tasks. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts. In this paper, we empirically and theoretically identify and analyze the mode…

    Submitted 26 May, 2024; originally announced May 2024.

  30. arXiv:2405.13771  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-Dataset Multi-Task Learning for COVID-19 Prognosis

    Authors: Filippo Ruffini, Lorenzo Tronchin, Zhuoru Wu, Wenting Chen, Paolo Soda, Linlin Shen, Valerio Guarrasi

    Abstract: In the fight against the COVID-19 pandemic, leveraging artificial intelligence to predict disease outcomes from chest radiographic images represents a significant scientific aim. The challenge, however, lies in the scarcity of large, labeled datasets with compatible tasks for training deep learning models without leading to overfitting. Addressing this issue, we introduce a novel multi-dataset mul…

    Submitted 22 May, 2024; originally announced May 2024.

  31. arXiv:2405.13694  [pdf, other]

    cs.CV

    Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances

    Authors: Licheng Shen, Ho Ngai Chow, Lingyun Wang, Tong Zhang, Mengqiu Wang, Yuxing Han

    Abstract: Recent advancements in neural rendering techniques have significantly enhanced the fidelity of 3D reconstruction. Notably, the emergence of 3D Gaussian Splatting (3DGS) has marked a significant milestone by adopting a discrete scene representation, facilitating efficient training and real-time rendering. Several studies have successfully extended the real-time rendering capability of 3DGS to dynam…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 14 pages, 6 figures

  32. arXiv:2405.13453  [pdf, other]

    cs.LG cs.CR

    A Huber Loss Minimization Approach to Mean Estimation under User-level Differential Privacy

    Authors: Puning Zhao, Lifeng Lai, Li Shen, Qingming Li, Jiafei Wu, Zhe Liu

    Abstract: Privacy protection of users' entire contribution of samples is important in distributed systems. The most effective approach is the two-stage scheme, which finds a small interval first and then gets a refined estimate by clipping samples into the interval. However, the clipping operation induces bias, which is serious if the sample distribution is heavy-tailed. Besides, users with large local samp…

    Submitted 22 May, 2024; originally announced May 2024.

  33. arXiv:2405.13274  [pdf, other]

    cs.CL

    DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation

    Authors: Weiting Tan, Jingyu Zhang, Lingfeng Shen, Daniel Khashabi, Philipp Koehn

    Abstract: Non-autoregressive Transformers (NATs) are recently applied in direct speech-to-speech translation systems, which convert speech across different languages without intermediate text data. Although NATs generate high-quality outputs and offer faster inference than autoregressive models, they tend to produce incoherent and repetitive results due to complex data distribution (e.g., acoustic and lingu…

    Submitted 21 May, 2024; originally announced May 2024.

  34. arXiv:2405.12447  [pdf, ps, other]

    cs.CV

    EPL: Empirical Prototype Learning for Deep Face Recognition

    Authors: Weijia Fan, Jiajun Wen, Xi Jia, Linlin Shen, Jiancan Zhou, Qiufu Li

    Abstract: Prototype learning is widely used in face recognition, which takes the row vectors of coefficient matrix in the last linear layer of the feature extraction model as the prototypes for each class. When the prototypes are updated using the facial sample feature gradients in the model training, they are prone to being pulled away from the class center by the hard samples, resulting in decreased overa…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 16 pages, 2 figures, 6 tables

  35. arXiv:2405.12218  [pdf, other]

    cs.CV

    Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

    Authors: Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

    Abstract: We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Project page: https://mvsgaussian.github.io/

  36. arXiv:2405.12094  [pdf, other]

    cs.LG

    Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?

    Authors: Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Shengchao Hu, Mengzhu Wang, Shouling Ji, Jincai Huang, Li Shen

    Abstract: Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL), yet it poses challenges due to substantial parameter size and limited scalability, which is particularly critical in sequential decision-making scenarios where resources are constrained such as in robots and drones with limited computational power. Mamba, a pr…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 20 pages, 8 figures

  37. arXiv:2405.11772  [pdf]

    cs.HC

    On the Design and Study of an Installation for Office Workers to Amplify Temporal Diversity and Connection to Nature

    Authors: Josh Andres, Rodolfo Ocampo, Hannah R. Feldman, Louisa Shen, Charlton Hill, Caroline Pegram, Adrian Schmidt, Justin Shave, Brendan Wright

    Abstract: We present the design and user study of an installation for office workers, enabling moments of temporal diversity and connection to nature. The installation is a form of creative computing experience that departs from the traditional focus on office technologies for productivity. Drawing on neuroscience insights and the slowing effect of nature sounds on time perception, we created an immersive,…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Paper to be published at ICCC'24

    Journal ref: The International Conference on Computational Creativity, ICCC 2024

  38. arXiv:2405.10626  [pdf, other]

    cs.CL

    Dynamic data sampler for cross-language transfer learning in large language models

    Authors: Yudong Li, Yuhao Feng, Wen Zhou, Zhe Zhao, Linlin Shen, Cheng Hou, Xianxu Hou

    Abstract: Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty in acquiring large-scale corpus and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by ICASSP 2024

  39. arXiv:2405.08550  [pdf, other]

    cs.LG

    Learning Multi-Agent Communication from Graph Modeling Perspective

    Authors: Shengchao Hu, Li Shen, Ya Zhang, Dacheng Tao

    Abstract: In numerous artificial intelligence applications, the collaborative efforts of multiple intelligent agents are imperative for the successful attainment of target objectives. To enhance coordination among these agents, a distributed communication framework is often employed. However, information sharing among all agents proves to be resource-intensive, while the adoption of a manually pre-defined c…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Published at ICLR 2024

  40. "Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies

    Authors: Brennan Schaffner, Arjun Nitin Bhagoji, Siyuan Cheng, Jacqueline Mei, Jay L. Shen, Grace Wang, Marshini Chetty, Nick Feamster, Genevieve Lakier, Chenhao Tan

    Abstract: Moderating user-generated content on online platforms is crucial for balancing user safety and freedom of speech. Particularly in the United States, platforms are not subject to legal constraints prescribing permissible content. Each platform has thus developed bespoke content moderation policies, but there is little work towards a comparative understanding of these policies across platforms and t…

    Submitted 8 May, 2024; originally announced May 2024.

  41. arXiv:2405.04819  [pdf, other]

    cs.CL cs.AI

    DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

    Authors: Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen

    Abstract: Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on…

    Submitted 12 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: Under Review; Incorrect author name revised

  42. arXiv:2405.02954  [pdf, other]

    cs.CV cs.LG

    Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training

    Authors: Wenyu Zhang, Li Shen, Chuan-Sheng Foo

    Abstract: Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a large data (e.g. ImageNet) pre-trained feature extractor is used to initialize the sourc…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Extension of ICCV paper arXiv:2212.07585, submitted to IJCV

  43. arXiv:2405.00984  [pdf, other]

    cs.LG cs.CV

    FREE: Faster and Better Data-Free Meta-Learning

    Authors: Yongxian Wei, Zixuan Hu, Zhenyi Wang, Li Shen, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data, presenting practical benefits in contexts constrained by data privacy concerns. Current DFML methods primarily focus on the data recovery from these pre-trained models. However, they suffer from slow recovery speed and overlook gaps inherent in heterogeneous pre-tra…

    Submitted 1 May, 2024; originally announced May 2024.

  44. arXiv:2405.00244  [pdf, other]

    cs.CV

    Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network

    Authors: Yong Shu, Liquan Shen, Xiangyu Hu, Mengyao Li, Zihao Zhou

    Abstract: As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets, which perform poorly in real scenes. In this work, to facilitate the development of real-world HDR video reconstruc…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This paper has been accepted by CVPR 2024

  45. arXiv:2404.15598  [pdf, other]

    cs.LG cs.CR

    Federated Learning with Only Positive Labels by Exploring Label Correlations

    Authors: Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao

    Abstract: Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this paper, we study the multi-label classification problem under the federated learning setting, where trivial solution and extremely poor performance may be obtained, especially when only positive data w.r.t. a single class label are provided for each client. This issue ca…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: To be published in IEEE Transactions on Neural Networks and Learning Systems

  46. arXiv:2404.14828  [pdf, other]

    cs.IT

    GLDPC-PC Codes: Channel Coding Towards 6G Communications

    Authors: Li Shen, Yongpeng Wu, Yin Xu, Xiaohu You, Xiqi Gao, Wenjun Zhang

    Abstract: The sixth generation (6G) wireless communication system will improve the key technical indicators by one to two orders of magnitude, and come with some new features. As a crucial technique to enhance the reliability and efficiency of data transmission, the next generation channel coding is not only required to satisfy the stringent requirements of 6G, but also expected to be backward compatible to…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE Communications Magazine

  47. arXiv:2404.06443  [pdf, other]

    cs.CV

    Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

    Authors: Zihan Wang, Siyang Song, Cheng Luo, Songhe Deng, Weicheng Xie, Linlin Shen

    Abstract: Human facial action units (AUs) are mutually related in a hierarchical manner, as not only are they associated with each other in both spatial and temporal domains but also AUs located in the same/close facial regions show stronger relationships than those of different facial regions. While no existing approach thoroughly models such hierarchical inter-dependencies among AUs, this paper propos…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR2024

  48. arXiv:2404.06258  [pdf]

    cs.CV

    Robust feature knowledge distillation for enhanced performance of lightweight crack segmentation models

    Authors: Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa

    Abstract: Vision-based crack detection faces deployment challenges due to the size of robust models and edge device limitations. These can be addressed with lightweight models trained with knowledge distillation (KD). However, state-of-the-art (SOTA) KD methods compromise anti-noise robustness. This paper develops Robust Feature Knowledge Distillation (RFKD), a framework to improve robustness while retainin…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 24 pages, 13 figures

  49. arXiv:2404.05253  [pdf, other]

    cs.CV

    CodeEnhance: A Codebook-Driven Approach for Low-Light Image Enhancement

    Authors: Xu Wu, XianXu Hou, Zhihui Lai, Jie Zhou, Ya-nan Zhang, Witold Pedrycz, Linlin Shen

    Abstract: Low-light image enhancement (LLIE) aims to improve low-illumination images. However, existing methods face two challenges: (1) uncertainty in restoration from diverse brightness degradations; (2) loss of texture and color information caused by noise suppression and light enhancement. In this paper, we propose a novel enhancement approach, CodeEnhance, by leveraging quantized priors and image refin…

    Submitted 30 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 10 pages, 13 figures

  50. arXiv:2404.03854  [pdf, other]

    cs.LG cs.CL cs.CV

    Align as Ideal: Cross-Modal Alignment Binding for Federated Medical Vision-Language Pre-training

    Authors: Zitao Shuai, Liyue Shen

    Abstract: Vision-language pre-training (VLP) has arisen as an efficient scheme for multimodal representation learning, but it requires large-scale multimodal data for pre-training, making it an obstacle especially for medical applications. To overcome the data limitation, federated learning (FL) can be a promising strategy to scale up the dataset for medical VLP while protecting data privacy. However, clien…

    Submitted 24 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.