Showing 1–50 of 144 results for author: Su, L

  1. arXiv:2406.18089  [pdf, other]

    cs.SD cs.MM eess.AS

    A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons

    Authors: Tzu-Yun Hung, Jui-Te Wu, Yu-Chia Kuo, Yo-Wei Hsiao, Ting-Wei Lin, Li Su

    Abstract: Expressive music synthesis (EMS) for violin performance is a challenging task due to the disagreement among music performers in the interpretation of expressive musical terms (EMTs), scarcity of labeled recordings, and limited generalization ability of the synthesis model. These challenges create trade-offs between model effectiveness, diversity of generated results, and controllability of the syn…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 15 pages, 2 figures, 3 tables

  2. arXiv:2406.10580  [pdf, other]

    cs.CV

    IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization

    Authors: Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang, Chi-Man Pun, Jiancheng Lv, Jizhe Zhou

    Abstract: A comprehensive benchmark is yet to be established in the Image Manipulation Detection & Localization (IMDL) field. The absence of such a benchmark leads to insufficient and misleading model evaluations, severely undermining the development of this field. However, the scarcity of open-sourced baseline models and inconsistent training and evaluation protocols make conducting rigorous experiments a…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Technical report

  3. arXiv:2406.06375  [pdf, other]

    cs.SD cs.AI eess.AS

    MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

    Authors: Yu-Fen Huang, Nikki Moran, Simon Coleman, Jon Kelly, Shun-Hwa Wei, Po-Yin Chen, Yun-Hsin Huang, Tsung-Ping Chen, Yu-Chia Kuo, Yu-Chi Wei, Chih-Hsuan Li, Da-Yu Huang, Hsuan-Kai Kao, Ting-Wei Lin, Li Su

    Abstract: In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music m…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. 14 pages, 7 figures. Dataset is available on: https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset/tree/main and https://zenodo.org/records/11393449

  4. arXiv:2406.00276  [pdf]

    cs.LG cs.AI cs.CE physics.data-an

    Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

    Authors: Shengyu Tao, Mengtian Zhang, Zixi Zhao, Haoyang Li, Ruifei Ma, Yunhong Che, Xin Sun, Lin Su, Xiangyu Chen, Zihao Zhou, Heng Chang, Tingwei Cao, Xiao Xiao, Yaojun Liu, Wenjun Yu, Zhongling Xu, Yang Li, Han Hao, Xuan Zhang, Xiaosong Hu, Guangmin Zhou

    Abstract: Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed mac…

    Submitted 31 May, 2024; originally announced June 2024.

    ACM Class: J.2; G.3

  5. arXiv:2405.20810  [pdf, other]

    cs.CV

    Context-aware Difference Distilling for Multi-change Captioning

    Authors: Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Chenggang Yan, Qingming Huang

    Abstract: Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language. Compared with single-change captioning, this task requires the model to have higher-level cognition ability to reason an arbitrary number of changes. In this paper, we propose a novel context-aware difference distilling (CARD) network to capture all genuine changes for yielding sentences.…

    Submitted 7 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024 main conference (long paper)

  6. arXiv:2405.09592  [pdf, other]

    cs.LG cs.AI cs.CE

    A Survey of Generative Techniques for Spatial-Temporal Data Mining

    Authors: Qianru Zhang, Haixin Wang, Cheng Long, Liangcai Su, Xingwei He, Jianlong Chang, Tailin Wu, Hongzhi Yin, Siu-Ming Yiu, Qi Tian, Christian S. Jensen

    Abstract: This paper focuses on the integration of generative techniques into spatial-temporal data mining, considering the significant growth and diverse nature of spatial-temporal data. With the advancements in RNNs, CNNs, and other non-generative techniques, researchers have explored their application in capturing temporal and spatial dependencies within spatial-temporal data. However, the emergence of g…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 19 pages

  7. arXiv:2405.03654  [pdf, other]

    cs.CR cs.AI

    Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent

    Authors: Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

    Abstract: To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content securi…

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  8. arXiv:2405.00263  [pdf, other]

    cs.CL cs.AI cs.LG

    Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

    Authors: Bin Xiao, Chunan Shi, Xiaonan Nie, Fan Yang, Xiangwei Deng, Lei Su, Weipeng Chen, Bin Cui

    Abstract: Large language models (LLMs) suffer from low efficiency due to the mismatch between the requirement of auto-regressive decoding and the design of most contemporary GPUs. Specifically, billions to trillions of parameters must be loaded to the GPU cache through its limited memory bandwidth for computation, but only a small batch of tokens is actually computed. Consequently, the GPU spends most of its ti…

    Submitted 30 April, 2024; originally announced May 2024.

  9. arXiv:2404.13841  [pdf, other]

    cs.LG cs.AI

    Fair Concurrent Training of Multiple Models in Federated Learning

    Authors: Marie Siew, Haoran Zhang, Jong-Ik Park, Yuezhou Liu, Yichen Ruan, Lili Su, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong

    Abstract: Federated learning (FL) enables collaborative learning across multiple clients. In most FL work, all clients train a single learning task. However, the recent proliferation of FL applications may increasingly require multiple FL tasks to be trained simultaneously, sharing clients' computing and communication resources, which we call Multiple-Model Federated Learning (MMFL). Current MMFL algorithms…

    Submitted 21 April, 2024; originally announced April 2024.

  10. arXiv:2404.10253  [pdf, other]

    cs.DC

    Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

    Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu, et al. (16 additional authors not shown)

    Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries t…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 18 pages, 13 figures

  11. arXiv:2404.10091  [pdf, other]

    cs.DC cs.LG

    Empowering Federated Learning with Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics

    Authors: Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su

    Abstract: Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of clients (e.g., in cross-device federated learning) that may operate in congested and changing environments. In this paper, we study federated learning in the presence of stochastic and dynamic communica…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: This is a substantial extension of the conference paper "Towards Bias Correction of Fedavg over Nonuniform and Time-varying Communications", which was published in 2023 62nd IEEE Conference on Decision and Control (CDC), DOI: 10.1109/CDC49753.2023.10383258

  12. HandGCAT: Occlusion-Robust 3D Hand Mesh Reconstruction from Monocular Images

    Authors: Shuaibing Wang, Shunli Wang, Dingkang Yang, Mingcheng Li, Ziyun Qian, Liuzhen Su, Lihua Zhang

    Abstract: We propose a robust and accurate method for reconstructing 3D hand mesh from monocular images. This is a very challenging problem, as hands are often severely occluded by objects. Previous works often have disregarded 2D hand pose information, which contains hand prior knowledge that is strongly correlated with occluded regions. Thus, in this work, we propose a novel 3D hand mesh reconstruction ne…

    Submitted 26 February, 2024; originally announced March 2024.

    Comments: 6 pages, 4 figures, ICME-2023 conference paper

    Journal ref: 2023 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2023: 2495-2500

  13. arXiv:2403.07564  [pdf, other]

    cs.CV

    RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model

    Authors: Mingze Wang, Lili Su, Cilin Yan, Sheng Xu, Pengcheng Yuan, Xiaolong Jiang, Baochang Zhang

    Abstract: The intelligent interpretation of buildings plays a significant role in urban planning and management, macroeconomic analysis, population dynamics, etc. Remote sensing image building interpretation primarily encompasses building extraction and change detection. However, current methodologies often treat these two tasks as separate entities, thereby failing to leverage shared knowledge. Moreover, t…

    Submitted 14 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  14. arXiv:2403.07390  [pdf, other]

    eess.IV cs.CV

    Learning Correction Errors via Frequency-Self Attention for Blind Image Super-Resolution

    Authors: Haochen Sun, Yan Yuan, Lijuan Su, Haotian Shao

    Abstract: Previous approaches for blind image super-resolution (SR) have relied on degradation estimation to restore high-resolution (HR) images from their low-resolution (LR) counterparts. However, accurate degradation estimation poses significant challenges. The SR model's incompatibility with degradation estimation methods, particularly the Correction Filter, may significantly impair performance as a res…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 16 pages

  15. arXiv:2402.15216  [pdf, ps, other]

    cs.CV

    Label-efficient Multi-organ Segmentation Method with Diffusion Model

    Authors: Yongzhi Huang, Jinxin Zhu, Haseeb Hassan, Liyilei Su, Jingyu Li, Binding Huang

    Abstract: Accurate segmentation of multiple organs in Computed Tomography (CT) images plays a vital role in computer-aided diagnosis systems. Various supervised-learning approaches have been proposed recently. However, these methods heavily depend on a large amount of high-quality labeled data, which is expensive to obtain in practice. In this study, we present a label-efficient learning approach using a pr…

    Submitted 23 February, 2024; originally announced February 2024.

  16. arXiv:2401.15842  [pdf]

    cs.CV cs.AI

    LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering

    Authors: Yuhan Chen, Lumei Su, Lihua Chen, Zhiwei Lin

    Abstract: In this paper, the LCV2 modular method is proposed for the Grounded Visual Question Answering task in the vision-language multimodal domain. This approach relies on a frozen large language model (LLM) as an intermediate mediator between the off-the-shelf VQA model and the off-the-shelf visual grounding (VG) model, where the LLM transforms and conveys textual information between the two modules based…

    Submitted 22 March, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: 21 pages, 9 figures

  17. arXiv:2401.04996  [pdf, other]

    cs.NI

    Distributed Experimental Design Networks

    Authors: Yuanyuan Li, Lili Su, Carlee Joe-Wong, Edmund Yeh, Stratis Ioannidis

    Abstract: As edge computing capabilities increase, model learning deployments in diverse edge environments have emerged. In experimental design networks, introduced recently, network routing and rate allocation are designed to aid the transfer of data from sensors to heterogeneous learners. We design efficient experimental design network algorithms that are (a) distributed and (b) use multicast transmission…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Technical report for paper accepted by INFOCOM 2024

  18. arXiv:2401.03177  [pdf, other]

    cs.CV cs.CL

    Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks

    Authors: Qian Li, Lixin Su, Jiashu Zhao, Long Xia, Hengyi Cai, Suqi Cheng, Hengzhu Tang, Junfeng Wang, Dawei Yin

    Abstract: Text-video retrieval is a challenging task that aims to identify relevant videos given textual queries. Compared to conventional textual retrieval, the main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content. Previous works primarily focus on aligning the query and the video by finely aggregating word-frame matching…

    Submitted 6 January, 2024; originally announced January 2024.

  19. arXiv:2312.17156  [pdf, other]

    cs.SD eess.AS

    BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer

    Authors: Chih-Cheng Chang, Li Su

    Abstract: Many deep learning models have achieved dominant performance on the offline beat tracking task. However, online beat tracking, in which only the past and present input features are available, still remains challenging. In this paper, we propose BEAt tracking Streaming Transformer (BEAST), an online joint beat and downbeat tracking system based on the streaming Transformer. To deal with online scen…

    Submitted 23 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  20. arXiv:2312.15450  [pdf, other]

    cs.IR

    Agent4Ranking: Semantic Robust Ranking via Personalized Query Rewriting Using Multi-agent LLM

    Authors: Xiaopeng Li, Lixin Su, Pengyue Jia, Xiangyu Zhao, Suqi Cheng, Junfeng Wang, Dawei Yin

    Abstract: Search engines are crucial as they provide an efficient and easy way to access vast amounts of information on the internet for diverse information needs. User queries, even with a specific need, can differ significantly. Prior research has explored the resilience of ranking models against typical query variations like paraphrasing, misspellings, and order changes. Yet, these works overlook how div…

    Submitted 24 December, 2023; originally announced December 2023.

  21. arXiv:2312.12107  [pdf, other]

    cs.DC cs.DB

    GraphScope Flex: LEGO-like Graph Computing Stack

    Authors: Tao He, Shuxian Hu, Longbin Lai, Dongze Li, Neng Li, Xue Li, Lexiao Liu, Xiaojian Luo, Binqing Lyu, Ke Meng, Sijie Shen, Li Su, Lei Wang, Jingbo Xu, Wenyuan Yu, Weibin Zeng, Lei Zhang, Siyuan Zhang, Jingren Zhou, Xiaoli Zhou, Diwen Zhu

    Abstract: Graph computing has become increasingly crucial in processing large-scale graph data, with numerous systems developed for this purpose. Two years ago, we introduced GraphScope as a system addressing a wide array of graph computing needs, including graph traversal, analytics, and learning in one system. Since its inception, GraphScope has achieved significant technological advancements and gained w…

    Submitted 19 December, 2023; originally announced December 2023.

  22. arXiv:2312.08852  [pdf, other]

    cs.LG

    ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance

    Authors: Ling-Hao Chen, Yuanshuo Zhang, Taohua Huang, Liangcai Su, Zeyi Lin, Xi Xiao, Xiaobo Xia, Tongliang Liu

    Abstract: Deep learning has achieved remarkable success in graph-related tasks, yet this accomplishment heavily relies on large-scale high-quality annotated datasets. However, acquiring such datasets can be cost-prohibitive, leading to the practical use of labels obtained from economically efficient sources such as web searches and user tags. Unfortunately, these labels often come with noise, compromising t…

    Submitted 8 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 24 pages, 14 figures, 15 tables and a project page at https://eraseai.github.io/ERASE-page

  23. arXiv:2312.03339  [pdf]

    cs.CV

    PointJEM: Self-supervised Point Cloud Understanding for Reducing Feature Redundancy via Joint Entropy Maximization

    Authors: Xin Cao, Huan Xia, Xinxin Han, Yifan Wang, Kang Li, Linzhi Su

    Abstract: Most deep learning-based point cloud processing methods are supervised and require large amounts of labeled data. However, manual labeling of point cloud data is laborious and time-consuming. Self-supervised representation learning can address the aforementioned issue by learning robust and generalized representations from unlabeled datasets. Nevertheless, the embedded features obtained by represent…

    Submitted 6 December, 2023; originally announced December 2023.

  24. arXiv:2311.18213  [pdf, other]

    cs.IR cs.AI

    Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation

    Authors: Liangcai Su, Fan Yan, Jieming Zhu, Xi Xiao, Haoyi Duan, Zhou Zhao, Zhenhua Dong, Ruiming Tang

    Abstract: Two-tower models are a prevalent matching framework for recommendation, which have been widely deployed in industrial applications. The success of two-tower matching is attributed to its efficiency in retrieval among a large number of items, since the item tower can be precomputed and used for fast Approximate Nearest Neighbor (ANN) search. However, it suffers from two main challenges, including limited f…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Accepted by SIGIR 2023. Code will be available at https://reczoo.github.io/SparCode

  25. arXiv:2311.12488  [pdf, other]

    eess.AS cs.SD

    Adapting pretrained speech model for Mandarin lyrics transcription and alignment

    Authors: Jun-You Wang, Chon-In Leong, Yu-Chen Lin, Li Su, Jyh-Shing Roger Jang

    Abstract: The tasks of automatic lyrics transcription and lyrics alignment have witnessed significant performance improvements in the past few years. However, most of the previous works only focus on English in which large-scale datasets are available. In this paper, we address lyrics transcription and alignment of polyphonic Mandarin pop music in a low-resource setting. To deal with the data scarcity issue…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted by ASRU 2023

  26. arXiv:2311.00423  [pdf, other]

    cs.IR

    LLMRec: Large Language Models with Graph Augmentation for Recommendation

    Authors: Wei Wei, Xubin Ren, Jiabin Tang, Qinyong Wang, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang

    Abstract: The problem of data sparsity has long been a challenge in recommendation systems, and previous studies have attempted to address this issue by incorporating side information. However, this approach often introduces side effects such as noise, availability issues, and low data quality, which in turn hinder the accurate modeling of user preferences and adversely impact recommendation performance. In…

    Submitted 6 January, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: WSDM 2024 Oral Presentation

  27. arXiv:2310.20218  [pdf]

    cs.LG cs.AI

    A Systematic Review for Transformer-based Long-term Series Forecasting

    Authors: Liyilei Su, Xumin Zuo, Rui Li, Xin Wang, Heng Zhao, Bingding Huang

    Abstract: The emergence of deep learning has yielded noteworthy advancements in time series forecasting (TSF). Transformer architectures, in particular, have witnessed broad utilization and adoption in TSF tasks. Transformers have proven to be the most successful solution to extract the semantic correlations among the elements within a long sequence. Various variants have enabled transformer architecture to…

    Submitted 31 October, 2023; originally announced October 2023.

  28. arXiv:2310.19198  [pdf]

    q-bio.QM cs.LG eess.SP

    Enhancing Motor Imagery Decoding in Brain Computer Interfaces using Riemann Tangent Space Mapping and Cross Frequency Coupling

    Authors: Xiong Xiong, Li Su, Jinguo Huang, Guixia Kang

    Abstract: Objective: Motor Imagery (MI) serves as a crucial experimental paradigm within the realm of Brain Computer Interfaces (BCIs), aiming to decode motor intentions from electroencephalogram (EEG) signals. Method: Drawing inspiration from Riemannian geometry and Cross-Frequency Coupling (CFC), this paper introduces a novel approach termed Riemann Tangent Space Mapping using Dichotomous Filter Bank wi…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: 22 pages, 7 figures

  29. Representation Learning with Large Language Models for Recommendation

    Authors: Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang

    Abstract: Recommender systems have seen significant advancements with the influence of deep learning and graph neural networks, particularly in capturing complex user-item relationships. However, these graph-based recommenders heavily depend on ID-based data, potentially disregarding valuable textual information associated with users and items, resulting in less informative learned representations. Moreover…

    Submitted 25 February, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Published as a WWW'24 full paper

  30. arXiv:2310.15469  [pdf, other]

    cs.CR cs.CL

    The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks

    Authors: Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, Zhikun Zhang, XiaoFeng Wang, Haixu Tang

    Abstract: The rapid advancements of large language models (LLMs) have raised public concerns about the privacy leakage of personally identifiable information (PII) within their extensive training datasets. Recent studies have demonstrated that an adversary could extract highly sensitive privacy data from the training data of LLMs with carefully designed prompts. However, these attacks suffer from the model'…

    Submitted 12 May, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  31. arXiv:2310.13023  [pdf, other]

    cs.CL cs.AI

    GraphGPT: Graph Instruction Tuning for Large Language Models

    Authors: Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, Chao Huang

    Abstract: Graph Neural Networks (GNNs) have evolved to understand graph structures through recursive exchanges and aggregations among nodes. To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation. Traditional methods often depend on fine-tuning with task-specific labels, limiting their effectiveness when labeled data is scarce. Our research tackles this by advanc…

    Submitted 7 May, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted by SIGIR'2024, full paper

  32. arXiv:2309.16716  [pdf, other]

    cs.RO cs.AI

    Towards Safe Autonomy in Hybrid Traffic: Detecting Unpredictable Abnormal Behaviors of Human Drivers via Information Sharing

    Authors: Jiangwei Wang, Lili Su, Songyang Han, Dongjin Song, Fei Miao

    Abstract: Hybrid traffic, which involves both autonomous and human-driven vehicles, would be the norm of the autonomous vehicles practice for a while. On the one hand, unlike autonomous vehicles, human-driven vehicles could exhibit sudden abnormal behaviors such as unpredictably switching to dangerous driving modes, putting their neighboring vehicles at risk; such undesired mode switching could arise from n…

    Submitted 23 August, 2023; originally announced September 2023.

    Comments: accepted to ACM Transactions on Cyber-Physical Systems

  33. arXiv:2309.16487  [pdf, other]

    cs.LG

    Towards Poisoning Fair Representations

    Authors: Tianci Liu, Haoyu Wang, Feijie Wu, Hengtong Zhang, Pan Li, Lu Su, Jing Gao

    Abstract: Fair machine learning seeks to mitigate model prediction bias against certain demographic subgroups such as elderly and female individuals. Recently, fair representation learning (FRL) trained by deep neural networks has demonstrated superior performance, whereby representations containing no demographic information are inferred from the data and then used as the input to classification or other downstream task…

    Submitted 4 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  34. arXiv:2309.16283  [pdf, other]

    cs.CV cs.CL

    Self-supervised Cross-view Representation Reconstruction for Change Captioning

    Authors: Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Chenggang Yan, Qingming Huang

    Abstract: Change captioning aims to describe the difference between a pair of similar images. Its key challenge is how to learn a stable difference representation under pseudo changes caused by viewpoint change. In this paper, we address this by proposing a self-supervised cross-view representation reconstruction (SCORER) network. Concretely, we first design a multi-head token-wise matching to model relatio…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  35. arXiv:2309.11718  [pdf, other]

    cs.CV

    CPR-Coach: Recognizing Composite Error Actions based on Single-class Training

    Authors: Shunli Wang, Qing Yu, Shuaibing Wang, Dingkang Yang, Liuzhen Su, Xiao Zhao, Haopeng Kuang, Peixuan Zhang, Peng Zhai, Lihua Zhang

    Abstract: The fine-grained medical action analysis task has received considerable attention from pattern recognition communities recently, but it faces the problems of data and algorithm shortage. Cardiopulmonary Resuscitation (CPR) is an essential skill in emergency treatment. Currently, the assessment of CPR skills mainly depends on dummies and trainers, leading to high training costs and low efficiency.…

    Submitted 20 September, 2023; originally announced September 2023.

    ACM Class: I.5.4

  36. arXiv:2309.10305  [pdf, other]

    cs.CL

    Baichuan 2: Open Large-scale Language Models

    Authors: Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang, et al. (30 additional authors not shown)

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of lar…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan2

  37. arXiv:2309.07846  [pdf, other]

    cs.CV

    MC-NeRF: Multi-Camera Neural Radiance Fields for Multi-Camera Image Acquisition Systems

    Authors: Yu Gao, Lutong Su, Hao Liang, Yufeng Yue, Yi Yang, Mengyin Fu

    Abstract: Neural Radiance Fields (NeRF) use multi-view images for 3D scene representation, demonstrating remarkable performance. As one of the primary sources of multi-view images, multi-camera systems encounter challenges such as varying intrinsic parameters and frequent pose changes. Most previous NeRF-based methods assume a unique camera and rarely consider multi-camera scenarios. Besides, some NeRF meth…

    Submitted 22 March, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: This manuscript is currently under review

  38. arXiv:2308.13537  [pdf, other]

    cs.IR cs.LG

    STEM: Unleashing the Power of Embeddings for Multi-task Recommendation

    Authors: Liangcai Su, Junwei Pan, Ximei Wang, Xi Xiao, Shijie Quan, Xihua Chen, Jie Jiang

    Abstract: Multi-task learning (MTL) has gained significant popularity in recommender systems as it enables simultaneous optimization of multiple objectives. A key challenge in MTL is negative transfer, but existing studies explored negative transfer on all samples, overlooking the inherent complexities within them. We split the samples according to the relative amount of positive feedback among tasks. Surpr…

    Submitted 6 January, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

  39. arXiv:2308.06454  [pdf, other]

    cs.CL

    Demonstration-based learning for few-shot biomedical named entity recognition under machine reading comprehension

    Authors: Leilei Su, Jian Chen, Yifan Peng, Cong Sun

    Abstract: Although deep learning techniques have shown significant achievements, they frequently depend on extensive amounts of hand-labeled data and tend to perform inadequately in few-shot scenarios. The objective of this study is to devise a strategy that can improve the model's capability to recognize biomedical entities in scenarios of few-shot learning. By redefining biomedical named entity recognitio…

    Submitted 11 August, 2023; originally announced August 2023.

  40. arXiv:2307.14952  [pdf, other]

    cs.LG cs.DC cs.NI

    Network Fault-tolerant and Byzantine-resilient Social Learning via Collaborative Hierarchical Non-Bayesian Learning

    Authors: Connor Mclaughlin, Matthew Ding, Denis Edogmus, Lili Su

    Abstract: As the network scale increases, existing fully distributed solutions start to lag behind the real-world challenges such as (1) slow information propagation, (2) network communication failures, and (3) external adversarial attacks. In this paper, we focus on hierarchical system architecture and address the problem of non-Bayesian learning over networks that are vulnerable to communication failures…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: 11 pages, 1 figure

  41. arXiv:2306.17267  [pdf, other]

    cs.LG

    Fast and Robust State Estimation and Tracking via Hierarchical Learning

    Authors: Connor Mclaughlin, Matthew Ding, Deniz Edogmus, Lili Su

    Abstract: Fully distributed estimation and tracking solutions to large-scale multi-agent networks suffer slow convergence and are vulnerable to network failures. In this paper, we aim to speed up the convergence and enhance the resilience of state estimation and tracking using a simple hierarchical system architecture wherein agents are clustered into smaller networks, and a parameter server exists to aid th…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 14 pages, 5 figures

  42. arXiv:2306.00280  [pdf, other]

    cs.LG cs.DC stat.ML

    Towards Bias Correction of FedAvg over Nonuniform and Time-Varying Communications

    Authors: Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su

    Abstract: Federated learning (FL) is a decentralized learning framework wherein a parameter server (PS) and a collection of clients collaboratively train a model via minimizing a global objective. Communication bandwidth is a scarce resource; in each round, the PS aggregates the updates from a subset of clients only. In this paper, we focus on non-convex minimization that is vulnerable to non-uniform and ti…

    Submitted 31 May, 2023; originally announced June 2023.

  43. arXiv:2305.20003  [pdf]

    cs.LG eess.SY math.OC

    A Novel Black Box Process Quality Optimization Approach based on Hit Rate

    Authors: Yang Yang, Jian Wu, Xiangman Song, Derun Wu, Lijie Su, Lixin Tang

    Abstract: Hit rate is a key performance metric in predicting process product quality in integrated industrial processes. It represents the percentage of products accepted by downstream processes within a controlled range of quality. However, optimizing hit rate is a non-convex and challenging problem. To address this issue, we propose a data-driven quasi-convex approach that combines factorial hidden Markov…

    Submitted 2 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

  44. arXiv:2305.19971  [pdf, other]

    cs.LG cs.DC

    Federated Learning in the Presence of Adversarial Client Unavailability

    Authors: Lili Su, Ming Xiang, Jiaming Xu, Pengkun Yang

    Abstract: Federated learning is a decentralized machine learning framework that enables collaborative model training without revealing raw data. Due to the diverse hardware and software limitations, a client may not always be available for the computation requests from the parameter server. An emerging line of research is devoted to tackling arbitrary client unavailability. However, existing work still impo…

    Submitted 19 February, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

  45. arXiv:2305.19956  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    MicroSegNet: A Deep Learning Approach for Prostate Segmentation on Micro-Ultrasound Images

    Authors: Hongxu Jiang, Muhammad Imran, Preethika Muralidharan, Anjali Patel, Jake Pensa, Muxuan Liang, Tarik Benidir, Joseph R. Grajo, Jason P. Joseph, Russell Terry, John Michael DiBianco, Li-Ming Su, Yuyin Zhou, Wayne G. Brisbane, Wei Shao

    Abstract: Micro-ultrasound (micro-US) is a novel 29-MHz ultrasound technique that provides 3-4 times higher resolution than traditional ultrasound, potentially enabling low-cost, accurate diagnosis of prostate cancer. Accurate prostate segmentation is crucial for prostate volume measurement, cancer diagnosis, prostate biopsy, and treatment planning. However, prostate segmentation on micro-US is challenging…

    Submitted 25 January, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

    Journal ref: Computerized Medical Imaging and Graphics (2024): 102326

  46. arXiv:2305.19939  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Image Registration of In Vivo Micro-Ultrasound and Ex Vivo Pseudo-Whole Mount Histopathology Images of the Prostate: A Proof-of-Concept Study

    Authors: Muhammad Imran, Brianna Nguyen, Jake Pensa, Sara M. Falzarano, Anthony E. Sisk, Muxuan Liang, John Michael DiBianco, Li-Ming Su, Yuyin Zhou, Wayne G. Brisbane, Wei Shao

    Abstract: Early diagnosis of prostate cancer significantly improves a patient's 5-year survival rate. Biopsy of small prostate cancers is improved with image-guided biopsy. MRI-ultrasound fusion-guided biopsy is sensitive to smaller tumors but is underutilized due to the high cost of MRI and fusion equipment. Micro-ultrasound (micro-US), a novel high-resolution ultrasound technology, provides a cost-effecti…

    Submitted 16 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

  47. arXiv:2305.16588  [pdf, other]

    cs.DC

    Legion: Automatically Pushing the Envelope of Multi-GPU System for Billion-Scale GNN Training

    Authors: Jie Sun, Li Su, Zuocheng Shi, Wenting Shen, Zeke Wang, Lei Wang, Jie Zhang, Yong Li, Wenyuan Yu, Jingren Zhou, Fei Wu

    Abstract: Graph neural network (GNN) has been widely applied in real-world applications, such as product recommendation in e-commerce platforms and risk control in financial management systems. Several cache-based GNN systems have been built to accelerate GNN training in a single machine with multiple GPUs. However, these systems fail to train billion-scale graphs efficiently, which is a common challenge in…

    Submitted 12 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  48. arXiv:2304.07421  [pdf, other]

    cs.LG cs.CV

    Peer-to-Peer Federated Continual Learning for Naturalistic Driving Action Recognition

    Authors: Liangqi Yuan, Yunsheng Ma, Lu Su, Ziran Wang

    Abstract: Naturalistic driving action recognition (NDAR) has proven to be an effective method for detecting driver distraction and reducing the risk of traffic accidents. However, the intrusive design of in-cabin cameras raises concerns about driver privacy. To address this issue, we propose a novel peer-to-peer (P2P) federated learning (FL) framework with continual learning, namely FedPC, which ensures pri…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: CVPRW 2023

  49. arXiv:2304.05917  [pdf, other]

    cs.SD cs.LG eess.AS

    A Phoneme-Informed Neural Network Model for Note-Level Singing Transcription

    Authors: Sangeon Yong, Li Su, Juhan Nam

    Abstract: Note-level automatic music transcription is one of the most representative music information retrieval (MIR) tasks and has been studied for various instruments to understand music. However, due to the lack of high-quality labeled data, transcription of many instruments is still a challenging task. In particular, in the case of singing, it is difficult to find accurate notes due to its expressivene…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted at ICASSP 2023

  50. arXiv:2304.00902  [pdf, other]

    cs.IR

    FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction

    Authors: Kelong Mao, Jieming Zhu, Liangcai Su, Guohao Cai, Yuru Li, Zhenhua Dong

    Abstract: Click-through rate (CTR) prediction is one of the fundamental tasks for online advertising and recommendation. While multi-layer perceptron (MLP) serves as a core component in many deep CTR prediction models, it has been widely recognized that applying a vanilla MLP network alone is inefficient in learning multiplicative feature interactions. As such, many two-stream interaction models (e.g., Deep…

    Submitted 29 November, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted by AAAI 2023. Code available at https://reczoo.github.io/FinalMLP