Showing 1–50 of 139 results for author: Fan, S

Searching in archive cs.
  1. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  2. arXiv:2406.03865  [pdf, other]

    cs.CV cs.AI

    Semantic Similarity Score for Measuring Visual Similarity at Semantic Level

    Authors: Senran Fan, Zhicheng Bao, Chen Dong, Haotai Liang, Xiaodong Xu, Ping Zhang

    Abstract: Semantic communication, as a revolutionary communication architecture, is considered a promising novel communication paradigm. Unlike traditional symbol-based error-free communication systems, semantic-based visual communication systems extract, compress, transmit, and reconstruct images at the semantic level. However, widely used image similarity evaluation metrics, whether pixel-based MSE or PSN…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2406.03287  [pdf, other]

    cs.NE cs.CL cs.LG

    SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

    Authors: Xingrun Xing, Zheng Zhang, Ziyi Ni, Shitao Xiao, Yiming Ju, Siqi Fan, Yequan Wang, Jiajun Zhang, Guoqi Li

    Abstract: Towards energy-efficient artificial intelligence similar to the human brain, the bio-inspired spiking neural networks (SNNs) have advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models exhibit promising generalization capability, making it a valuable issue to explore more general spike-driven models. However, the binary spikes in…

    Submitted 5 June, 2024; originally announced June 2024.

  4. arXiv:2406.02191  [pdf, other]

    stat.ML cs.LG

    On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data

    Authors: Shunxing Fan, Mingming Gong, Kun Zhang

    Abstract: We consider the effect of temporal aggregation on instantaneous (non-temporal) causal discovery in a general setting. This is motivated by the observation that the true causal time lag is often considerably shorter than the observational interval. This discrepancy leads to high aggregation, causing time-delay causality to vanish and instantaneous dependence to manifest. Although we expect such insta…

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  5. arXiv:2406.02002  [pdf, other]

    cs.CL cs.AI

    Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue

    Authors: Shixuan Fan, Wei Wei, Wendi Li, Xian-Ling Mao, Wenfeng Xie, Dangyang Chen

    Abstract: The core of the dialogue system is to generate relevant, informative, and human-like responses based on extensive dialogue history. Recently, the dialogue generation domain has seen mainstream adoption of large language models (LLMs), due to their powerful capability in generating utterances. However, there is a natural deficiency for such models, that is, inherent position bias, which may lead them to…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to IJCAI 2024

  6. arXiv:2406.01988  [pdf, other]

    cs.CL cs.AI

    Personalized Topic Selection Model for Topic-Grounded Dialogue

    Authors: Shixuan Fan, Wei Wei, Xiaofei Wen, Xianling Mao, Jixiong Chen, Dangyang Chen

    Abstract: Recently, the topic-grounded dialogue (TGD) system has become increasingly popular due to its powerful capability to actively guide users to accomplish specific tasks through topic-guided conversations. Most existing works utilize side information (e.g., topics or personas) in isolation to enhance the topic selection ability. However, due to disregarding the noise within these auxiliary information sour…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  7. arXiv:2406.01392  [pdf, other]

    cs.CL

    Sparsity-Accelerated Training for Large Language Models

    Authors: Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu

    Abstract: Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this trai…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  8. arXiv:2406.00491  [pdf, other]

    cs.NI

    Optimizing Age of Information in Random Access Networks: A Second-Order Approach for Active/Passive Users

    Authors: Siqi Fan, Yuxin Zhong, I-Hong Hou, Clement K Kam

    Abstract: In this paper, we study the moments of the Age of Information (AoI) for both active and passive users in a random access network. In this network, active users broadcast sensing data, while passive users detect in-band radio activities from out-of-network devices, such as jammers. Collisions occur when multiple active users transmit simultaneously. Passive users can detect radio activities only wh…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Communications. arXiv admin note: text overlap with arXiv:2305.05137

  9. arXiv:2405.19765  [pdf, other]

    cs.CV cs.AI

    Towards Unified Multi-granularity Text Detection with Interactive Attention

    Authors: Xingyu Wan, Chengquan Zhang, Pengyuan Lyu, Sen Fan, Zihan Ni, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Existing OCR engines or document image analysis systems typically rely on training separate models for text detection in varying scenarios and granularities, leading to significant computational complexity and resource demands. In this paper, we introduce "Detect Any Text" (DAT), an advanced paradigm that seamlessly unifies scene text detection, layout analysis, and document page detection into a…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  10. arXiv:2405.19665  [pdf]

    eess.SY cs.AI cs.LG

    A novel fault localization with data refinement for hydroelectric units

    Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

    Abstract: Due to the scarcity of fault samples and the complexity of non-linear and non-smooth characteristics data in hydroelectric units, most traditional hydroelectric unit fault localization methods struggle to localize faults accurately. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learni…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)

  11. arXiv:2405.19642  [pdf]

    cs.AI

    Few-shot fault diagnosis based on multi-scale graph convolution filtering for industry

    Authors: Mengjie Gan, Penglong Lian, Zhiheng Su, Jiyang Zhang, Jialong Huang, Benhao Wang, Jianxiao Zou, Shicai Fan

    Abstract: Industrial equipment fault diagnosis often encounters challenges such as the scarcity of fault data, complex operating conditions, and varied types of failures. Signal analysis, data statistical learning, and conventional deep learning techniques face constraints under these conditions due to their substantial data requirements and the necessity for transfer learning to accommodate new failure mode…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 2 figures, 2 tables, 63rd IEEE Conference on Decision and Control

  12. arXiv:2405.19454  [pdf, other]

    cs.LG stat.ML

    Deep Grokking: Would Deep Neural Networks Generalize Better?

    Authors: Simin Fan, Razvan Pascanu, Martin Jaggi

    Abstract: Recent research on the grokking phenomenon has illuminated the intricacies of neural networks' training dynamics and their generalization behaviors. Grokking refers to a sharp rise of the network's generalization accuracy on the test set, which occurs long after an extended overfitting phase, during which the network perfectly fits the training set. While the existing research primarily focuses on s…

    Submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.05288  [pdf, other]

    cs.SI cs.IR cs.LG

    Learning Social Graph for Inactive User Recommendation

    Authors: Nian Liu, Shen Fan, Ting Bai, Peng Wang, Mingwei Sun, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Chuan Shi

    Abstract: Social relations have been widely incorporated into recommender systems to alleviate the data sparsity problem. However, raw social relations don't always benefit recommendation due to their inferior quality and insufficient quantity, especially for inactive users, whose interacted items are limited. In this paper, we propose a novel social recommendation method called LSIR (Learning …

    Submitted 22 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: This paper has been received by DASFAA 2024

  14. arXiv:2405.03121  [pdf, other]

    cs.CV cs.AI

    AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

    Authors: Tao Liu, Feilong Chen, Shuai Fan, Chenpeng Du, Qi Chen, Xie Chen, Kai Yu

    Abstract: The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This innovative representation effectively captures a…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 14 pages, 7 figures

  15. arXiv:2404.17900  [pdf, other]

    cs.CV

    Unsupervised Anomaly Detection via Masked Diffusion Posterior Sampling

    Authors: Di Wu, Shicai Fan, Xue Zhou, Li Yu, Yuzhong Deng, Jianxiao Zou, Baihong Lin

    Abstract: Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image re…

    Submitted 27 April, 2024; originally announced April 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  16. arXiv:2404.12130  [pdf, other]

    cs.LG cs.CV cs.DC

    One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity

    Authors: Naibo Wang, Yuchen Deng, Wenjie Feng, Shichen Fan, Jianwei Yin, See-Kiong Ng

    Abstract: Traditional federated learning mainly focuses on parallel settings (PFL), which can suffer significant communication and computation costs. In contrast, one-shot and sequential federated learning (SFL) have emerged as innovative paradigms to alleviate these costs. However, the issue of non-IID (Independent and Identically Distributed) data persists as a significant challenge in one-shot and SFL se…

    Submitted 18 April, 2024; originally announced April 2024.

  17. arXiv:2404.06079  [pdf, other]

    eess.AS cs.AI

    The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

    Authors: Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

    Abstract: Discrete speech tokens have become increasingly popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challen…

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 5 pages, 3 figures. Report of a challenge

  18. arXiv:2404.02438  [pdf, other]

    cs.CL cs.LG stat.ML

    From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives

    Authors: Shuxian Fan, Adam Visokay, Kentaro Hoffman, Stephen Salerno, Li Liu, Jeffrey T. Leek, Tyler H. McCormick

    Abstract: In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent's COD. Turning VAs into actionable insights for researchers and policymakers requires two steps: (i) predicting likely COD using the VA interview and (ii…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 12 pages, 7 figures

  19. arXiv:2404.00717  [pdf, other]

    cs.RO cs.CV cs.MA

    End-to-End Autonomous Driving through V2X Cooperation

    Authors: Haibao Yu, Wenxian Yang, Jiaru Zhong, Zhenwei Yang, Siqi Fan, Ping Luo, Zaiqing Nie

    Abstract: Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving. However, current research mainly focuses on improving individual modules, rather than taking end-to-end learning to optimize final planning performance, resulting in underutilized data potential. In this paper, we introduce UniV2X, a pio…

    Submitted 19 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  20. arXiv:2403.19501  [pdf, other]

    cs.CV

    RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method

    Authors: Ming Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang

    Abstract: Comprehensive capturing of human motions requires both accurate captures of complex poses and precise localization of the human within scenes. Most of the HPE datasets and methods primarily rely on RGB, LiDAR, or IMU data. However, solely using these modalities or a combination of them may not be adequate for HPE, particularly for complex and fast movements. For holistic human motion understanding…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR2024, Project website: http://www.lidarhumanmotion.net/reli11d/

  21. arXiv:2403.19185  [pdf, other]

    cs.IT eess.SP

    Deep CSI Compression for Dual-Polarized Massive MIMO Channels with Disentangled Representation Learning

    Authors: Suhang Fan, Wei Xu, Renjie Xie, Shi Jin, Derrick Wing Kwan Ng, Naofal Al-Dhahir

    Abstract: Channel state information (CSI) feedback is critical for achieving the promised advantages of enhancing spectral and energy efficiencies in massive multiple-input multiple-output (MIMO) wireless communication systems. Deep learning (DL)-based methods have been proven effective in reducing the required signaling overhead for CSI feedback. In practical dual-polarized MIMO scenarios, channels in the…

    Submitted 28 March, 2024; originally announced March 2024.

  22. arXiv:2403.18349  [pdf, other]

    cs.CL

    Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback

    Authors: Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen, Kai Yu

    Abstract: Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations, due to their limitations in discerning questions beyond their knowledge scope. While addressing hallucination has been a focal point in research, previous efforts primarily concentrate on enhancing correctness without giving due consideration to the significance of rejection mechanisms. In this paper, we conduc…

    Submitted 7 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  23. arXiv:2403.10188  [pdf, other]

    cs.CR cs.AR

    Taiyi: A high-performance CKKS accelerator for Practical Fully Homomorphic Encryption

    Authors: Shengyu Fan, Xianglong Deng, Zhuoyu Tian, Zhicheng Hu, Liang Chang, Rui Hou, Dan Meng, Mingzhe Zhang

    Abstract: Fully Homomorphic Encryption (FHE), a novel cryptographic theory enabling computation directly on ciphertext data, offers significant security benefits but is hampered by substantial performance overhead. In recent years, a series of accelerator designs have significantly enhanced the performance of FHE applications, bringing them closer to real-world applicability. However, these accelerators fac…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 14 pages, 15 figures

  24. arXiv:2403.10145  [pdf, other]

    cs.CV cs.RO

    RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

    Authors: Ruiyang Hao, Siqi Fan, Yingru Dai, Zhenlin Zhang, Chenxi Li, Yuntian Wang, Haibao Yu, Wenxian Yang, Jirui Yuan, Zaiqing Nie

    Abstract: The value of roadside perception, which could extend the boundaries of autonomous driving and traffic management, has gradually become more prominent and acknowledged in recent years. However, existing roadside perception approaches only focus on the single-infrastructure sensor system, which cannot realize a comprehensive understanding of a traffic area because of the limited sensing range and bl…

    Submitted 31 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024. 10 pages with 6 figures

    ACM Class: I.4.8; I.5.4

  25. arXiv:2403.02181  [pdf, other]

    cs.CL cs.AI cs.LG

    Not all Layers of LLMs are Necessary during Inference

    Authors: Siqi Fan, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Shuo Shang, Aixin Sun, Yequan Wang, Zhongyuan Wang

    Abstract: The inference phase of Large Language Models (LLMs) is very expensive. An ideal inference stage of LLMs could utilize fewer computational resources while still maintaining its capabilities (e.g., generalization and in-context learning ability). In this paper, we try to answer the question, "During LLM inference, can we use shallow layers for easy instances; and deep layers for hard ones?" To answe…

    Submitted 14 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  26. arXiv:2402.15272  [pdf, other]

    cs.CV cs.AI

    EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

    Authors: Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang

    Abstract: In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: 1) inherent pose errors when fusing multi-view images, cause…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 7 pages, 8 figures. Accepted by ICRA 2024. arXiv admin note: text overlap with arXiv:2303.10975

  27. arXiv:2402.07197  [pdf, other]

    cs.AI

    GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks

    Authors: Mengmei Zhang, Mingwei Sun, Peng Wang, Shen Fan, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Cheng Yang, Chuan Shi

    Abstract: Large language models (LLMs) like ChatGPT, which exhibit powerful zero-shot and instruction-following capabilities, have catalyzed a revolutionary transformation across diverse fields, especially for open-ended tasks. The idea is less explored in the graph domain: despite the availability of numerous powerful graph models (GMs), they are restricted to tasks in a pre-defined form. Although several…

    Submitted 27 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  28. arXiv:2402.05728  [pdf, other]

    cs.CV

    CTGAN: Semantic-guided Conditional Texture Generator for 3D Shapes

    Authors: Yi-Ting Pan, Chai-Rong Lee, Shu-Ho Fan, Jheng-Wei Su, Jia-Bin Huang, Yung-Yu Chuang, Hung-Kuo Chu

    Abstract: The entertainment industry relies on 3D visual content to create immersive experiences, but traditional methods for creating textured 3D models can be time-consuming and subjective. Generative networks such as StyleGAN have advanced image synthesis, but generating 3D objects with high-fidelity textures is still not well explored, and existing methods have limitations. We propose the Semantic-guide…

    Submitted 8 February, 2024; originally announced February 2024.

  29. arXiv:2401.16784  [pdf, other]

    cs.LG cs.AI cs.SI

    Graph Fairness Learning under Distribution Shifts

    Authors: Yibo Li, Xiao Wang, Yujie Xing, Shaohua Fan, Ruijia Wang, Yaoqi Liu, Chuan Shi

    Abstract: Graph neural networks (GNNs) have achieved remarkable performance on graph-structured data. However, GNNs may inherit prejudice from the training data and make discriminatory predictions based on sensitive attributes, such as gender and race. Recently, there has been an increasing interest in ensuring fairness on GNNs, but all of them are under the assumption that the training and testing data are…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted by WWW 2024

  30. arXiv:2401.14818  [pdf, other]

    cs.CL cs.DL

    ChemDFM: Dialogue Foundation Model for Chemistry

    Authors: Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Xin Chen, Kai Yu

    Abstract: Large language models (LLMs) have established great success in the general domain of natural language processing. Their emerging task generalization and free-form dialogue capabilities can greatly help to design Chemical General Intelligence (CGI) to assist real-world research in chemistry. However, the existence of specialized language and knowledge in the field of chemistry, such as the highly i…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 10 pages, 12 figures, 13 tables. Under Review

  31. arXiv:2401.12564  [pdf, other]

    cs.LG cs.SI

    Graph Contrastive Invariant Learning from the Causal Perspective

    Authors: Yanhu Mo, Xiao Wang, Shaohua Fan, Chuan Shi

    Abstract: Graph contrastive learning (GCL), learning the node representation by contrasting two augmented graphs in a self-supervised way, has attracted considerable attention. GCL is usually believed to learn the invariant representation. However, does this understanding always hold in practice? In this paper, we first study GCL from the perspective of causality. By analyzing GCL with the structural causal…

    Submitted 7 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  32. arXiv:2401.10244  [pdf]

    cs.IR cs.CL

    Knowledge Graph Driven Recommendation System Algorithm

    Authors: Chaoyang Zhang, Yanan Li, Shen Chen, Siwei Fan, Wei Li

    Abstract: In this paper, we propose a novel graph neural network-based recommendation model called KGLN, which leverages Knowledge Graph (KG) information to enhance the accuracy and effectiveness of personalized recommendations. We first use a single-layer neural network to merge individual node features in the graph, and then adjust the aggregation weights of neighboring entities by incorporating influence…

    Submitted 3 February, 2024; v1 submitted 1 December, 2023; originally announced January 2024.

  33. arXiv:2401.08926  [pdf, other]

    cs.CV eess.IV

    Uncertainty-aware No-Reference Point Cloud Quality Assessment

    Authors: Songlin Fan, Zixuan Guo, Wei Gao, Ge Li

    Abstract: The evolution of compression and enhancement algorithms necessitates an accurate quality assessment for point clouds. Previous works consistently regard point cloud quality assessment (PCQA) as a MOS regression problem and devise a deterministic mapping, ignoring the stochasticity in generating MOS from subjective tests. Besides, the viewpoint switching of 3D point clouds in subjective tests reinf…

    Submitted 16 January, 2024; originally announced January 2024.

  34. arXiv:2401.03494  [pdf]

    cs.LG cs.CE physics.app-ph

    Pre-insertion resistors temperature prediction based on improved WOA-SVR

    Authors: Honghe Dai, Site Mo, Haoxin Wang, Nan Yin, Songhai Fan, Bixiong Li

    Abstract: The pre-insertion resistors (PIR) within high-voltage circuit breakers are critical components and warm up by generating Joule heat when an electric current flows through them. Elevated temperature can lead to temporary closure failure and, in severe cases, the rupture of PIR. To accurately predict the temperature of PIR, this study combines finite element simulation techniques with Support Vector…

    Submitted 7 January, 2024; originally announced January 2024.

  35. arXiv:2401.01275  [pdf, other]

    cs.CL

    CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation

    Authors: Quan Tu, Shilong Fan, Zihang Tian, Rui Yan

    Abstract: Recently, the advent of large language models (LLMs) has revolutionized generative agents. Among them, Role-Playing Conversational Agents (RPCAs) attract considerable attention due to their ability to emotionally engage users. However, the absence of a comprehensive benchmark impedes progress in this field. To bridge this gap, we introduce CharacterEval, a Chinese benchmark for comprehensive RPCA…

    Submitted 9 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  36. arXiv:2312.06220  [pdf, other]

    cs.LG cs.AI

    Dance of Channel and Sequence: An Efficient Attention-Based Approach for Multivariate Time Series Forecasting

    Authors: Haoxin Wang, Yipeng Mo, Nan Yin, Honghe Dai, Bixiong Li, Songhai Fan, Site Mo

    Abstract: In recent developments, predictive models for multivariate time series analysis have exhibited commendable performance through the adoption of the prevalent principle of channel independence. Nevertheless, it is imperative to acknowledge the intricate interplay among channels, which fundamentally influences the outcomes of multivariate predictions. Consequently, the notion of channel independence,…

    Submitted 11 December, 2023; originally announced December 2023.

  37. arXiv:2312.05921  [pdf, other]

    cs.AI

    Dig-CSI: A Distributed and Generative Model Assisted CSI Feedback Training Framework

    Authors: Zhilin Du, Haozhen Li, Zhenyu Liu, Shilong Fan, Xinyu Gu, Lin Zhang

    Abstract: The advent of deep learning (DL)-based models has significantly advanced Channel State Information (CSI) feedback mechanisms in wireless communication systems. However, traditional approaches often suffer from high communication overhead and potential privacy risks due to the centralized nature of CSI data processing. To address these challenges, we design a CSI feedback training framework called…

    Submitted 10 December, 2023; originally announced December 2023.

  38. arXiv:2312.03775  [pdf, other]

    cs.CV

    FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability

    Authors: Linze Li, Sunqi Fan, Hengjun Pu, Zhaodong Bing, Yao Tang, Tianzhu Ye, Tong Yang, Liangyu Chen, Jiajun Liang

    Abstract: Over recent years, diffusion models have facilitated significant advancements in video generation. Yet, the creation of face-related videos still confronts issues such as low facial fidelity, lack of frame consistency, limited editability and uncontrollable human poses. To address these challenges, we introduce a facial animation generation method that enhances both face identity fidelity and edit…

    Submitted 20 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  39. arXiv:2312.01691  [pdf, other]

    astro-ph.SR cs.LG physics.space-ph

    Estimating Coronal Mass Ejection Mass and Kinetic Energy by Fusion of Multiple Deep-learning Models

    Authors: Khalid A. Alobaid, Yasser Abduallah, Jason T. L. Wang, Haimin Wang, Shen Fan, Jialiang Li, Huseyin Cavus, Vasyl Yurchyshyn

    Abstract: Coronal mass ejections (CMEs) are massive solar eruptions, which have a significant impact on Earth. In this paper, we propose a new method, called DeepCME, to estimate two properties of CMEs, namely, CME mass and kinetic energy. Being able to estimate these properties helps better understand CME dynamics. Our study is based on the CME catalog maintained at the Coordinated Data Analysis Workshops…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 10 pages, 7 figures

    Journal ref: The Astrophysical Journal Letters, 958:L34, 2023

  40. arXiv:2312.00452  [pdf, other]

    cs.CV

    Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence

    Authors: Yajie Liu, Pu Ge, Haoxiang Ma, Shichao Fan, Qingjie Liu, Di Huang, Yunhong Wang

    Abstract: Referring image segmentation (RIS) aims to segment objects in an image conditioning on free-form text descriptions. Despite the overwhelming progress, it still remains challenging for current approaches to perform well on cases with various text expressions or with unseen visual entities, limiting its further application. In this paper, we present a novel RIS approach, which substantially improves…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 7 pages, 4 figures

  41. arXiv:2311.16079  [pdf, other]

    cs.CL cs.AI cs.LG

    MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

    Authors: Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

    Abstract: Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by rele…

    Submitted 27 November, 2023; originally announced November 2023.

  42. arXiv:2311.13333  [pdf, other]

    cs.OS

    Trace-enabled Timing Model Synthesis for ROS2-based Autonomous Applications

    Authors: Hazem Abaza, Debayan Roy, Shiqing Fan, Selma Saidi, Antonios Motakis

    Abstract: Autonomous applications are typically developed over Robot Operating System 2.0 (ROS2) even in time-critical systems like automotive. Recent years have seen increased interest in developing model-based timing analysis and schedule optimization approaches for ROS2-based applications. To complement these approaches, we propose a tracing and measurement framework to obtain timing models of ROS2-based…

    Submitted 23 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

  43. arXiv:2311.11285  [pdf, other]

    cs.LG

    TimeSQL: Improving Multivariate Time Series Forecasting with Multi-Scale Patching and Smooth Quadratic Loss

    Authors: Site Mo, Haoxin Wang, Bixiong Li, Songhai Fan, Yuankai Wu, Xianggen Liu

    Abstract: Time series is a special type of sequence data, a sequence of real-valued random variables collected at even intervals of time. Real-world multivariate time series come with noise and contain complicated local and global temporal dynamics, making it difficult to forecast the future time series given the historical observations. This work proposes a simple and effective framework, coined as…

    Submitted 19 November, 2023; originally announced November 2023.

  44. arXiv:2311.01811  [pdf, other]

    cs.CV cs.AI

    DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder

    Authors: Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu

    Abstract: Generating high-quality and person-generic visual dubbing remains a challenge. Recent innovation has seen the advent of a two-stage paradigm, decoupling the rendering and lip synchronization process facilitated by intermediate representation as a conduit. Still, previous methodologies rely on rough landmarks or are confined to a single speaker, thus limiting their performance. In this paper, we pr…

    Submitted 12 January, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: 5 pages, Accepted to ICASSP 2024

  45. arXiv:2311.00371  [pdf, other]

    cs.CV

    Learning Cooperative Trajectory Representations for Motion Forecasting

    Authors: Hongzhi Ruan, Haibao Yu, Wenxian Yang, Siqi Fan, Yingjuan Tang, Zaiqing Nie

    Abstract: Motion forecasting is an essential task for autonomous driving, and the effective information utilization from infrastructure and other vehicles can enhance motion forecasting capabilities. Existing research has primarily focused on leveraging single-frame cooperative information to enhance the limited perception capability of the ego vehicle, while underutilizing the motion and interaction infor…

    Submitted 1 November, 2023; originally announced November 2023.

  46. arXiv:2310.15393  [pdf, other]

    cs.LG cs.AI cs.CL

    DoGE: Domain Reweighting with Generalization Estimation

    Authors: Simin Fan, Matteo Pagliardini, Martin Jaggi

    Abstract: The coverage and composition of the pretraining data significantly impact the generalization ability of Large Language Models (LLMs). Despite its importance, recent LLMs still rely on heuristics and trial and error to increase or reduce the influence of data-domains. We propose DOmain reweighting with Generalization Estimation (DoGE), which optimizes the probability of sampling from each domain (…

    Submitted 5 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  47. arXiv:2310.15389  [pdf, other]

    cs.CL cs.AI cs.LG

    Irreducible Curriculum for Language Model Pretraining

    Authors: Simin Fan, Martin Jaggi

    Abstract: Automatic data selection and curriculum design for training large language models is challenging, with only a few existing methods showing improvements over standard training. Furthermore, current schemes focus on domain-level selection, overlooking the more fine-grained contributions of each individual training point. It is difficult to apply traditional datapoint selection methods on large langu…

    Submitted 23 October, 2023; originally announced October 2023.

  48. arXiv:2310.04992  [pdf, other]

    eess.IV cs.CV

    VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi…

    Submitted 7 October, 2023; originally announced October 2023.

  49. arXiv:2309.15529  [pdf]

    eess.IV cs.CV cs.LG

    Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data

    Authors: Muyu Wang, Shiyu Fan, Yichen Li, Hui Chen

    Abstract: Fusing multi-modal data can improve the performance of deep learning models. However, missing modalities are common for medical data due to patients' specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities. This study aimed to develop an efficient multi-modal fusion architecture for medical data…

    Submitted 27 September, 2023; originally announced September 2023.

  50. arXiv:2309.03852  [pdf, other]

    cs.CL cs.AI

    FLM-101B: An Open LLM and How to Train It with $100K Budget

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang

    Abstract: Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost, and (ii) fair and objective evaluations. In this paper, we report a solution to significantly reduce LLM training cost through a growth strategy. We demonstrate that a 101B-parameter LLM with 0.…

    Submitted 17 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.