
Showing 1–50 of 581 results for author: Luo, X

Searching in archive cs.
  1. arXiv:2406.18583

    cs.CV cs.LG

    Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

    Authors: Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao

    Abstract: Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lu…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  2. arXiv:2406.17575

    cs.CV

    Toward Universal Medical Image Registration via Sharpness-Aware Meta-Continual Learning

    Authors: Bomin Wang, Xinzhe Luo, Xiahai Zhuang

    Abstract: Current deep learning approaches in medical image registration usually face the challenges of distribution shift and data collection, hindering real-world deployment. In contrast, universal medical image registration aims to perform registration on a wide range of clinically relevant tasks simultaneously, thus having tremendous potential for clinical applications. In this paper, we present the fir…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  3. arXiv:2406.17404

    cs.CL cs.LG

    Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training

    Authors: Yixuan Wang, Xianzhen Luo, Fuxuan Wei, Yijun Liu, Qingfu Zhu, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

    Abstract: Existing speculative decoding methods typically require additional model structure and training processes to assist the model for draft token generation. This makes the migration of acceleration methods to the new model more costly and more demanding on device memory. To address this problem, we propose the Make Some Noise (MSN) training framework as a replacement for the supervised fine-tuning st…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 6 figures

  4. arXiv:2406.15534

    cs.LG cs.AI cs.CL q-bio.QM

    Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research

    Authors: Tianyu Liu, Yijia Xiao, Xiao Luo, Hua Xu, W. Jim Zheng, Hongyu Zhao

    Abstract: The applications of large language models (LLMs) are promising for biomedical and healthcare research. Despite the availability of open-source LLMs trained using a wide range of biomedical data, current research on the applications of LLMs to genomics and proteomics is still limited. To fill this gap, we propose a collection of finetuned LLMs and multimodal LLMs (MLLMs), known as Geneverse, for th…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 8 pages

  5. arXiv:2406.13963

    cs.CV

    SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis

    Authors: Zijian Cai, Xinquan Yang, Xuguang Li, Xiaoling Luo, Xuechen Li, Linlin Shen, He Meng, Yongqiang Deng

    Abstract: Panoramic X-ray is a simple and effective tool for diagnosing dental diseases in clinical practice. When deep learning models are developed to assist dentist in interpreting panoramic X-rays, most of their performance suffers from the limited annotated data, which requires dentist's expertise and a lot of time cost. Although self-supervised learning (SSL) has been proposed to address this challeng…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.13674

    eess.IV cs.CV

    Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases

    Authors: Xiangde Luo, Zihan Li, Shaoting Zhang, Wenjun Liao, Guotai Wang

    Abstract: Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans ($\sim$80k 2D images, $\sim$8k 3D organ annot…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure, 6 tables, Early Accept to MICCAI 2024

  7. arXiv:2406.13645

    eess.IV cs.CV

    Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

    Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

    Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging,…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  8. arXiv:2406.12404

    cs.CV

    Scan-to-BIM for As-built Roads: Automatic Road Digital Twinning from Semantically Labeled Point Cloud Data

    Authors: Yuexiong Ding, Mengtian Yin, Ran Wei, Ioannis Brilakis, Muyang Liu, Xiaowei Luo

    Abstract: Creating geometric digital twins (gDT) for as-built roads still faces many challenges, such as low automation level and accuracy, limited asset types and shapes, and reliance on engineering experience. A novel scan-to-building information modeling (scan-to-BIM) framework is proposed for automatic road gDT creation based on semantically labeled point cloud data (PCD), which considers six asset type…

    Submitted 18 June, 2024; originally announced June 2024.

  9. arXiv:2406.12395

    cs.CV

    SDNIA-YOLO: A Robust Object Detection Model for Extreme Weather Conditions

    Authors: Yuexiong Ding, Xiaowei Luo

    Abstract: Though current object detection models based on deep learning have achieved excellent results on many conventional benchmark datasets, their performance will dramatically decline on real-world images taken under extreme conditions. Existing methods either used image augmentation based on traditional image processing algorithms or applied customized and scene-limited image adaptation technologies f…

    Submitted 18 June, 2024; originally announced June 2024.

  10. arXiv:2406.11629

    cs.CL

    Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better!

    Authors: Mingyang Song, Mao Zheng, Xuan Luo

    Abstract: Leveraging Large Language Models (LLMs) as judges for judging the performance of LLMs has recently garnered attention. However, this type of approach is affected by the potential biases in LLMs, raising concerns about the reliability of the evaluation results. To mitigate this issue, we propose and study two versions of many-shot in-context prompts, which rely on two existing settings of many-shot…

    Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: work in progress

  11. arXiv:2406.11208

    cs.NI

    Privacy-preserving Pseudonym Schemes for Personalized 3D Avatars in Mobile Social Metaverses

    Authors: Cheng Su, Xiaofeng Luo, Zhenmou Liu, Jiawen Kang, Min Hao, Zehui Xiong, Zhaohui Yang, Chongwen Huang

    Abstract: The emergence of mobile social metaverses, a novel paradigm bridging physical and virtual realms, has led to the widespread adoption of avatars as digital representations for Social Metaverse Users (SMUs) within virtual spaces. Equipped with immersive devices, SMUs leverage Edge Servers (ESs) to deploy their avatars and engage with other SMUs in virtual spaces. To enhance immersion, SMUs incline t…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 6 pages, 4 figures

  12. arXiv:2406.10776

    cs.MM

    High-level Codes and Fine-grained Weights for Online Multi-modal Hashing Retrieval

    Authors: Yu-Wei Zhan, Xiao-Ming Wu, Xin Luo, Yinwei Wei, Xin-Shun Xu

    Abstract: In the real world, multi-modal data often appears in a streaming fashion, and there is a growing demand for similarity retrieval from such non-stationary data, especially at a large scale. In response to this need, online multi-modal hashing has gained significant attention. However, existing online multi-modal hashing methods face challenges related to the inconsistency of hash codes during long-…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 32 pages, 4 figures

  13. Nurgle: Exacerbating Resource Consumption in Blockchain State Storage via MPT Manipulation

    Authors: Zheyuan He, Zihao Li, Ao Qiao, Xiapu Luo, Xiaosong Zhang, Ting Chen, Shuwei Song, Dijun Liu, Weina Niu

    Abstract: Blockchains, with intricate architectures, encompass various components, e.g., consensus network, smart contracts, decentralized applications, and auxiliary services. While offering numerous advantages, these components expose various attack surfaces, leading to severe threats to blockchains. In this study, we unveil a novel attack surface, i.e., the state storage, in blockchains. The state storag…

    Submitted 15 June, 2024; originally announced June 2024.

  14. arXiv:2406.09031

    cs.LG cs.AI

    A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability

    Authors: Pengyun Wang, Junyu Luo, Yanxin Shen, Siyu Heng, Xiao Luo

    Abstract: Graph pooling has gained attention for its ability to obtain effective node and graph representations for various downstream tasks. Despite the recent surge in graph pooling approaches, there is a lack of standardized experimental settings and fair benchmarks to evaluate their performance. To address this issue, we have constructed a comprehensive benchmark that includes 15 graph pooling methods a…

    Submitted 16 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.05645

    cs.CV cs.AI cs.LG

    Anomaly Multi-classification in Industrial Scenarios: Transferring Few-shot Learning to a New Task

    Authors: Jie Liu, Yao Wu, Xiaotong Luo, Zongze Wu

    Abstract: In industrial scenarios, it is crucial not only to identify anomalous items but also to classify the type of anomaly. However, research on anomaly multi-classification remains largely unexplored. This paper proposes a novel and valuable research task called anomaly multi-classification. Given the challenges in applying few-shot learning to this task, due to limited training data and unique charact…

    Submitted 9 June, 2024; originally announced June 2024.

  16. arXiv:2406.04603

    cs.CV

    Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network

    Authors: Xinquan Yang, Xuguang Li, Xiaoling Luo, Leilei Zeng, Yudi Zhang, Linlin Shen, Yongqiang Deng

    Abstract: Surgical guide plate is an important tool for the dental implant surgery. However, the design process heavily relies on the dentist to manually simulate the implant angle and depth. When deep neural networks have been applied to assist the dentist quickly locates the implant position, most of them are not able to determine the implant depth. Inspired by the video grounding task which localizes the…

    Submitted 6 June, 2024; originally announced June 2024.

    Journal ref: MICCAI'2024

  17. arXiv:2406.04342

    cs.CV

    Learning 1D Causal Visual Representation with De-focus Attention Networks

    Authors: Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai

    Abstract: Modality differences have led to the development of heterogeneous architectures for vision and language models. While images typically require 2D non-causal modeling, texts utilize 1D causal modeling. This distinction poses significant challenges in constructing unified multi-modal models. This paper explores the feasibility of representing images using 1D causal modeling. We identify an "over-foc…

    Submitted 6 June, 2024; originally announced June 2024.

  18. arXiv:2406.02594

    cs.LG cs.AI

    Graph Neural Networks for Brain Graph Learning: A Survey

    Authors: Xuexiong Luo, Jia Wu, Jian Yang, Shan Xue, Amin Beheshti, Quan Z. Sheng, David McAlpine, Paul Sowman, Alexis Giral, Philip S. Yu

    Abstract: Exploring the complex structure of the human brain is crucial for understanding its functionality and diagnosing brain disorders. Thanks to advancements in neuroimaging technology, a novel approach has emerged that involves modeling the human brain as a graph-structured pattern, with different brain regions represented as nodes and the functional relationships among these regions as edges. Moreove…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures, IJCAI-2024

    MSC Class: 68T07 (Primary) 68T30 (Secondary)

  19. arXiv:2406.02536

    cs.CL cs.LG

    Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

    Authors: Yijiong Yu, Huiqiang Jiang, Xufang Luo, Qianhui Wu, Chin-Yew Lin, Dongsheng Li, Yuqing Yang, Yongfeng Huang, Lili Qiu

    Abstract: Large Language Models (LLMs) are increasingly applied in various real-world scenarios due to their excellent generalization capabilities and robust generative abilities. However, they exhibit position bias, also known as "lost in the middle", a phenomenon that is especially pronounced in long-context scenarios, which indicates the placement of the key information in different positions of a prompt…

    Submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2406.01797

    cs.CV cs.RO

    The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry

    Authors: Paolo Cudrano, Xiaoyu Luo, Matteo Matteucci

    Abstract: As robotics continues to advance, the need for adaptive and continuously-learning embodied agents increases, particularly in the realm of assistance robotics. Quick adaptability and long-term information retention are essential to operate in dynamic environments typical of humans' everyday lives. A lifelong learning paradigm is thus required, but it is scarcely addressed by current robotics litera…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to CoLLAs 2024

  21. arXiv:2406.01281

    physics.med-ph cs.HC

    Extraction of Maternal and fetal ECG in a non-invasive way from abdominal ECG recordings using modified Progressive FastICA Peel-off

    Authors: Yao Li, Xuanyu Luo, Haowen Zhao, Jiawen Cui, Yangfan She, Dongfang Li, Lai Jiang, Xu Zhang

    Abstract: The non-invasive abdominal electrocardiogram (AECG) gives a non-invasive way to monitor fetal well-being during pregnancy. Due to the overlap with maternal ECG (MECG) as well as potential noises from other sources, it is challenging to extract weak fetal ECG (FECG) using surface electrodes. Taking advantage of precise source separation capability of the FastICA approach combined with its constrain…

    Submitted 3 June, 2024; originally announced June 2024.

  22. arXiv:2405.20561

    cs.CR cs.SE

    All Your Tokens are Belong to Us: Demystifying Address Verification Vulnerabilities in Solidity Smart Contracts

    Authors: Tianle Sun, Ningyu He, Jiang Xiao, Yinliang Yue, Xiapu Luo, Haoyu Wang

    Abstract: In Ethereum, the practice of verifying the validity of the passed addresses is a common practice, which is a crucial step to ensure the secure execution of smart contracts. Vulnerabilities in the process of address verification can lead to great security issues, and anecdotal evidence has been reported by our community. However, this type of vulnerability has not been well studied. To fill the voi…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by USENIX Security 2024

  23. arXiv:2405.19669

    cs.CV

    Texture-guided Coding for Deep Features

    Authors: Lei Xiong, Xin Luo, Zihao Wang, Chaofan He, Shuyuan Zhu, Bing Zeng

    Abstract: With the rapid development of machine vision technology in recent years, many researchers have begun to focus on feature compression that is better suited for machine vision tasks. The target of feature compression is deep features, which arise from convolution in the middle layer of a pre-trained convolutional neural network. However, due to the large volume of data and high level of abstraction…

    Submitted 29 May, 2024; originally announced May 2024.

  24. Multiscale Spatio-Temporal Enhanced Short-term Load Forecasting of Electric Vehicle Charging Stations

    Authors: Zongbao Zhang, Jiao Hao, Wenmeng Zhao, Yan Liu, Yaohui Huang, Xinhang Luo

    Abstract: The rapid expansion of electric vehicles (EVs) has rendered the load forecasting of electric vehicle charging stations (EVCS) increasingly critical. The primary challenge in achieving precise load forecasting for EVCS lies in accounting for the nonlinear of charging behaviors, the spatial interactions among different stations, and the intricate temporal variations in usage patterns. To address the…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 5 pages, 1 figure, AEEES 2024

  25. arXiv:2405.13028

    cs.CL cs.AI

    DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues

    Authors: Xiang Luo, Zhiwen Tang, Jin Wang, Xuejie Zhang

    Abstract: User Simulators play a pivotal role in training and evaluating task-oriented dialogue systems. Traditional user simulators typically rely on human-engineered agendas, resulting in generated responses that often lack diversity and spontaneity. Although large language models (LLMs) exhibit a remarkable capacity for generating coherent and contextually appropriate utterances, they may fall short when…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by COLING 2024

  26. arXiv:2405.12202

    cs.CV cs.AI

    Hierarchical Neural Operator Transformer with Learnable Frequency-aware Loss Prior for Arbitrary-scale Super-resolution

    Authors: Xihaier Luo, Xiaoning Qian, Byung-Jun Yoon

    Abstract: In this work, we present an arbitrary-scale super-resolution (SR) method to enhance the resolution of scientific data, which often involves complex challenges such as continuity, multi-scale physics, and the intricacies of high-frequency signals. Grounded in operator learning, the proposed method is resolution-invariant. The core of our model is a hierarchical neural operator that leverages a Gale…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 20 pages, 14 figures

  27. arXiv:2405.11868

    cs.LG cs.AI cs.CE cs.IR cs.SI

    Towards Graph Contrastive Learning: A Survey and Beyond

    Authors: Wei Ju, Yifan Wang, Yifang Qin, Zhengyang Mao, Junyu Luo, Junwei Yang, Yiyang Gu, Dongjie Wang, Qingqing Long, Siyu Yi, Xiao Luo, Ming Zhang

    Abstract: In recent years, deep learning on graphs has achieved remarkable success in various domains. However, the reliance on annotated graph data remains a significant bottleneck due to its prohibitive cost and time-intensive nature. To address this challenge, self-supervised learning (SSL) on graphs has gained increasing attention and has made significant progress. SSL enables machine learning models to…

    Submitted 20 May, 2024; originally announced May 2024.

  28. arXiv:2405.11496

    cs.CV cs.IR

    DEMO: A Statistical Perspective for Efficient Image-Text Matching

    Authors: Fan Zhang, Xian-Sheng Hua, Chong Chen, Xiao Luo

    Abstract: Image-text matching has been a long-standing problem, which seeks to connect vision and language through semantic understanding. Due to the capability to manage large-scale raw data, unsupervised hashing-based approaches have gained prominence recently. They typically construct a semantic similarity structure using the natural distance, which subsequently provides guidance to the model optimizatio…

    Submitted 19 May, 2024; originally announced May 2024.

  29. arXiv:2405.09395

    q-bio.NC cs.AI cs.CL

    Matching domain experts by training from scratch on domain knowledge

    Authors: Xiaoliang Luo, Guangzhi Sun, Bradley C. Love

    Abstract: Recently, large language models (LLMs) have outperformed human experts in predicting the results of neuroscience experiments (Luo et al., 2024). What is the basis for this performance? One possibility is that statistical patterns in that specific scientific literature, as opposed to emergent reasoning abilities arising from broader training, underlie LLMs' performance. To evaluate this possibility…

    Submitted 15 May, 2024; originally announced May 2024.

  30. arXiv:2405.08272

    cs.CV

    VS-Assistant: Versatile Surgery Assistant on the Demand of Surgeons

    Authors: Zhen Chen, Xingjian Luo, Jinlin Wu, Danny T. M. Chan, Zhen Lei, Jinqiao Wang, Sebastien Ourselin, Hongbin Liu

    Abstract: The surgical intervention is crucial to patient healthcare, and many studies have developed advanced algorithms to provide understanding and decision-making assistance for surgeons. Despite great progress, these algorithms are developed for a single specific task and scenario, and in practice require the manual combination of different functions, thus limiting the applicability. Thus, an intellige…

    Submitted 13 May, 2024; originally announced May 2024.

  31. arXiv:2405.07429

    cs.RO

    JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation

    Authors: Xubo Luo, Xue Wan, Yixing Gao, Yaolin Tian, Wei Zhang, Leizheng Shu

    Abstract: Unmanned aerial vehicles (UAVs) visual localization in planetary aims to estimate the absolute pose of the UAV in the world coordinate system through satellite maps and images captured by on-board cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 8 pages

  32. arXiv:2405.05945

    cs.CV

    Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

    Authors: Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

    Abstract: Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified f…

    Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  33. arXiv:2405.04773

    cs.LG cs.AI cs.IR cs.SI

    Hypergraph-enhanced Dual Semi-supervised Graph Classification

    Authors: Wei Ju, Zhengyang Mao, Siyu Yi, Yifang Qin, Yiyang Gu, Yifan Wang, Xiao Luo, Ming Zhang

    Abstract: In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. Despite the promising capability of graph neural networks (GNNs), they typically require a large number of costly labeled graphs, while a wealth of unlabeled graphs fail to be effectively utilized. Moreove…

    Submitted 28 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  34. arXiv:2404.17340

    cs.CV

    Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

    Authors: Chengliang Liu, Jie Wen, Yabo Liu, Chao Huang, Zhihao Wu, Xiaoling Luo, Yong Xu

    Abstract: Multi-view learning has become a popular research topic in recent years, but research on the cross-application of classic multi-label classification and multi-view learning is still in its early stages. In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning and propose a masked two-channel decoupling framework based on deep neural networ…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted at NeurIPS 2023.

  35. arXiv:2404.17151

    cs.MM cs.CV

    MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection

    Authors: Chengpei Xu, Wenjing Jia, Ruomei Wang, Xiaonan Luo, Xiangjian He

    Abstract: Bottom-up text detection methods play an important role in arbitrary-shape scene text detection but there are two restrictions preventing them from achieving their great potential, i.e., 1) the accumulation of false text segment detections, which affects subsequent processing, and 2) the difficulty of building reliable connections between text segments. Targeting these two problems, we propose a n…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by Transactions on Multimedia

  36. arXiv:2404.16522

    eess.IV cs.LG

    A Deep Learning-Driven Pipeline for Differentiating Hypertrophic Cardiomyopathy from Cardiac Amyloidosis Using 2D Multi-View Echocardiography

    Authors: Bo Peng, Xiaofeng Li, Xinyu Li, Zhenghan Wang, Hui Deng, Xiaoxian Luo, Lixue Yin, Hongmei Zhang

    Abstract: Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classif…

    Submitted 25 April, 2024; originally announced April 2024.

  37. arXiv:2404.14852

    cs.CV

    Ultrasound Nodule Segmentation Using Asymmetric Learning with Simple Clinical Annotation

    Authors: Xingyue Zhao, Zhongyu Li, Xiangde Luo, Peiqi Li, Peng Huang, Jianwei Zhu, Yang Liu, Jihua Zhu, Meng Yang, Shi Chang, Jun Dong

    Abstract: Recent advances in deep learning have greatly facilitated the automated segmentation of ultrasound images, which is essential for nodule morphological analysis. Nevertheless, most existing methods depend on extensive and precise annotations by domain experts, which are labor-intensive and time-consuming. In this study, we suggest using simple aspect ratio annotations directly from ultrasound clini…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by TCSVT

  38. arXiv:2404.13878

    cs.IR

    Multi-Level Sequence Denoising with Cross-Signal Contrastive Learning for Sequential Recommendation

    Authors: Xiaofei Zhu, Liang Li, Weidong Liu, Xin Luo

    Abstract: Sequential recommender systems (SRSs) aim to suggest next item for a user based on her historical interaction sequences. Recently, many research efforts have been devoted to attenuate the influence of noisy items in sequences by either assigning them with lower attention weights or discarding them directly. The major limitation of these methods is that the former would still prone to overfit noisy…

    Submitted 19 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  39. arXiv:2404.12876

    cs.CV cs.AI cs.LG

    A Large-scale Medical Visual Task Adaptation Benchmark

    Authors: Shentong Mo, Xufang Luo, Yansen Wang, Dongsheng Li

    Abstract: Visual task adaptation has been demonstrated to be effective in adapting pre-trained Vision Transformers (ViTs) to general downstream visual tasks using specialized learnable layers or tokens. However, there is yet a large-scale benchmark to fully explore the effect of visual task adaptation on the realistic and important medical domain, particularly across diverse medical visual modalities, such…

    Submitted 19 April, 2024; originally announced April 2024.

  40. arXiv:2404.09146

    cs.CV cs.AI

    Fusion-Mamba for Cross-modality Object Detection

    Authors: Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang

    Abstract: Cross-modality fusing complementary information from different modalities effectively improves object detection performance, making it more useful and robust for a wider range of applications. Existing fusion strategies combine different types of images or merge different backbone features through elaborated neural network modules. However, these methods neglect that modality disparities affect cr…

    Submitted 14 April, 2024; originally announced April 2024.

  41. arXiv:2404.06418

    cs.LG cs.AI

    Studying the Impact of Latent Representations in Implicit Neural Networks for Scientific Continuous Field Reconstruction

    Authors: Wei Xu, Derek Freeman DeSantis, Xihaier Luo, Avish Parmar, Klaus Tan, Balu Nadiga, Yihui Ren, Shinjae Yoo

    Abstract: Learning a continuous and reliable representation of physical fields from sparse sampling is challenging and it affects diverse scientific disciplines. In a recent work, we present a novel model called MMGN (Multiplicative and Modulated Gabor Network) with implicit neural networks. In this work, we design additional studies leveraging explainability methods to complement the previous experiments a…

    Submitted 9 April, 2024; originally announced April 2024.

  42. arXiv:2404.04167

    cs.CL cs.AI

    Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

    Authors: Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

    Abstract: In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion…

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  43. arXiv:2404.03881  [pdf, other]

    cs.CL

    A Bi-consolidating Model for Joint Relational Triple Extraction

    Authors: Xiaocheng Luo, Yanping Chen, Ruixue Tang, Ruizhang Huang, Yongbin Qin

    Abstract: Current methods to extract relational triples directly make a prediction based on a possible entity pair in a raw sentence without depending on entity recognition. The task suffers from a serious semantic overlapping problem, in which several relation triples may share one or two entities in a sentence. It is weak to learn discriminative semantic features relevant to a relation triple. In this pap…

    Submitted 5 April, 2024; originally announced April 2024.

  44. arXiv:2404.01617  [pdf, other]

    cs.NI cs.LG cs.MM

    LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models

    Authors: Zhiyuan He, Aashish Gottipati, Lili Qiu, Francis Y. Yan, Xufang Luo, Kenuo Xu, Yuqing Yang

    Abstract: We present LLM-ABR, the first system that utilizes the generative capabilities of large language models (LLMs) to autonomously design adaptive bitrate (ABR) algorithms tailored for diverse network characteristics. Operating within a reinforcement learning framework, LLM-ABR empowers LLMs to design key components such as states and neural network architectures. We evaluate LLM-ABR across diverse ne…

    Submitted 1 April, 2024; originally announced April 2024.

  45. arXiv:2404.00998  [pdf, other]

    cs.CL cs.AI

    LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation

    Authors: Zilong Wang, Xufang Luo, Xinyang Jiang, Dongsheng Li, Lili Qiu

    Abstract: Evaluating generated radiology reports is crucial for the development of radiology AI, but existing metrics fail to reflect the task's clinical requirements. This study proposes a novel evaluation framework using large language models (LLMs) to compare radiology reports for assessment. We compare the performance of various LLMs and demonstrate that, when using GPT-4, our proposed metric achieves e…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 11 pages, 6 figures

  46. arXiv:2404.00661  [pdf, other]

    cs.CV

    DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion

    Authors: Chunyang Bi, Xin Luo, Sheng Shen, Mengxi Zhang, Huanjing Yue, Jingyu Yang

    Abstract: Diffusion models, known for their powerful generative capabilities, play a crucial role in addressing real-world super-resolution challenges. However, these models often focus on improving local textures while neglecting the impacts of global degradation, which can significantly reduce semantic fidelity and lead to inaccurate reconstructions and suboptimal super-resolution performance. To address…

    Submitted 31 March, 2024; originally announced April 2024.

  47. Fusion Dynamical Systems with Machine Learning in Imitation Learning: A Comprehensive Overview

    Authors: Yingbai Hu, Fares J. Abu-Dakka, Fei Chen, Xiao Luo, Zheng Li, Alois Knoll, Weiping Ding

    Abstract: Imitation Learning (IL), also referred to as Learning from Demonstration (LfD), holds significant promise for capturing expert motor skills through efficient imitation, facilitating adept navigation of complex scenarios. A persistent challenge in IL lies in extending generalization from historical demonstrations, enabling the acquisition of new skills without re-teaching. Dynamical system-based IL…

    Submitted 28 March, 2024; originally announced March 2024.

  48. arXiv:2403.15734  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Space Group Informed Transformer for Crystalline Materials Generation

    Authors: Zhendong Cao, Xiaoshan Luo, Jian Lv, Lei Wang

    Abstract: We introduce CrystalFormer, a transformer-based autoregressive model specifically designed for space group-controlled generation of crystalline materials. The space group symmetry significantly simplifies the crystal space, which is crucial for data and compute efficient generative modeling of crystalline materials. Leveraging the prominent discrete and sequential nature of the Wyckoff positions,…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 17 pages, 8 figures

  49. arXiv:2403.15285  [pdf, other]

    cs.NI cs.CR cs.HC cs.LG

    Blockchain-based Pseudonym Management for Vehicle Twin Migrations in Vehicular Edge Metaverse

    Authors: Jiawen Kang, Xiaofeng Luo, Jiangtian Nie, Tianhao Wu, Haibo Zhou, Yonghua Wang, Dusit Niyato, Shiwen Mao, Shengli Xie

    Abstract: Driven by the great advances in metaverse and edge computing technologies, vehicular edge metaverses are expected to disrupt the current paradigm of intelligent transportation systems. As highly computerized avatars of Vehicular Metaverse Users (VMUs), the Vehicle Twins (VTs) deployed in edge servers can provide valuable metaverse services to improve driving safety and on-board satisfaction for th…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 14 pages, 9 figures

  50. arXiv:2403.14950  [pdf, other]

    cs.CL cs.LG

    KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation

    Authors: Xindi Luo, Zequn Sun, Jing Zhao, Zhe Zhao, Wei Hu

    Abstract: Parameter-efficient finetuning (PEFT) is a key technique for adapting large language models (LLMs) to downstream tasks. In this paper, we study leveraging knowledge graph embeddings to improve the effectiveness of PEFT. We propose a knowledgeable adaptation method called KnowLA. It inserts an adaptation layer into an LLM to integrate the embeddings of entities appearing in the input text. The adap…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted in the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)