
Showing 51–100 of 351 results for author: Bai, L

  1. arXiv:2401.16669  [pdf]

    cs.LG cs.AI physics.ao-ph physics.geo-ph

    Improving Global Weather and Ocean Wave Forecast with Large Artificial Intelligence Models

    Authors: Fenghua Ling, Lin Ouyang, Boufeniza Redouane Larbi, Jing-Jia Luo, Tao Han, Xiaohui Zhong, Lei Bai

    Abstract: The rapid advancement of artificial intelligence technologies, particularly in recent years, has led to the emergence of several large parameter artificial intelligence weather forecast models. These models represent a significant breakthrough, overcoming the limitations of traditional numerical weather prediction models and indicating the emergence of profound potential tools for atmosphere-ocean…

    Submitted 18 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  2. arXiv:2401.16416  [pdf, other]

    cs.CV

    Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting

    Authors: Yiming Huang, Beilei Cui, Long Bai, Ziqi Guo, Mengya Xu, Mobarakol Islam, Hongliang Ren

    Abstract: In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Neural Radiance Fields (NeRF)-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes but are hampered by slow inference speed, prolonged training, and inconsistent depth estimation. Some previo…

    Submitted 2 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  3. arXiv:2401.12681  [pdf, other]

    cs.LG cs.AI

    Non-Neighbors Also Matter to Kriging: A New Contrastive-Prototypical Learning

    Authors: Zhishuai Li, Yunhao Nie, Ziyue Li, Lei Bai, Yisheng Lv, Rui Zhao

    Abstract: Kriging aims at estimating the attributes of unsampled geo-locations from observations in the spatial vicinity or physical connections, which helps mitigate skewed monitoring caused by under-deployed sensors. Existing works assume that neighbors' information offers the basis for estimating the attributes of the unobserved target while ignoring non-neighbors. However, non-neighbors could also offer…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted in AISTATS 2024

  4. arXiv:2401.12505  [pdf, other]

    cond-mat.str-el cond-mat.mtrl-sci cond-mat.other

    Topological magnons in a non-coplanar magnetic order on the triangular lattice

    Authors: Linli Bai, Ken Chen

    Abstract: The bond-dependent Kitaev interaction $K$ is familiar in the effective spin model of transition metal compounds with octahedral ligands. In this work, we find a peculiar non-coplanar magnetic order can be formed with the help of $K$ and next-nearest neighbor Heisenberg coupling $J_2$ on the triangular lattice. It can be seen as a miniature version of skyrmion crystal, since it has nine spins and a…

    Submitted 27 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  5. arXiv:2401.11960  [pdf, other]

    cs.CV eess.IV

    Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method

    Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

    Abstract: Downscaling (DS) of meteorological variables involves obtaining high-resolution states from low-resolution meteorological fields and is an important task in weather forecasting. Previous methods based on deep learning treat downscaling as a super-resolution task in computer vision and utilize high-resolution gridded meteorological fields as supervision to improve resolution at specific grid scales…

    Submitted 22 January, 2024; originally announced January 2024.

  6. arXiv:2401.09274  [pdf, ps, other]

    math.OC cs.LG

    Avoiding strict saddle points of nonconvex regularized problems

    Authors: Luwei Bai

    Abstract: We introduce a strict saddle property for $\ell_p$ regularized functions, and propose an iterative reweighted $\ell_1$ algorithm to solve the $\ell_p$ regularized problems. The algorithm is guaranteed to converge only to local minimizers when randomly initialized. The strict saddle property is shown generic on these sparse optimization problems. Those analyses as well as the proposed algorithm can…

    Submitted 9 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: 22 pages

  7. arXiv:2401.06013  [pdf, other]

    cs.CV cs.AI

    Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

    Authors: Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren

    Abstract: Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction, surgical navigation and augmented reality visualization. Although the foundation model exhibits outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works observed its limitations in medical and surgical domain-specific applications. This work presents a low-ranked adaptation (LoR…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted by IPCAI 2024 (IJCAR Special Issue)

  8. arXiv:2401.04148  [pdf, other]

    cs.LG cs.AI eess.SP

    Online Test-Time Adaptation of Spatial-Temporal Traffic Flow Forecasting

    Authors: Pengxin Guo, Pengrong Jin, Ziyue Li, Lei Bai, Yu Zhang

    Abstract: Accurate spatial-temporal traffic flow forecasting is crucial in aiding traffic managers in implementing control measures and assisting drivers in selecting optimal travel routes. Traditional deep-learning based methods for traffic flow forecasting typically rely on historical data to train their models, which are then used to make predictions on future data. However, the performance of the traine…

    Submitted 8 January, 2024; originally announced January 2024.

  9. arXiv:2401.01759  [pdf, other]

    cs.SI cs.CL cs.CV cs.MM

    VGA: Vision and Graph Fused Attention Network for Rumor Detection

    Authors: Lin Bai, Caiyan Jia, Ziying Song, Chaoqun Cui

    Abstract: With the development of social media, rumors have been spread broadly on social media platforms, causing great harm to society. Beside textual information, many rumors also use manipulated images or conceal textual information within images to deceive people and avoid being detected, making multimodal rumor detection be a critical problem. The majority of multimodal rumor detection methods mainly…

    Submitted 3 January, 2024; originally announced January 2024.

  10. arXiv:2401.01117  [pdf, other]

    cs.CV eess.IV

    Q-Refine: A Perceptual Quality Refiner for AI-Generated Image

    Authors: Chunyi Li, Haoning Wu, Zicheng Zhang, Hongkun Hao, Kaiwei Zhang, Lei Bai, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: With the rapid evolution of the Text-to-Image (T2I) model in recent years, their unsatisfactory generation result has become a challenge. However, uniformly refining AI-Generated Images (AIGIs) of different qualities not only limited optimization capabilities for low-quality AIGIs but also brought negative optimization to high-quality AIGIs. To address this issue, a quality-award refiner named Q-R…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: 6 pages, 5 figures

  11. arXiv:2401.00496  [pdf, other]

    cs.CV cs.AI cs.LG

    SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

    Authors: Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi, et al. (25 additional authors not shown)

    Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segme…

    Submitted 23 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  12. arXiv:2312.12462  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    Towards an end-to-end artificial intelligence driven global weather forecasting system

    Authors: Kun Chen, Lei Bai, Fenghua Ling, Peng Ye, Tao Chen, Jing-Jia Luo, Hao Chen, Yi Xiao, Kang Chen, Tao Han, Wanli Ouyang

    Abstract: The weather forecasting system is important for science and society, and significant achievements have been made in applying artificial intelligence (AI) to medium-range weather forecasting. However, existing AI-based weather forecasting models rely on analysis or reanalysis products from traditional numerical weather prediction (NWP) systems as initial conditions for making predictions. Initial s…

    Submitted 8 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  13. arXiv:2312.12455  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    FengWu-4DVar: Coupling the Data-driven Weather Forecasting Model with 4D Variational Assimilation

    Authors: Yi Xiao, Lei Bai, Wei Xue, Kang Chen, Tao Han, Wanli Ouyang

    Abstract: Weather forecasting is a crucial yet highly challenging task. With the maturity of Artificial Intelligence (AI), the emergence of data-driven weather forecasting models has opened up a new paradigm for the development of weather forecasting systems. Despite the significant successes that have been achieved (e.g., surpassing advanced traditional physical models for global medium-range forecasting),…

    Submitted 19 May, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 15 pages, 8 figures

  14. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  15. arXiv:2312.10429  [pdf, other]

    physics.geo-ph cs.AI

    ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks

    Authors: Pumeng Lyu, Tao Tang, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

    Abstract: Recent studies have shown that deep learning (DL) models can skillfully predict the El Niño-Southern Oscillation (ENSO) forecasts over 1.5 years ahead. However, concerns regarding the reliability of predictions made by DL methods persist, including potential overfitting issues and lack of interpretability. Here, we propose ResoNet, a DL model that combines convolutional neural network (CNN) and Tr…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 32 pages, 5 main figures and 12 supplementary figures

  16. arXiv:2312.09576  [pdf, other]

    eess.IV cs.CV

    SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

    Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

    Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)

  17. arXiv:2312.06428  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG

    VisionTraj: A Noise-Robust Trajectory Recovery Framework based on Large-scale Camera Network

    Authors: Zhishuai Li, Ziyue Li, Xiaoru Hu, Guoqing Du, Yunhao Nie, Feng Zhu, Lei Bai, Rui Zhao

    Abstract: Trajectory recovery based on the snapshots from the city-wide multi-camera network facilitates urban mobility sensing and driveway optimization. The state-of-the-art solutions devoted to such a vision-based scheme typically incorporate predefined rules or unsupervised iterative feedback, struggling with multi-fold challenges such as lack of open-source datasets for training the whole pipeline, and…

    Submitted 11 December, 2023; originally announced December 2023.

  18. arXiv:2312.01697  [pdf, other]

    cs.CV cs.AI

    Hulk: A Universal Knowledge Translator for Human-Centric Tasks

    Authors: Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

    Abstract: Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis. There is a recent surge to develop human-centric foundation models that can benefit a broad range of human-centric perception tasks. While many human-centric foundation models have achieved success, they did no…

    Submitted 21 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: 24 pages, 5 figures

  19. arXiv:2311.02962  [pdf, other]

    cs.AI cs.CL cs.IR

    Retrieval-Augmented Code Generation for Universal Information Extraction

    Authors: Yucan Guo, Zixuan Li, Xiaolong Jin, Yantao Liu, Yutao Zeng, Wenxuan Liu, Xiang Li, Pan Yang, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts, which brings challenges to existing methods due to task-specific schemas and complex text expressions. Code, as a typical kind of formalized language, is capable of describing structural knowledge under various schemas in a universal way. On the other hand, Large Langua…

    Submitted 6 November, 2023; originally announced November 2023.

  20. arXiv:2311.02631  [pdf, other]

    cs.LG cs.AI

    A Critical Perceptual Pre-trained Model for Complex Trajectory Recovery

    Authors: Dedong Li, Ziyue Li, Zhishuai Li, Lei Bai, Qingyuan Gong, Lijun Sun, Wolfgang Ketter, Rui Zhao

    Abstract: The trajectory on the road traffic is commonly collected at a low sampling rate, and trajectory recovery aims to recover a complete and continuous trajectory from the sparse and discrete inputs. Recently, sequential language models have been innovatively adopted for trajectory recovery in a pre-trained manner: it learns road segment representation vectors, which will be used in the downstream task…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: Accepted in ACM SIGSPATIAL 2023

  21. arXiv:2311.00291  [pdf, other]

    cs.CV

    Graph Representation Learning for Infrared and Visible Image Fusion

    Authors: Jing Li, Lu Bai, Bin Yang, Chang Li, Lingfei Ma, Edwin R. Hancock

    Abstract: Infrared and visible image fusion aims to extract complementary features to synthesize a single fused image. Many methods employ convolutional neural networks (CNNs) to extract local features due to its translation invariance and locality. However, CNNs fail to consider the image's non-local self-similarity (NLss), though it can expand the receptive field by pooling operations, it still inevitably…

    Submitted 1 November, 2023; originally announced November 2023.

  22. arXiv:2310.14174  [pdf, other]

    cs.CL

    An In-Context Schema Understanding Method for Knowledge Base Question Answering

    Authors: Yantao Liu, Zixuan Li, Xiaolong Jin, Yucan Guo, Long Bai, Saiping Guan, Jiafeng Guo, Xueqi Cheng

    Abstract: The Knowledge Base Question Answering (KBQA) task aims to answer natural language questions based on a given knowledge base. Recently, Large Language Models (LLMs) have shown strong capabilities in language understanding and can be used to solve this task. In doing so, a major challenge for LLMs is to overcome the immensity and heterogeneity of knowledge base schemas. Existing methods bypass this c…

    Submitted 10 February, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

  23. arXiv:2310.13447  [pdf, other]

    cs.CV cs.AI cs.CL

    Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation

    Authors: Siyu Zhang, Yeming Chen, Sirui Cheng, Yaoru Sun, Jun Yang, Lizhi Bai

    Abstract: Within the multimodal field, the key to integrating vision and language lies in establishing a good alignment strategy. Recently, benefiting from the success of self-supervised learning, significant progress has been made in multimodal semantic representation based on pre-trained models for vision and language. However, there is still room for improvement in visual semantic representation. The lac…

    Submitted 25 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

  24. arXiv:2310.09937  [pdf, other]

    eess.IV eess.SP

    Joint Sparse Representations and Coupled Dictionary Learning in Multi-Source Heterogeneous Image Pseudo-color Fusion

    Authors: Long Bai, Shilong Yao, Kun Gao, Yanjun Huang, Ruijie Tang, Hong Yan, Max Q.-H. Meng, Hongliang Ren

    Abstract: Considering that Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: To appear in IEEE Sensors Journal

  25. arXiv:2310.08261  [pdf, other]

    cs.CV

    GraphAlign: Enhancing Accurate Feature Alignment by Graph matching for Multi-Modal 3D Object Detection

    Authors: Ziying Song, Haiyue Wei, Lin Bai, Lei Yang, Caiyan Jia

    Abstract: LiDAR and cameras are complementary sensors for 3D object detection in autonomous driving. However, it is challenging to explore the unnatural interaction between point clouds and images, and the critical factor is how to conduct feature alignment of heterogeneous modalities. Currently, many methods achieve feature alignment by projection calibration only, without considering the problem of coordi…

    Submitted 12 October, 2023; originally announced October 2023.

  26. arXiv:2310.01994  [pdf, other]

    cs.CV

    Understanding Masked Autoencoders From a Local Contrastive Perspective

    Authors: Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang

    Abstract: Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. However, despite achieving state-of-the-art performance across various downstream vision tasks, the underlying mechanisms that drive MAE's efficacy are less well-explored compared to the canonical contrastive learning paradigm. In this paper, we fir…

    Submitted 8 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

  27. arXiv:2309.15718  [pdf, other]

    physics.chem-ph physics.comp-ph

    Geometry-enhanced Pre-training on Interatomic Potentials

    Authors: Taoyong Cui, Chenyu Tang, Mao Su, Shufei Zhang, Yuqiang Li, Lei Bai, Yuhan Dong, Xingao Gong, Wanli Ouyang

    Abstract: Machine learning interatomic potentials (MLIPs) enables molecular dynamics (MD) simulations with ab initio accuracy and has been applied to various fields of physical science. However, the performance and transferability of MLIPs are limited by insufficient labeled training data, which require expensive ab initio calculations to obtain the labels, especially for complex molecular systems. To addre…

    Submitted 12 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Journal ref: Published in Nature Machine Intelligence 2024

  28. arXiv:2309.12960  [pdf, other]

    cs.CL

    Nested Event Extraction upon Pivot Element Recogniton

    Authors: Weicheng Ren, Zixuan Li, Xiaolong Jin, Long Bai, Miao Su, Yantao Liu, Saiping Guan, Jiafeng Guo, Xueqi Cheng

    Abstract: Nested Event Extraction (NEE) aims to extract complex event structures where an event contains other events as its arguments recursively. Nested events involve a kind of Pivot Elements (PEs) that simultaneously act as arguments of outer-nest events and as triggers of inner-nest events, and thus connect them into nested structures. This special characteristic of PEs brings challenges to existing NE…

    Submitted 7 April, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: Accepted at LREC-COLING 2024

  29. arXiv:2309.12892  [pdf, other]

    cs.CL cs.AI

    ProtoEM: A Prototype-Enhanced Matching Framework for Event Relation Extraction

    Authors: Zhilei Hu, Zixuan Li, Daozhu Xu, Long Bai, Cheng Jin, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: Event Relation Extraction (ERE) aims to extract multiple kinds of relations among events in texts. However, existing methods singly categorize event relations as different classes, which are inadequately capturing the intrinsic semantics of these relations. To comprehensively understand their intrinsic semantics, in this paper, we obtain prototype representations for each type of event relation an…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: Work in progress

  30. arXiv:2309.10431  [pdf, other]

    cs.CV

    Sample-adaptive Augmentation for Point Cloud Recognition Against Real-world Corruptions

    Authors: Jie Wang, Lihe Ding, Tingfa Xu, Shaocong Dong, Xinli Xu, Long Bai, Jianan Li

    Abstract: Robust 3D perception under corruption has become an essential task for the realm of 3D vision. While current data augmentation techniques usually perform random transformations on all point cloud objects in an offline way and ignore the structure of the samples, resulting in over-or-under enhancement. In this work, we propose an alternative to make sample-adaptive transformations based on the stru…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV2023; code: https://github.com/Roywangj/AdaptPoint

  31. arXiv:2309.10242  [pdf, other]

    math.OC

    Reinforcement Learning for optimal dividend problem under diffusion model

    Authors: Lihua Bai, Thejani Gamage, Jin Ma, Pengxu Xie

    Abstract: In this paper, we study the optimal dividend problem under the continuous time diffusion model with the dividend rate being restricted in a given finite interval. Unlike the standard literature, we shall particularly be interested in the case when the parameters (e.g. drift and diffusion coefficients) of the model are not specified so that the optimal control cannot be explicitly determined. We th…

    Submitted 18 September, 2023; originally announced September 2023.

  32. Transport properties of a holographic model with novel gauge-axion coupling

    Authors: Lin-Yue Bai, Jian-Pin Wu, Zhen-Hua Zhou

    Abstract: We investigate the transport properties within a holographic model characterized by a novel gauge-axion coupling. A key innovation is the introduction of the direct coupling between axion fields, the antisymmetric tensor, and the gauge field in our bulk theory. This novel coupling term leads to the emergence of nondiagonal components in the conductivity tensor. An important characteristic is that…

    Submitted 26 December, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 29 pages, 11 figures, Published version

    Journal ref: Phys.Rev.D 108 (2023) 12, 126015

  33. arXiv:2309.03467  [pdf, other]

    cs.CV cs.AI

    Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation

    Authors: Zhuqiang Lu, Kun Hu, Chaoyue Wang, Lei Bai, Zhiyong Wang

    Abstract: A 360-degree (omni-directional) image provides an all-encompassing spherical view of a scene. Recently, there has been an increasing interest in synthesising 360-degree images from conventional narrow field of view (NFoV) images captured by digital cameras and smartphones, for providing immersive experiences in various scenarios such as virtual reality. Yet, existing methods typically fall short i…

    Submitted 8 April, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: Accepted by AAAI 24

    ACM Class: I.4.0

  34. arXiv:2308.16376  [pdf, other]

    eess.IV cs.CV cs.DC

    Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training

    Authors: Lei Bai, Dongang Wang, Michael Barnett, Mariano Cabezas, Weidong Cai, Fernando Calamante, Kain Kyle, Dongnan Liu, Linda Ly, Aria Nguyen, Chun-Chien Shieh, Ryan Sullivan, Hengrui Wang, Geng Zhan, Wanli Ouyang, Chenyu Wang

    Abstract: Accurately measuring the evolution of Multiple Sclerosis (MS) with magnetic resonance imaging (MRI) critically informs understanding of disease progression and helps to direct therapeutic strategy. Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. Obtaining sufficient data from a single clin…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 11 pages, 4 figures, journal submission

  35. arXiv:2308.15717  [pdf]

    eess.SY math.OC

    Risk-aware Flexible Resource Utilization in an Unbalanced Three-Phase Distribution Network using SDP-based Distributionally Robust Optimal Power Flow

    Authors: Zelong Lu, Jianxue Wang, Mohammad Shahidehpour, Linquan Bai, Zuyi Li, Lei Yan, Xianlong Chen

    Abstract: The variability caused by the proliferation of distributed energy resources (DERs) and the significant growth in unbalanced three-phase loads pose unprecedented challenges to distribution network operations. This paper focuses on how a distribution system operator (DSO), taking over the distribution grid and market operations, would develop a risk-aware flexibility market to mitigate uncertainties…

    Submitted 29 August, 2023; originally announced August 2023.

  36. arXiv:2308.14100  [pdf, other]

    cs.CV cs.AI

    Rethinking Exemplars for Continual Semantic Segmentation in Endoscopy Scenes: Entropy-based Mini-Batch Pseudo-Replay

    Authors: Guankun Wang, Long Bai, Yanan Wu, Tong Chen, Hongliang Ren

    Abstract: Endoscopy is a widely used technique for the early detection of diseases or robotic-assisted minimally invasive surgery (RMIS). Numerous deep learning (DL)-based research works have been developed for automated diagnosis or processing of endoscopic view. However, existing DL models may suffer from catastrophic forgetting. When new target classes are introduced over time or cross institutions, the…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: Accepted by Computers in Biology and Medicine

  37. arXiv:2308.10468  [pdf, other]

    cs.CV

    STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning

    Authors: Tao Han, Lei Bai, Lingbo Liu, Wanli Ouyang

    Abstract: Scale variation is a deep-rooted problem in object counting, which has not been effectively addressed by existing scale-aware algorithms. An important factor is that they typically involve cooperative learning across multi-resolutions, which could be suboptimal for learning the most discriminative features from each scale. In this paper, we propose a novel method termed STEERER (\textbf{S}elec\tex…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023, 9 pages

  38. arXiv:2308.07203  [pdf, other]

    cs.IT

    Successive Refinement of Shannon Cipher System Under Maximal Leakage

    Authors: Zhuangfei Wu, Lin Bai, Lin Zhou

    Abstract: We study the successive refinement setting of Shannon cipher system (SCS) under the maximal leakage secrecy metric for discrete memoryless sources under bounded distortion measures. Specifically, we generalize the threat model for the point-to-point rate-distortion setting of Issa, Wagner and Kamath (T-IT 2020) to the multiterminal successive refinement setting. Under mild conditions that correspo…

    Submitted 18 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

  39. arXiv:2308.02869  [pdf, other]

    cs.CV cs.AI

    Semi-supervised Learning for Segmentation of Bleeding Regions in Video Capsule Endoscopy

    Authors: Hechen Li, Yanan Wu, Long Bai, An Wang, Tong Chen, Hongliang Ren

    Abstract: In the realm of modern diagnostic technology, video capsule endoscopy (VCE) is a standout for its high efficacy and non-invasive nature in diagnosing various gastrointestinal (GI) conditions, including obscure bleeding. Importantly, for the successful diagnosis and treatment of these conditions, accurate recognition of bleeding regions in VCE images is crucial. While deep learning-based methods ha…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: ICBIR 2023

  40. arXiv:2308.02845  [pdf, other]

    eess.IV cs.CV cs.RO

    Landmark Detection using Transformer Toward Robot-assisted Nasal Airway Intubation

    Authors: Tianhang Liu, Hechen Li, Long Bai, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren

    Abstract: Robot-assisted airway intubation application needs high accuracy in locating targets and organs. Two vital landmarks, nostrils and glottis, can be detected during the intubation to accommodate the stages of nasal intubation. Automated landmark detection can provide accurate localization and quantitative evaluation. The Detection Transformer (DeTR) leads object detectors to a new paradigm with long…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: ICBIR 2023 (Best Student Paper Award). Code availability: https://github.com/ConorLTH/airway_intubation_landmarks_detection

  41. arXiv:2308.00588  [pdf, other]

    cs.CV cs.MM

    Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities

    Authors: Kaijian Liu, Shixiang Tang, Ziyue Li, Zhishuai Li, Lei Bai, Feng Zhu, Rui Zhao

    Abstract: Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based movie editing. Related methods such as multi-view clustering mainly project multi-modal features into a joint feature space. However, multi-modal clue features are usually rather weakly correlated due to the semantic gap from the modality-specific u…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: Accepted in IEEE Transactions on Multimedia

  42. arXiv:2307.12045  [pdf, other]

    cs.CV cs.CL cs.RO

    Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery

    Authors: Long Bai, Mobarakol Islam, Hongliang Ren

    Abstract: The visual-question localized-answering (VQLA) system can serve as a knowledgeable assistant in surgical education. Except for providing text-based answers, the VQLA system can highlight the interested region for better surgical scene understanding. However, deep neural networks (DNNs) suffer from catastrophic forgetting when learning new knowledge. Specifically, when DNNs learn on incremental cla…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/CS-VQLA

  43. arXiv:2307.09211  [pdf, other]

    cond-mat.mtrl-sci

    Intrinsic ferroelectric switching in two-dimension $α$-In$_2$Se$_3$

    Authors: Liyi Bai, Changming Ke, Zhongshen Luo, Tianyuan Zhu, Lu You, Shi Liu

    Abstract: Two-dimensional (2D) ferroelectric semiconductors present opportunities for integrating ferroelectrics into high-density ultrathin nanoelectronics. Among the few synthesized 2D ferroelectrics, $α$-In$_2$Se$_3$, known for its electrically addressable vertical polarization has attracted significant interest. However, the understanding of many fundamental characteristics of this material, such as the…

    Submitted 28 February, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: 30 pages, 6 figures

  44. arXiv:2307.05182  [pdf, other]

    cs.CV cs.AI cs.RO

    CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

    Authors: Long Bai, Mobarakol Islam, Hongliang Ren

    Abstract: Medical students and junior surgeons often rely on senior surgeons and specialists to answer their questions when learning surgery. However, experts are often busy with clinical and academic work, and have little time to give guidance. Meanwhile, existing deep learning (DL)-based surgical Visual Question Answering (VQA) systems can only provide simple answers without the location of the answers. I…

    Submitted 19 August, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/CAT-ViL

  45. arXiv:2307.02452  [pdf, other]

    eess.IV cs.CV cs.RO

    LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved Wavelet Attention and Reverse Diffusion

    Authors: Long Bai, Tong Chen, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren

    Abstract: Wireless capsule endoscopy (WCE) is a painless and non-invasive diagnostic tool for gastrointestinal (GI) diseases. However, due to GI anatomical constraints and hardware manufacturing limitations, WCE vision signals may suffer from insufficient illumination, leading to a complicated screening and examination procedure. Deep learning-based low-light image enhancement (LLIE) in the medical field gr…

    Submitted 22 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/LLCaps

  46. arXiv:2306.14177  [pdf, other]

    cs.CV

    Enhancing Mapless Trajectory Prediction through Knowledge Distillation

    Authors: Yuning Wang, Pu Zhang, Lei Bai, Jianru Xue

    Abstract: Scene information plays a crucial role in trajectory forecasting systems for autonomous driving by providing semantic clues and constraints on potential future paths of traffic agents. Prevalent trajectory prediction techniques often take high-definition maps (HD maps) as part of the inputs to provide scene knowledge. Although HD maps offer accurate road information, they may suffer from the high…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: submitted to NeurIPS 2023

  47. arXiv:2306.14143  [pdf, other]

    eess.SP

    Intelligent Multi-Modal Sensing-Communication Integration: Synesthesia of Machines

    Authors: Xiang Cheng, Haotian Zhang, Jianan Zhang, Shijian Gao, Sijiang Li, Ziwei Huang, Lu Bai, Zonghui Yang, Xinhu Zheng, Liuqing Yang

    Abstract: In the era of sixth-generation (6G) wireless communications, integrated sensing and communications (ISAC) is recognized as a promising solution to upgrade the physical system by endowing wireless communications with sensing capability. Existing ISAC is mainly oriented to static scenarios with radio-frequency (RF) sensors being the primary participants, thus lacking a comprehensive environment feat…

    Submitted 20 November, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted by IEEE Communications Surveys & Tutorials

  48. arXiv:2306.14125  [pdf, other]

    eess.SP

    M$^3$SC: A Generic Dataset for Mixed Multi-Modal (MMM) Sensing and Communication Integration

    Authors: Xiang Cheng, Ziwei Huang, Lu Bai, Haotian Zhang, Mingran Sun, Boxun Liu, Sijiang Li, Jianan Zhang, Minson Lee

    Abstract: The sixth generation (6G) of mobile communication system is witnessing a new paradigm shift, i.e., integrated sensing-communication system. A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M$^3$SC, for mixed multi-modal (MMM) sensing-communication integration, and the generation framework of the M$^3$SC data…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: 12 pages, 12 figures

  49. arXiv:2306.10900  [pdf, other]

    cs.CV cs.AI

    MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators

    Authors: Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang

    Abstract: Generating realistic human motion from given action descriptions has experienced significant advancements because of the emerging requirement of digital humans. While recent works have achieved impressive results in generating motion directly from textual action descriptions, they often support only a single modality of the control signal, which limits their application in the real digital human i…

    Submitted 18 March, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: 18 pages, 8 figures, accepted by AAAI 2024

  50. arXiv:2306.08259  [pdf, other]

    cs.LG

    LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting

    Authors: Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, Roger Zimmermann

    Abstract: Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, the limited sizes of them may not…

    Submitted 28 October, 2023; v1 submitted 14 June, 2023; originally announced June 2023.