Showing 1–50 of 211 results for author: Bai, L

Searching in archive cs.
  1. arXiv:2406.14399  [pdf, other]

    cs.LG cs.CV physics.ao-ph stat.ML

    WEATHER-5K: A Large-scale Global Station Weather Dataset Towards Comprehensive Time-series Forecasting Benchmark

    Authors: Tao Han, Song Guo, Zhenghao Chen, Wanghan Xu, Lei Bai

    Abstract: Global Station Weather Forecasting (GSWF) is crucial for various sectors, including aviation, agriculture, energy, and disaster preparedness. Recent advancements in deep learning have significantly improved the accuracy of weather predictions by optimizing models based on public meteorological data. However, existing public datasets for GSWF optimization and benchmarking still suffer from signific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 26 pages, 13 figures

  2. arXiv:2406.14191  [pdf, other]

    cs.CL cs.AI cs.LG

    Temporal Knowledge Graph Question Answering: A Survey

    Authors: Miao Su, ZiXuan Li, Zhuo Chen, Long Bai, Xiaolong Jin, Jiafeng Guo

    Abstract: Knowledge Base Question Answering (KBQA) has been a long-standing field to answer questions based on knowledge bases. Recently, the evolving dynamics of knowledge have attracted a growing interest in Temporal Knowledge Graph Question Answering (TKGQA), an emerging task to answer temporal questions. However, this field grapples with ambiguities in defining temporal questions and lacks a systematic…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

  3. arXiv:2406.13705  [pdf, other]

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Qiaozhi Tan, Tong Chen, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, Jinlin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  4. arXiv:2406.12754  [pdf, other]

    cs.CL cs.AI

    Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

    Authors: Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng

    Abstract: Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evalua…

    Submitted 18 June, 2024; originally announced June 2024.

  5. arXiv:2406.10508  [pdf, other]

    cs.CV

    Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis

    Authors: Bowen Zhang, Ying Chen, Long Bai, Yan Zhao, Yuxiang Sun, Yixuan Yuan, Jianhua Zhang, Hongliang Ren

    Abstract: Foundation models have become prominent in computer vision, achieving notable success in various tasks. However, their effectiveness largely depends on pre-training with extensive datasets. Applying foundation models directly to small datasets of capsule endoscopy images from scratch is challenging. Pre-training on broad, general vision datasets is crucial for successfully fine-tuning our model fo…

    Submitted 30 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear in ICBIR 2024

  6. arXiv:2406.01645  [pdf, other]

    cs.LG cs.AI

    FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation

    Authors: Kun Chen, Tao Chen, Peng Ye, Hao Chen, Kang Chen, Tao Han, Wanli Ouyang, Lei Bai

    Abstract: Data assimilation is a vital component in modern global medium-range weather forecasting systems to obtain the best estimation of the atmospheric state by combining the short-term forecast and observations. Recently, AI-based data assimilation approaches have attracted increasing attention for their significant advantages over traditional techniques in terms of computational consumption. However,…

    Submitted 3 June, 2024; originally announced June 2024.

  7. arXiv:2405.17790  [pdf, other]

    cs.CV

    Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

    Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

    Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

  8. arXiv:2405.15412  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    ORCA: A Global Ocean Emulator for Multi-year to Decadal Predictions

    Authors: Zijie Guo, Pumeng Lyu, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

    Abstract: Ocean dynamics plays a crucial role in driving global weather and climate patterns. Accurate and efficient modeling of ocean dynamics is essential for improved understanding of complex ocean circulation and processes, for predicting climate variations and their associated teleconnections, and for addressing the challenges of climate change. While great efforts have been made to improve numerical O…

    Submitted 24 May, 2024; originally announced May 2024.

  9. arXiv:2405.15151  [pdf, other]

    cs.CV cs.GR cs.RO

    NeB-SLAM: Neural Blocks-based Salable RGB-D SLAM for Unknown Scenes

    Authors: Lizhi Bai, Chunqi Tian, Jun Yang, Siyu Zhang, Weijian Liang

    Abstract: Neural implicit representations have recently demonstrated considerable potential in the field of visual simultaneous localization and mapping (SLAM). This is due to their inherent advantages, including low storage overhead and representation continuity. However, these methods necessitate the size of the scene as input, which is impractical for unknown scenes. Consequently, we propose NeB-SLAM, a…

    Submitted 23 May, 2024; originally announced May 2024.

  10. arXiv:2405.14742  [pdf, other]

    cs.LG cs.AI

    HC-GAE: The Hierarchical Cluster-based Graph Auto-Encoder for Graph Representation Learning

    Authors: Zhuo Xu, Lu Bai, Lixin Cui, Ming Li, Yue Wang, Edwin R. Hancock

    Abstract: Graph Auto-Encoders (GAEs) are powerful tools for graph representation learning. In this paper, we develop a novel Hierarchical Cluster-based GAE (HC-GAE), that can learn effective structural characteristics for graph data analysis. To this end, during the encoding process, we commence by utilizing the hard node assignment to decompose a sample graph into a family of separated subgraphs. We compre…

    Submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.13796  [pdf, other]

    cs.LG cs.AI

    Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling

    Authors: Wanghan Xu, Fenghua Ling, Wenlong Zhang, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

    Abstract: Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting. However, most data-driven weather forecasting models are black-box systems that focus on learning data mapping rather than fine-grained physical evolution in the time dimension. Consequently, the limitations in the temporal scale of datasets preven…

    Submitted 29 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  12. arXiv:2405.13711  [pdf, other]

    cs.LG cs.AI math.DS physics.ao-ph

    VAE-Var: Variational-Autoencoder-Enhanced Variational Assimilation

    Authors: Yi Xiao, Qilong Jia, Wei Xue, Lei Bai

    Abstract: Data assimilation refers to a set of algorithms designed to compute the optimal estimate of a system's state by refining the prior prediction (known as background states) using observed data. Variational assimilation methods rely on the maximum likelihood approach to formulate a variational cost, with the optimal state estimate derived by minimizing this cost. Although traditional variational meth…

    Submitted 22 May, 2024; originally announced May 2024.

  13. arXiv:2405.10948  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

    Authors: Guankun Wang, Long Bai, Wan Jun Nah, Jie Wang, Zhaoxi Zhang, Zhen Chen, Jinlin Wu, Mobarakol Islam, Hongbin Liu, Hongliang Ren

    Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recogni…

    Submitted 22 March, 2024; originally announced May 2024.

  14. arXiv:2405.10550  [pdf, other]

    eess.IV cs.CV

    LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

    Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

    Abstract: Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It ad…

    Submitted 17 May, 2024; originally announced May 2024.

  15. arXiv:2405.10218  [pdf, other]

    cs.LG cs.AI

    ENADPool: The Edge-Node Attention-based Differentiable Pooling for Graph Neural Networks

    Authors: Zhehan Zhao, Lu Bai, Lixin Cui, Ming Li, Yue Wang, Lixiang Xu, Edwin R. Hancock

    Abstract: Graph Neural Networks (GNNs) are powerful tools for graph classification. One important operation for GNNs is the downsampling or pooling that can learn effective embeddings from the node representations. In this paper, we propose a new hierarchical pooling operation, namely the Edge-Node Attention-based Differentiable Pooling (ENADPool), for GNNs to learn effective graph representations. Unlike t…

    Submitted 16 May, 2024; originally announced May 2024.

  16. arXiv:2405.08672  [pdf, other]

    eess.IV cs.CV

    EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

    Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

    Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adapt…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: early accepted by MICCAI 2024

  17. arXiv:2405.03376  [pdf, other]

    cs.LG cs.CV

    CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer

    Authors: Tao Han, Zhenghao Chen, Song Guo, Wanghan Xu, Lei Bai

    Abstract: The advent of data-driven weather forecasting models, which learn from hundreds of terabytes (TB) of reanalysis data, has significantly advanced forecasting capabilities. However, the substantial costs associated with data storage and transmission present a major challenge for data providers and users, affecting resource-constrained researchers and limiting their accessibility to participate in AI…

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Main text and supplementary, 22 pages, 13 figures

  18. arXiv:2405.00216  [pdf, other]

    cs.CL cs.AI cs.LG

    Graphical Reasoning: LLM-based Semi-Open Relation Extraction

    Authors: Yicheng Tao, Yiqun Wang, Longju Bai

    Abstract: This paper presents a comprehensive exploration of relation extraction utilizing advanced language models, specifically Chain of Thought (CoT) and Graphical Reasoning (GRE) techniques. We demonstrate how leveraging in-context learning with GPT-3.5 can significantly enhance the extraction process, particularly through detailed example-based reasoning. Additionally, we introduce a novel graphical re…

    Submitted 30 April, 2024; originally announced May 2024.

  19. arXiv:2404.18343  [pdf, other]

    cs.MM cs.CV

    G-Refine: A General Quality Refiner for Text-to-Image Generation

    Authors: Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compro…

    Submitted 28 April, 2024; originally announced April 2024.

  20. arXiv:2404.02668  [pdf, other]

    cs.CV

    RS-Mamba for Large Remote Sensing Image Dense Prediction

    Authors: Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

    Abstract: Context modeling is critical for remote sensing image dense prediction tasks. Nowadays, the growing size of very-high-resolution (VHR) remote sensing images poses challenges in effectively modeling context. While transformer-based models possess global modeling capabilities, they encounter computational challenges when applied to large VHR images due to their quadratic complexity. The conventional…

    Submitted 10 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

  21. arXiv:2404.01767  [pdf, other]

    cs.CL

    Class-Incremental Few-Shot Event Detection

    Authors: Kailin Zhao, Xiaolong Jin, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Event detection is one of the fundamental tasks in information extraction and knowledge graph. However, a realistic event detection system often needs to deal with new event classes constantly. These new classes usually have only a few labeled instances as it is time-consuming and labor-intensive to annotate a large number of unlabeled instances. Therefore, this paper proposes a new task, called c…

    Submitted 2 April, 2024; originally announced April 2024.

  22. arXiv:2404.01695  [pdf, other]

    cs.LG

    Selective Temporal Knowledge Graph Reasoning

    Authors: Zhongni Hou, Xiaolong Jin, Zixuan Li, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Temporal Knowledge Graph (TKG), which characterizes temporally evolving facts in the form of (subject, relation, object, timestamp), has attracted much attention recently. TKG reasoning aims to predict future facts based on given historical ones. However, existing TKG reasoning models are unable to abstain from predictions they are uncertain, which will inevitably bring risks in real-world applica…

    Submitted 2 April, 2024; originally announced April 2024.

  23. arXiv:2403.16407  [pdf, other]

    cs.CV

    A Survey on Long Video Generation: Challenges, Methods, and Prospects

    Authors: Chengxuan Li, Di Huang, Zeyu Lu, Yang Xiao, Qingqi Pei, Lei Bai

    Abstract: Video generation is a rapidly advancing research area, garnering significant attention due to its broad range of applications. One critical aspect of this field is the generation of long-duration videos, which presents unique challenges and opportunities. This paper presents the first survey of recent advancements in long video generation and summarises them into two key paradigms: divide and conq…

    Submitted 24 March, 2024; originally announced March 2024.

  24. arXiv:2403.16227  [pdf, other]

    cs.CV

    Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System

    Authors: Jing Li, Lu Bai, Bin Yang, Chang Li, Lingfei Ma, Lixin Cui, Edwin R. Hancock

    Abstract: Infrared and visible image fusion (IVF) plays an important role in intelligent transportation system (ITS). The early works predominantly focus on boosting the visual appeal of the fused result, and only several recent approaches have tried to combine the high-level vision task with IVF. However, they prioritize the design of cascaded structure to seek unified suitable features and fit different t…

    Submitted 24 March, 2024; originally announced March 2024.

  25. arXiv:2403.16162  [pdf, other]

    cs.AI

    Multi-Task Learning with Multi-Task Optimization

    Authors: Lu Bai, Abhishek Gupta, Yew-Soon Ong

    Abstract: Multi-task learning solves multiple correlated tasks. However, conflicts may exist between them. In such circumstances, a single solution can rarely optimize all the tasks, leading to performance trade-offs. To arrive at a set of optimized yet well-distributed models that collectively embody different trade-offs in one algorithmic pass, this paper proposes to view Pareto multi-task learning throug…

    Submitted 24 March, 2024; originally announced March 2024.

  26. arXiv:2403.16133  [pdf, other]

    cs.AI cs.LG

    SSHPool: The Separated Subgraph-based Hierarchical Pooling

    Authors: Zhuo Xu, Lixin Cui, Yue Wang, Hangyuan Du, Lu Bai, Edwin R. Hancock

    Abstract: In this paper, we develop a novel local graph pooling method, namely the Separated Subgraph-based Hierarchical Pooling (SSHPool), for graph classification. To this end, we commence by assigning the nodes of a sample graph into different clusters, resulting in a family of separated subgraphs. We individually employ a local graph convolution units as the local structure to further compress each subg…

    Submitted 24 March, 2024; originally announced March 2024.

  27. arXiv:2403.16130  [pdf, other]

    cs.LG cs.AI

    AKBR: Learning Adaptive Kernel-based Representations for Graph Classification

    Authors: Feifei Qian, Lixin Cui, Yue Wang, Hangyuan Du, Lu Bai, Edwin R. Hancock

    Abstract: In this paper, we propose a new model to learn Adaptive Kernel-based Representations (AKBR) for graph classification. Unlike state-of-the-art R-convolution graph kernels that are defined by merely counting any pair of isomorphic substructures between graphs and cannot provide an end-to-end learning mechanism for the classifier, the proposed AKBR approach aims to define an end-to-end representation…

    Submitted 24 March, 2024; originally announced March 2024.

  28. arXiv:2403.11817  [pdf, other]

    cs.CV

    HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation

    Authors: Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang

    Abstract: We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner. By exploiting the geometric relationship between RGB cameras and LiDAR sensors, the correspondence between the two modalities based on both image-plane view and bird-eye view can be established,…

    Submitted 18 March, 2024; originally announced March 2024.

  29. arXiv:2403.11035  [pdf]

    physics.optics cs.CV cs.NE physics.app-ph

    Multiplane Quantitative Phase Imaging Using a Wavelength-Multiplexed Diffractive Optical Processor

    Authors: Che-Yung Shen, Jingxi Li, Tianyi Gan, Yuhang Li, Langxing Bai, Mona Jarrahi, Aydogan Ozcan

    Abstract: Quantitative phase imaging (QPI) is a label-free technique that provides optical path length information for transparent specimens, finding utility in biology, materials science, and engineering. Here, we present quantitative phase imaging of a 3D stack of phase-only objects using a wavelength-multiplexed diffractive optical processor. Utilizing multiple spatially engineered diffractive layers tra…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 27 Pages, 9 Figures

  30. arXiv:2403.07969  [pdf, other]

    cs.LG cs.AI

    KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

    Authors: Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve these, KnowCoder introduces a code…

    Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  31. arXiv:2403.07687  [pdf, other]

    cs.CV cs.AI cs.CL

    Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

    Authors: Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea

    Abstract: Current foundation models have shown impressive performance across various tasks. However, several studies have revealed that these models are not effective for everyone due to the imbalanced geographical and economic representation of the data used in the training process. Most of this data comes from Western countries, leading to poor results for underrepresented countries. To address this issue…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: accepted at COLING 2024

  32. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  33. arXiv:2402.13270  [pdf, other]

    physics.ao-ph cs.AI cs.LG physics.data-an

    Global Tropical Cyclone Intensity Forecasting with Multi-modal Multi-scale Causal Autoregressive Model

    Authors: Xinyu Wang, Kang Chen, Lei Liu, Tao Han, Bin Li, Lei Bai

    Abstract: Accurate forecasting of Tropical cyclone (TC) intensity is crucial for formulating disaster risk reduction strategies. Current methods predominantly rely on limited spatiotemporal information from ERA5 data and neglect the causal relationships between these physical variables, failing to fully capture the spatial and temporal patterns required for intensity forecasting. To address this issue, we p…

    Submitted 16 February, 2024; originally announced February 2024.

  34. arXiv:2402.12376  [pdf, other]

    cs.CV

    FiT: Flexible Vision Transformer for Diffusion Model

    Authors: Zeyu Lu, Zidong Wang, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, Lei Bai

    Abstract: Nature is infinitely resolution-free. In the context of this reality, existing diffusion models, such as Diffusion Transformers, often face challenges when processing image resolutions outside of their trained domain. To overcome this limitation, we present the Flexible Vision Transformer (FiT), a transformer architecture specifically designed for generating images with unrestricted resolutions an…

    Submitted 19 February, 2024; originally announced February 2024.

  35. arXiv:2402.11476  [pdf, other]

    cs.CV

    EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis

    Authors: Qiaozhi Tan, Long Bai, Guankun Wang, Mobarakol Islam, Hongliang Ren

    Abstract: Wireless capsule endoscopy (WCE) is a non-invasive diagnostic procedure that enables visualization of the gastrointestinal (GI) tract. Deep learning-based methods have shown effectiveness in disease screening using WCE data, alleviating the burden on healthcare professionals. However, existing capsule endoscopy classification methods mostly rely on pre-defined categories, making it challenging to…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: To appear in IEEE ISBI 2024

  36. arXiv:2402.06985  [pdf, other]

    cs.CV cs.AI cs.RO

    OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted Surgery

    Authors: Long Bai, Guankun Wang, Jie Wang, Xiaoxiao Yang, Huxin Gao, Xin Liang, An Wang, Mobarakol Islam, Hongliang Ren

    Abstract: In the realm of automated robotic surgery and computer-assisted interventions, understanding robotic surgical activities stands paramount. Existing algorithms dedicated to surgical activity recognition predominantly cater to pre-defined closed-set paradigms, ignoring the challenges of real-world open-set scenarios. Such algorithms often falter in the presence of test samples originating from class…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: To appear in IEEE ICRA 2024

  37. arXiv:2402.06646  [pdf]

    physics.ao-ph cs.LG physics.geo-ph

    Diffusion Model-based Probabilistic Downscaling for 180-year East Asian Climate Reconstruction

    Authors: Fenghua Ling, Zeyu Lu, Jing-Jia Luo, Lei Bai, Swadhin K. Behera, Dachao Jin, Baoxiang Pan, Huidong Jiang, Toshio Yamagata

    Abstract: As our planet is entering into the "global boiling" era, understanding regional climate change becomes imperative. Effective downscaling methods that provide localized insights are crucial for this target. Traditional approaches, including computationally-demanding regional dynamical models or statistical downscaling frameworks, are often susceptible to the influence of downscaling uncertainty. He…

    Submitted 5 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  38. arXiv:2402.05860  [pdf, other]

    cs.CV

    Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery

    Authors: Mengya Xu, Mobarakol Islam, Long Bai, Hongliang Ren

    Abstract: Deep Neural Networks (DNNs) based semantic segmentation of the robotic instruments and tissues can enhance the precision of surgical activities in robot-assisted surgery. However, in biological learning, DNNs cannot learn incremental tasks over time and exhibit catastrophic forgetting, which refers to the sharp decline in performance on previously learned tasks after learning a new one. Specifical…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 12 pages, 8 figures, IEEE Transactions on Medical Image (accepted)

  39. arXiv:2402.04290  [pdf, other]

    cs.LG cs.AI

    CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

    Authors: Junchao Gong, Lei Bai, Peng Ye, Wanghan Xu, Na Liu, Jianhua Dai, Xiaokang Yang, Wanli Ouyang

    Abstract: Precipitation nowcasting based on radar data plays a crucial role in extreme weather prediction and has broad implications for disaster management. Despite progresses have been made based on deep learning, two key challenges of precipitation nowcasting are not well-solved: (i) the modeling of complex precipitation system evolutions with different scales, and (ii) accurate forecasts for extreme pre…

    Submitted 6 February, 2024; originally announced February 2024.

  40. arXiv:2402.01295  [pdf, other]

    cs.LG cs.AI

    ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast

    Authors: Wanghan Xu, Kang Chen, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

    Abstract: Data-driven weather forecast based on machine learning (ML) has experienced rapid development and demonstrated superior performance in the global medium-range forecast compared to traditional physics-based dynamical models. However, most of these ML models struggle with accurately predicting extreme weather, which is closely related to the extreme value prediction. Through mathematical analysis, w…

    Submitted 9 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  41. arXiv:2402.00059  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting

    Authors: Tao Han, Song Guo, Fenghua Ling, Kang Chen, Junchao Gong, Jingjia Luo, Junxia Gu, Kan Dai, Wanli Ouyang, Lei Bai

    Abstract: Kilometer-scale modeling of global atmosphere dynamics enables fine-grained weather forecasting and decreases the risk of disastrous weather and climate activity. Therefore, building a kilometer-scale global forecast model is a persistent pursuit in the meteorology domain. Active international efforts have been made in past decades to improve the spatial resolution of numerical weather models. Non…

    Submitted 28 January, 2024; originally announced February 2024.

    Comments: 19 pages

  42. arXiv:2401.16669  [pdf]

    cs.LG cs.AI physics.ao-ph physics.geo-ph

    Improving Global Weather and Ocean Wave Forecast with Large Artificial Intelligence Models

    Authors: Fenghua Ling, Lin Ouyang, Boufeniza Redouane Larbi, Jing-Jia Luo, Tao Han, Xiaohui Zhong, Lei Bai

    Abstract: The rapid advancement of artificial intelligence technologies, particularly in recent years, has led to the emergence of several large parameter artificial intelligence weather forecast models. These models represent a significant breakthrough, overcoming the limitations of traditional numerical weather prediction models and indicating the emergence of profound potential tools for atmosphere-ocean…

    Submitted 18 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  43. arXiv:2401.16416  [pdf, other]

    cs.CV

    Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting

    Authors: Yiming Huang, Beilei Cui, Long Bai, Ziqi Guo, Mengya Xu, Mobarakol Islam, Hongliang Ren

    Abstract: In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Neural Radiance Fields (NeRF)-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes but are hampered by slow inference speed, prolonged training, and inconsistent depth estimation. Some previo…

    Submitted 2 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  44. arXiv:2401.12681  [pdf, other]

    cs.LG cs.AI

    Non-Neighbors Also Matter to Kriging: A New Contrastive-Prototypical Learning

    Authors: Zhishuai Li, Yunhao Nie, Ziyue Li, Lei Bai, Yisheng Lv, Rui Zhao

    Abstract: Kriging aims at estimating the attributes of unsampled geo-locations from observations in the spatial vicinity or physical connections, which helps mitigate skewed monitoring caused by under-deployed sensors. Existing works assume that neighbors' information offers the basis for estimating the attributes of the unobserved target while ignoring non-neighbors. However, non-neighbors could also offer…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted in AISTATS 2024

  45. arXiv:2401.11960  [pdf, other]

    cs.CV eess.IV

    Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method

    Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

    Abstract: Downscaling (DS) of meteorological variables involves obtaining high-resolution states from low-resolution meteorological fields and is an important task in weather forecasting. Previous methods based on deep learning treat downscaling as a super-resolution task in computer vision and utilize high-resolution gridded meteorological fields as supervision to improve resolution at specific grid scales…

    Submitted 22 January, 2024; originally announced January 2024.

  46. arXiv:2401.09274  [pdf, ps, other]

    math.OC cs.LG

    Avoiding strict saddle points of nonconvex regularized problems

    Authors: Luwei Bai

    Abstract: We introduce a strict saddle property for $\ell_p$ regularized functions, and propose an iterative reweighted $\ell_1$ algorithm to solve the $\ell_p$ regularized problems. The algorithm is guaranteed to converge only to local minimizers when randomly initialized. The strict saddle property is shown generic on these sparse optimization problems. Those analyses as well as the proposed algorithm can…

    Submitted 9 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: 22 pages

  47. arXiv:2401.06013  [pdf, other]

    cs.CV cs.AI

    Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

    Authors: Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren

    Abstract: Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction, surgical navigation and augmented reality visualization. Although the foundation model exhibits outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works observed its limitations in medical and surgical domain-specific applications. This work presents a low-ranked adaptation (LoR…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted by IPCAI 2024 (IJCAR Special Issue)

  48. arXiv:2401.04148  [pdf, other]

    cs.LG cs.AI eess.SP

    Online Test-Time Adaptation of Spatial-Temporal Traffic Flow Forecasting

    Authors: Pengxin Guo, Pengrong Jin, Ziyue Li, Lei Bai, Yu Zhang

    Abstract: Accurate spatial-temporal traffic flow forecasting is crucial in aiding traffic managers in implementing control measures and assisting drivers in selecting optimal travel routes. Traditional deep-learning based methods for traffic flow forecasting typically rely on historical data to train their models, which are then used to make predictions on future data. However, the performance of the traine…

    Submitted 8 January, 2024; originally announced January 2024.

  49. arXiv:2401.01759  [pdf, other]

    cs.SI cs.CL cs.CV cs.MM

    VGA: Vision and Graph Fused Attention Network for Rumor Detection

    Authors: Lin Bai, Caiyan Jia, Ziying Song, Chaoqun Cui

    Abstract: With the development of social media, rumors have been spread broadly on social media platforms, causing great harm to society. Beside textual information, many rumors also use manipulated images or conceal textual information within images to deceive people and avoid being detected, making multimodal rumor detection be a critical problem. The majority of multimodal rumor detection methods mainly…

    Submitted 3 January, 2024; originally announced January 2024.

  50. arXiv:2401.01117  [pdf, other]

    cs.CV eess.IV

    Q-Refine: A Perceptual Quality Refiner for AI-Generated Image

    Authors: Chunyi Li, Haoning Wu, Zicheng Zhang, Hongkun Hao, Kaiwei Zhang, Lei Bai, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: With the rapid evolution of the Text-to-Image (T2I) model in recent years, their unsatisfactory generation result has become a challenge. However, uniformly refining AI-Generated Images (AIGIs) of different qualities not only limited optimization capabilities for low-quality AIGIs but also brought negative optimization to high-quality AIGIs. To address this issue, a quality-award refiner named Q-R…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: 6 pages, 5 figures