Showing 1–50 of 285 results for author: Bai, S

  1. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: the Complex Video Object Segmentation Track, based on the MOSE dataset, and the Motion Expression guided Video Segmentation track, based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  2. arXiv:2406.14535  [pdf, other]

    stat.ME math.ST

    On estimation and order selection for multivariate extremes via clustering

    Authors: Shiyuan Deng, He Tang, Shuyang Bai

    Abstract: We investigate the estimation of multivariate extreme models with a discrete spectral measure using spherical clustering techniques. The primary contribution involves devising a method for selecting the order, that is, the number of clusters. The method consistently identifies the true order, i.e., the number of spectral atoms, and enjoys intuitive implementation in practice. Specifically, we intr…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 31 pages, 12 figures

    MSC Class: 62G32 (Primary); 60G70 (Secondary)

  3. arXiv:2406.06162  [pdf, other]

    quant-ph cond-mat.quant-gas

    Long-Range Quantum Tunneling via Matter Wave

    Authors: Yuan-Xing Yang, Si-Yuan Bai, Jun-Hong An

    Abstract: Quantum tunneling refers to a phenomenon in which a microscopic object can pass through a potential barrier even if it does not have enough energy to overcome the barrier. It has led to many modern applications and nanotechnologies. A general belief is that quantum tunneling, as a manifestation of the wave-particle duality, occurs only when the width of the barrier is comparable to or smaller than the de…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 7 pages and 4 figures in the main text. 5 pages and 3 figures in the supplemental material

  4. arXiv:2406.04322  [pdf, other]

    cs.CV

    DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

    Authors: Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, Alan Yuille

    Abstract: We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets (represented by Neural Radiance Fields) from text prompts. Unlike recent 3D generative models that rely on clean and well-aligned 3D data, limiting them to single or few-class generation, our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets, mitigating the key challenge…

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024. Code: https://github.com/qihao067/direct3d Project page: https://direct-3d.github.io/

  5. arXiv:2406.00931  [pdf, ps, other]

    math.SG math.AG

    Cohomological splitting over rationally connected bases

    Authors: Shaoyun Bai, Daniel Pomerleano, Guangbo Xu

    Abstract: We prove a cohomological splitting result for Hamiltonian fibrations over enumeratively rationally connected symplectic manifolds. As a key application, we prove that the cohomology of a smooth, projective family over a smooth stably rational projective variety splits additively over any field. The main ingredient in our arguments is the theory of Fukaya-Ono-Parker (FOP) perturbations developed by…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 36 pages, comments welcome!

  6. arXiv:2406.00532  [pdf, other]

    cs.AI cs.LG

    Breast Cancer Diagnosis: A Comprehensive Exploration of Explainable Artificial Intelligence (XAI) Techniques

    Authors: Samita Bai, Sidra Nasir, Rizwan Ahmed Khan, Sheeraz Arif, Alexandre Meyer, Hubert Konik

    Abstract: Breast cancer (BC) stands as one of the most common malignancies affecting women worldwide, necessitating advancements in diagnostic methodologies for better clinical outcomes. This article provides a comprehensive exploration of the application of Explainable Artificial Intelligence (XAI) techniques in the detection and diagnosis of breast cancer. As Artificial Intelligence (AI) technologies cont…

    Submitted 1 June, 2024; originally announced June 2024.

  7. arXiv:2405.08779  [pdf, other]

    cs.LG

    Jacobian Regularizer-based Neural Granger Causality

    Authors: Wanqi Zhou, Shuanghao Bai, Shujian Yu, Qibin Zhao, Badong Chen

    Abstract: With the advancement of neural networks, diverse methods for neural Granger causality have emerged, which demonstrate proficiency in handling complex data and nonlinear relationships. However, the existing framework of neural Granger causality has several limitations. It requires the construction of separate predictive models for each target variable, and the relationship depends on the sparsity…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 20 pages, 7 figures, ICML 2024

  8. arXiv:2405.08484  [pdf, other]

    quant-ph cs.LG nlin.CD stat.ML

    Universal replication of chaotic characteristics by classical and quantum machine learning

    Authors: Sheng-Chen Bai, Shi-Ju Ran

    Abstract: Replicating chaotic characteristics of non-linear dynamics by machine learning (ML) has recently drawn wide attention. In this work, we propose that an ML model, trained to predict the state one step ahead from the several latest historic states, can accurately replicate the bifurcation diagram and the Lyapunov exponents of discrete dynamic systems. The characteristics for different values of the hype…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 8 pages, 4 figures

  9. arXiv:2405.05821  [pdf, ps, other]

    math.SG math.AG math.AT

    Equivariant formality in complex-oriented theories

    Authors: Shaoyun Bai, Daniel Pomerleano

    Abstract: Let $G$ be a product of unitary groups and let $(M,ω)$ be a compact symplectic manifold with Hamiltonian $G$-action. We prove an equivariant formality result for any complex-oriented cohomology theory $\mathbb{E}^*$ (in particular, integral cohomology). This generalizes the celebrated result of Atiyah-Bott-Kirwan for rational cohomology from the 1980s. The proof does not use classical ideas but in…

    Submitted 22 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: 14 pages, comments are welcome! v2: minor updates

  10. arXiv:2404.19287  [pdf, other]

    cs.CV

    Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective

    Authors: Wanqi Zhou, Shuanghao Bai, Qibin Zhao, Badong Chen

    Abstract: Pretrained vision-language models (VLMs) like CLIP have shown impressive generalization performance across various downstream tasks, yet they remain vulnerable to adversarial attacks. While prior research has primarily concentrated on improving the adversarial robustness of image encoders to guard against attacks on images, the exploration of text-based and multimodal attacks has largely been over…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 16 pages, 14 figures

  11. arXiv:2404.19286  [pdf, other]

    cs.CV

    Soft Prompt Generation for Domain Generalization

    Authors: Shuanghao Bai, Yuedi Zhang, Wanqi Zhou, Zhirong Luan, Badong Chen

    Abstract: Large pre-trained vision language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompts, which are not optimal for specific domains. To further adapt VLMs to downstream tasks, soft prompts have been proposed to replace manually designed prompts; a soft prompt acts as a learnable vector that is fine-tuned on domain-specific data. Prior prompt learning m…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 23 pages, 4 figures

  12. arXiv:2404.16596  [pdf]

    physics.med-ph physics.ins-det

    Stimulated Emission Depletion (STED) Magnetic Particle Imaging

    Authors: Guang Jia, Zhongwei Bian, Tianshu Li, Shi Bai, Chenxing Hu, Lixuan Zhao, Peng Gao, Tanping Li, Hui Hui, Jie Tian

    Abstract: Magnetic particle imaging (MPI) is an in-vivo imaging method to detect magnetic nanoparticles for blood vessel imaging and molecular target imaging. Compared with conventional molecular imaging devices (such as nuclear medicine imaging PET and SPECT), magnetic nanoparticles have longer storage periods than radionuclides without ionizing radiation. MPI has higher detection sensitivity compared with…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 29 pages, 24 figures, 2 tables

  13. arXiv:2404.14724  [pdf]

    cs.RO

    Tightly Joined Positioning and Control Model for Unmanned Aerial Vehicles Based on Factor Graph Optimization

    Authors: Peiwen Yang, Weisong Wen, Shiyu Bai, Li-Ta Hsu

    Abstract: The execution of flight missions by unmanned aerial vehicles (UAV) primarily relies on navigation. In particular, the navigation pipeline has traditionally been divided into positioning and control, operating in a sequential loop. However, the existing navigation pipeline, where the positioning and control are decoupled, struggles to adapt to ubiquitous uncertainties arising from measurement noise…

    Submitted 22 April, 2024; originally announced April 2024.

  14. arXiv:2404.14471  [pdf, other]

    cs.CV

    Narrative Action Evaluation with Prompt-Guided Multimodal Interaction

    Authors: Shiyi Zhang, Sule Bai, Guangyi Chen, Lei Chen, Jiwen Lu, Junle Wang, Yansong Tang

    Abstract: In this paper, we investigate a new problem called narrative action evaluation (NAE). NAE aims to generate professional commentary that evaluates the execution of an action. Unlike traditional tasks such as score-based action quality assessment and video captioning involving superficial sentences, NAE focuses on creating detailed narratives in natural language. These narratives provide intricate d…

    Submitted 26 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  15. arXiv:2404.10499  [pdf, other]

    cs.CV cs.AI

    Robust Noisy Label Learning via Two-Stream Sample Distillation

    Authors: Sihan Bai, Sanping Zhou, Zheng Qin, Le Wang, Nanning Zheng

    Abstract: Noisy label learning aims to learn robust networks under the supervision of noisy labels, which plays a critical role in deep learning. Existing work either conducts sample selection or label correction to deal with noisy labels during the model training process. In this paper, we design a simple yet effective sample selection framework, termed Two-Stream Sample Distillation (TSSD), for noisy labe…

    Submitted 16 April, 2024; originally announced April 2024.

  16. Cepstral Analysis Based Artifact Detection, Recognition and Removal for Prefrontal EEG

    Authors: Siqi Han, Chao Zhang, Jiaxin Lei, Qingquan Han, Yuhui Du, Anhe Wang, Shuo Bai, Milin Zhang

    Abstract: This paper proposes to use cepstrum for artifact detection, recognition and removal in prefrontal EEG. This work focuses on the artifact caused by eye movement. A database containing artifact-free EEG and eye movement contaminated EEG from different subjects is established. A cepstral analysis-based feature extraction with support vector machine (SVM) based classifier is designed to identify the a…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 5 pages, 4 figures, published by TCAS-II

    Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023

  17. arXiv:2404.06046  [pdf, other]

    nucl-ex nucl-th

    Nuclear charge radii of germanium isotopes around $N$ = 40

    Authors: S. J. Wang, A. Kanellakopoulos, X. F. Yang, S. W. Bai, J. Billowes, M. L. Bissell, K. Blaum, B. Cheal, C. S. Devlin, R. F. Garcia Ruiz, J. Z. Han, H. Heylen, S. Kaufmann, K. Konig, A. Koszorus, S. Lechner, S. Malbrunot-Ettenauer, W. Nazarewicz, R. Neugart, G. Neyens, W. Nortershauser, T. Ratajczyk, P. -G. Reinhard, L. V. Rodríguez, S. Sels, et al. (4 additional authors not shown)

    Abstract: Collinear laser spectroscopy measurements were performed on $^{68-74}$Ge isotopes ($Z = 32$) at ISOLDE-CERN, by probing the $4s^2 4p^2 \, ^3\!P_1 \rightarrow 4s^2 4p 5s \, ^3\!P_1^o$ atomic transition (269 nm) of germanium. Nuclear charge radii are determined via the measured isotope shifts, revealing a larger local variation than the neighboring isotopic chains. Nuclear density functional theory…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 6 pages, 5 figures

  18. arXiv:2404.03067  [pdf, other]

    cs.RO cs.CV

    Self-supervised 6-DoF Robot Grasping by Demonstration via Augmented Reality Teleoperation System

    Authors: Xiwen Dengxiong, Xueting Wang, Shi Bai, Yunbo Zhang

    Abstract: Most existing 6-DoF robot grasping solutions depend on strong supervision on grasp pose to ensure satisfactory performance, which could be laborious and impractical when the robot works in some restricted area. To this end, we propose a self-supervised 6-DoF grasp pose detection framework via an Augmented Reality (AR) teleoperation system that can efficiently learn human demonstrations and provide…

    Submitted 3 April, 2024; originally announced April 2024.

  19. arXiv:2404.02025  [pdf, other]

    nucl-ex physics.atom-ph

    High-precision measurement of the atomic mass of $^{84}$Sr and implications to isotope shift studies

    Authors: Zhuang Ge, Shiwei Bai, Tommi Eronen, Ari Jokinen, Anu Kankainen, Sonja Kujanpää, Iain Moore, Dmitrii Nesterenko, Mikael Reponen

    Abstract: The absolute mass of $^{84}$Sr was determined using the phase-imaging ion-cyclotron-resonance technique with the JYFLTRAP double Penning trap mass spectrometer. A more precise value for the mass of $^{84}$Sr is essential for providing potential indications of physics beyond the Standard Model through high-precision isotope shift measurements of Sr atomic transition frequencies. The mass excess of…

    Submitted 22 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 7 pages, 4 figures

  20. arXiv:2404.01853  [pdf, other]

    cs.LG cs.CV

    Pairwise Similarity Distribution Clustering for Noisy Label Learning

    Authors: Sihan Bai

    Abstract: Noisy label learning aims to train deep neural networks using a large number of samples with noisy labels; the main challenge is how to deal with the inaccurate supervision caused by wrong labels. Existing works adopt either the label correction or the sample selection paradigm to bring more samples with accurate labels into the training process. In this paper, we propose a simple yet effec…

    Submitted 2 April, 2024; originally announced April 2024.

  21. arXiv:2403.09336  [pdf, other]

    physics.atom-ph nucl-ex

    Radiative lifetime of the $A$ $^2Π_{1/2}$ state in RaF with relevance to laser cooling

    Authors: M. Athanasakis-Kaklamanakis, S. G. Wilkins, P. Lassègues, L. Lalanne, J. R. Reilly, O. Ahmad, M. Au, S. W. Bai, J. Berbalk, C. Bernerd, A. Borschevsky, A. A. Breier, K. Chrysalidis, T. E. Cocolios, R. P. de Groote, C. M. Fajardo-Zambrano, K. T. Flanagan, S. Franchoo, R. F. Garcia Ruiz, D. Hanstorp, R. Heinke, P. Imgram, A. Koszorús, A. A. Kyuberis, J. Lim, et al. (16 additional authors not shown)

    Abstract: The radiative lifetime of the $A$ $^2 Π_{1/2}$ (v=0) state in radium monofluoride (RaF) is measured to be 35(1) ns. The lifetime of this state and the related decay rate $Γ= 2.86(8) \times 10^7$ $s^{-1}$ are of relevance to the laser cooling of RaF via the optically closed $A$ $^2 Π_{1/2} \leftarrow X$ $^2Σ_{1/2}$ transition, which makes the molecule a promising probe to search for new physics. Ra…

    Submitted 6 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted as a Letter in Physical Review A; 8 pages of main text, 5 pages of supplemental material

  22. arXiv:2403.08506  [pdf, other]

    cs.LG cs.AI cs.CV

    DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning

    Authors: Sikai Bai, Jie Zhang, Shuaicheng Li, Song Guo, Jingcai Guo, Jun Hou, Tao Han, Xiaocheng Lu

    Abstract: Federated learning (FL) has emerged as a powerful paradigm for learning from decentralized data, and federated domain generalization further considers the setting in which the test dataset (target domain) is absent from the decentralized training data (source domains). However, most existing FL methods assume that domain labels are provided during training, and their evaluation imposes explicit constraints on the numb…

    Submitted 11 March, 2024; originally announced March 2024.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  23. arXiv:2403.08192  [pdf, other]

    cs.CL q-bio.BM

    MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension

    Authors: Xingyu Lu, He Cao, Zijing Liu, Shengyuan Bai, Leqing Chen, Yuan Yao, Hai-Tao Zheng, Yu Li

    Abstract: Large language models are playing an increasingly significant role in molecular research, yet existing models often generate erroneous information, posing challenges to accurate molecular comprehension. Traditional evaluation metrics for generated content fail to assess a model's accuracy in molecular understanding. To rectify the absence of factual evaluation, we present MoleculeQA, a novel quest…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 19 pages, 8 figures

  24. arXiv:2403.06764  [pdf, other]

    cs.CV cs.AI cs.CL

    An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

    Authors: Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang

    Abstract: In this study, we identify the inefficient attention phenomena in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that the attention computation over visual tokens is extremely inefficient in the deep layers of popular LVLMs, suggesting a need for a sparser approach compared to textual data handling. To this end, we i…

    Submitted 25 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 21 pages, 8 figures, code is released at https://github.com/pkunlp-icler/FastV

  25. arXiv:2402.14577  [pdf, other]

    cs.CV

    Debiasing Text-to-Image Diffusion Models

    Authors: Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi

    Abstract: Learning-based Text-to-Image (TTI) models like Stable Diffusion have revolutionized the way visual content is generated in various domains. However, recent research has shown that nonnegligible social bias exists in current state-of-the-art TTI systems, which raises important concerns. In this work, we target resolving the social bias in TTI diffusion models. We begin by formalizing the problem se…

    Submitted 22 February, 2024; originally announced February 2024.

  26. arXiv:2401.15865  [pdf, other]

    cs.CV

    LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection

    Authors: Sifan Zhou, Liang Li, Xinyu Zhang, Bo Zhang, Shipeng Bai, Miao Sun, Ziyu Zhao, Xiaobo Lu, Xiangxiang Chu

    Abstract: Due to highly constrained computing power and memory, deploying 3D lidar-based detectors on edge devices equipped in autonomous vehicles and robots poses a crucial challenge. Being a convenient and straightforward model compression approach, Post-Training Quantization (PTQ) has been widely adopted in 2D vision tasks. However, applying it directly to 3D lidar-based tasks inevitably leads to perform…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: Accepted in ICLR 2024

  27. arXiv:2401.11002  [pdf, other]

    cs.CV cs.AI

    Fast Registration of Photorealistic Avatars for VR Facial Animation

    Authors: Chaitanya Patel, Shaojie Bai, Te-Li Wang, Jason Saragih, Shih-En Wei

    Abstract: Virtual Reality (VR) bears promise of social interactions that can feel more immersive than other media. Key to this is the ability to accurately animate a photorealistic avatar of one's likeness while wearing a VR headset. Although high quality registration of person-specific avatars to headset-mounted camera (HMC) images is possible in an offline setting, the performance of generic realtime mode…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Project page: https://chaitanya100100.github.io/FastRegistration/

  28. Existence and multiplicity of solutions for critical Kirchhoff-Choquard equations involving the fractional $p$-Laplacian on the Heisenberg group

    Authors: S. Bai, Y. Song, D. D. Repovš

    Abstract: In this paper, we study existence and multiplicity of solutions for the following Kirchhoff-Choquard type equation involving the fractional $p$-Laplacian on the Heisenberg group: \begin{equation*} \begin{array}{lll} M(\|u\|_μ^{p})(μ(-Δ)^{s}_{p}u+V(ξ)|u|^{p-2}u)= f(ξ,u)+\int_{\mathbb{H}^N}\frac{|u(η)|^{Q_λ^{\ast}}}{|η^{-1}ξ|^λ}dη|u|^{Q_λ^{\ast}-2}u &\mbox{in}\ \mathbb{H}^N, \\ \end{array} \end{equa…

    Submitted 18 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    MSC Class: 35B25; 35J15; 35J20; 35J60; 35R03; 46E35

    Journal ref: J. Nonlin. Variat. Anal. 8:1 (2024), 143-166

  29. arXiv:2401.02620  [pdf, other]

    cs.AI cs.GR

    Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human

    Authors: Song Bai, Jie Li

    Abstract: While AI-generated text and 2D images continue to expand their territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since 2023, a large number of research papers have emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors cont…

    Submitted 4 January, 2024; originally announced January 2024.

  30. arXiv:2401.01885  [pdf, other]

    cs.CV

    From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

    Authors: Evonne Ng, Javier Romero, Timur Bagautdinov, Shaojie Bai, Trevor Darrell, Angjoo Kanazawa, Alexander Richard

    Abstract: We present a framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction. Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands. The key behind our method is in combining the benefits of sample diversity from vector quantization with the high-frequency…

    Submitted 3 January, 2024; originally announced January 2024.

  31. arXiv:2401.00616  [pdf, other]

    cs.CV

    GD^2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

    Authors: Xiao Pan, Zongxin Yang, Shuai Bai, Yi Yang

    Abstract: In this paper, we focus on the One-shot Novel View Synthesis (O-NVS) task, which targets synthesizing photo-realistic novel views given only one reference image per scene. Previous One-shot Generalizable Neural Radiance Fields (OG-NeRF) methods solve this task in an inference-time finetuning-free manner, yet suffer from blurry outputs due to the encoder-only architecture that highly relies on the limi…

    Submitted 29 March, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: Submitted to Journal

  32. arXiv:2312.09589  [pdf, other]

    cs.CV

    Improving Cross-domain Few-shot Classification with Multilayer Perceptron

    Authors: Shuanghao Bai, Wanqi Zhou, Zhirong Luan, Donglin Wang, Badong Chen

    Abstract: Cross-domain few-shot classification (CDFSC) is a challenging task due to the significant distribution discrepancies across different domains. To address this challenge, many approaches aim to learn transferable representations. Multilayer perceptron (MLP) has shown its capability to learn transferable representations in various downstream tasks, such as unsupervised image classification…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures

  33. arXiv:2312.09553  [pdf, other]

    cs.CV

    Prompt-based Distribution Alignment for Unsupervised Domain Adaptation

    Authors: Shuanghao Bai, Min Zhang, Wanqi Zhou, Siteng Huang, Zhirong Luan, Donglin Wang, Badong Chen

    Abstract: Recently, despite the unprecedented success of large pre-trained visual-language models (VLMs) on a wide range of downstream tasks, the real-world unsupervised domain adaptation (UDA) problem is still not well explored. Therefore, in this paper, we first experimentally demonstrate that the unsupervised-trained VLMs can significantly reduce the distribution discrepancy between source and target dom…

    Submitted 26 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 13 pages, 6 figures

  34. arXiv:2312.09158  [pdf, other]

    cs.CV

    General Object Foundation Model for Images and Videos at Scale

    Authors: Junfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai

    Abstract: We present GLEE in this work, an object-level foundation model for locating and identifying objects in images and videos. Through a unified framework, GLEE accomplishes detection, segmentation, tracking, grounding, and identification of arbitrary objects in the open world scenario for various object perception tasks. Adopting a cohesive learning strategy, GLEE acquires knowledge from diverse data…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Project homepage: https://glee-vision.github.io

  35. arXiv:2312.04089  [pdf, other]

    cs.CV

    Open-Vocabulary Segmentation with Semantic-Assisted Calibration

    Authors: Yong Liu, Sule Bai, Guanbin Li, Yitong Wang, Yansong Tang

    Abstract: This paper studies open-vocabulary segmentation (OVS) through calibrating in-vocabulary and domain-biased embedding space with generalized contextual prior of CLIP. As the core of open-vocabulary understanding, alignment of visual content with the semantics of unbounded text has become the bottleneck of this field. To address this challenge, recent works propose to utilize CLIP as an additional cl…

    Submitted 7 December, 2023; originally announced December 2023.

  36. arXiv:2312.02481  [pdf, other]

    cs.CV cs.AI

    Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery

    Authors: Yansheng Li, Junwei Luo, Yongjun Zhang, Yihua Tan, Jin-Gang Yu, Song Bai

    Abstract: Bridge detection in remote sensing images (RSIs) plays a crucial role in various applications, but it poses unique challenges compared to the detection of other objects. In RSIs, bridges exhibit considerable variations in terms of their spatial scales and aspect ratios. Therefore, to ensure the visibility and integrity of bridges, it is essential to perform holistic bridge detection in large-size…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 16 pages, 11 figures, 6 tables; due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF file

  37. arXiv:2310.15462  [pdf, ps, other]

    math.PR math.ST

    Empirical limit theorems for Wiener chaos

    Authors: Shuyang Bai, Jiemiao Chen

    Abstract: We consider empirical measures in a triangular array setup with underlying distributions varying as sample size grows. We study asymptotic properties of multiple integrals with respect to normalized empirical measures. Limit theorems involving series of multiple Wiener-Itô integrals are established.

    Submitted 19 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    MSC Class: 60F05; 60H05

  38. arXiv:2310.14645  [pdf, other]

    quant-ph

    Temperature-heat uncertainty relation for quantum thermometry

    Authors: Ning Zhang, Si-Yuan Bai, Chong Chen

    Abstract: We investigate the resource theory for temperature estimation. We demonstrate that it is the fluctuation of heat that fundamentally determines temperature precision through the temperature-heat uncertainty relation. Specifically, we find that heat is divided into trajectory heat and correlation heat, which are associated with the heat exchange along the thermometer's evolution path and the correlation…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 6 pages, 1 figure

  39. High and low perturbations of the critical Choquard equation on the Heisenberg group

    Authors: Shujie Bai, Yueqiang Song, Dušan D. Repovš

    Abstract: We study the following critical Choquard equation on the Heisenberg group: \begin{equation*} \begin{cases} \displaystyle {-Δ_H u }=μ |u|^{q-2}u+\int_Ω \frac{|u(η)|^{Q_λ^{\ast}}} {|η^{-1}ξ|^λ} dη|u|^{Q_λ^{\ast}-2}u &\mbox{in }\ Ω, u=0 &\mbox{on }\ \partialΩ, \end{cases} \end{equation*} where $Ω\subset \mathbb{H}^N$ is a smooth bounded domain, $Δ_H$ is the Kohn-Laplacian on the Heisenberg grou…

    Submitted 20 October, 2023; originally announced October 2023.

    MSC Class: 35J20; 35R03; 46E35

    Journal ref: Adv. Differential Equations 29:3-4 (2024), 153-178

  40. arXiv:2310.06218  [pdf, other]

    cs.LG cs.AI

    SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration

    Authors: Jingyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, Yong Liu

    Abstract: The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1$\times$N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 14 pages, 4 figures, Accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  41. arXiv:2309.16609  [pdf, other]

    cs.CL

    Qwen Technical Report

    Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, et al. (23 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Q…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 59 pages, 5 figures

  42. arXiv:2309.16006  [pdf]

    physics.app-ph

    Quantitative Analysis of Sodium Metal Deposition and Interphase in Na Metal Batteries

    Authors: Baharak Sayahpour, Weikang Li, Shuang Bai, Bingyu Lu, Bing Han, Yu-Ting Chen, Grayson Deysher, Saurabh Parab, Phillip Ridley, Ganesh Raghavendran, Long Hoang Bao Nguyen, Minghao Zhang, Ying Shirley Meng

    Abstract: Sodium-ion batteries exhibit significant promise as a viable alternative to current lithium-ion technologies owing to their sustainability, low cost per energy density, reliability, and safety. Despite recent advancements in cathode materials for this category of energy storage systems, the primary challenge in realizing practical applications of sodium-ion systems is the absence of an anode syste…

    Submitted 27 September, 2023; originally announced September 2023.

  43. arXiv:2309.07991  [pdf, other]

    math.SG math.DS

    Franks' dichotomy for toric manifolds, Hofer-Zehnder conjecture, and gauged linear sigma model

    Authors: Shaoyun Bai, Guangbo Xu

    Abstract: We prove that for any compact toric symplectic manifold, if a Hamiltonian diffeomorphism admits more fixed points, counted homologically, than the total Betti number, then it has infinitely many simple periodic points. This provides a vast generalization of Franks' famous two or infinity dichotomy for periodic orbits of area-preserving diffeomorphisms on the two-sphere, and establishes a conjectur…

    Submitted 10 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: v2: 94 pages, 6 figures. New title, with expository changes in the introduction and main part. Comments are welcome!

  44. arXiv:2309.07698  [pdf, other]

    cs.CV

    Dataset Condensation via Generative Model

    Authors: David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang, Song Bai, Mike Zheng Shou

    Abstract: Dataset condensation aims to condense a large dataset with many training samples into a small set. Previous methods usually condense the dataset into pixel format. However, this approach suffers from slow optimization and a large number of parameters to be optimized. When increasing image resolutions and classes, the number of learnable parameters grows accordingly, prohibiting condensation meth…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: old work, done in 2022

  45. Ethical Framework for Harnessing the Power of AI in Healthcare and Beyond

    Authors: Sidra Nasir, Rizwan Ahmed Khan, Samita Bai

    Abstract: In the past decade, the deployment of deep learning (Artificial Intelligence (AI)) methods has become pervasive across a spectrum of real-world applications, often in safety-critical contexts. This comprehensive research article rigorously investigates the ethical dimensions intricately linked to the rapid evolution of AI technologies, with a particular focus on the healthcare domain. Delving deep…

    Submitted 31 August, 2023; originally announced September 2023.

    Journal ref: IEEE Access 2024

  46. arXiv:2308.16890  [pdf, other]

    cs.CV cs.CL

    TouchStone: Evaluating Vision-Language Models by Language Models

    Authors: Shuai Bai, Shusheng Yang, Jinze Bai, Peng Wang, Xingxuan Zhang, Junyang Lin, Xinggang Wang, Chang Zhou, Jingren Zhou

    Abstract: Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting visual receptor with large language models (LLMs). However, current assessments mainly focus on recognizing and reasoning abilities, lacking direct evaluation of conversational skills and neglecting visual s…

    Submitted 4 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: https://github.com/OFA-Sys/TouchStone

  47. arXiv:2308.12966  [pdf, other]

    cs.CV cs.CL

    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

    Authors: Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images. Starting from the Qwen-LM as a foundation, we endow it with visual capacity by the meticulously designed (i) visual receptor, (ii) input-output interface, (iii) 3-stage training pipeline, and (iv) multilingual multimodal cleaned corpus. Beyon…

    Submitted 12 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: Code, demo and models are available at https://github.com/QwenLM/Qwen-VL

  48. arXiv:2308.09988  [pdf, ps, other]

    math.AP math.FA

    On $p$-Laplacian Kirchhoff-Schrödinger-Poisson type systems with critical growth on the Heisenberg group

    Authors: Shujie Bai, Yueqiang Song, Dušan D. Repovš

    Abstract: In this article, we investigate the Kirchhoff-Schrödinger-Poisson type systems on the Heisenberg group of the following form: \begin{equation*} \left\{ \begin{array}{lll} {-(a+b\int_Ω|\nabla_{H} u|^{p}dξ)Δ_{H,p}u-μφ|u|^{p-2}u}=λ|u|^{q-2}u+|u|^{Q^{\ast}-2}u &\mbox{in}\ Ω, \\ -Δ_{H}φ=|u|^{p} &\mbox{in}\ Ω, \\ u=φ=0 &\mbox{on}\ \partialΩ, \end{array} \right. \end{equation*} where $a,b$ are positive r…

    Submitted 19 August, 2023; originally announced August 2023.

    MSC Class: 35J20; 35R03; 46E35

    Journal ref: Electron. Res. Arch. 31:9 (2023), 5749-5765

  49. arXiv:2308.07209  [pdf, other]

    cs.LG cs.CV eess.IV

    Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning

    Authors: Shipeng Bai, Jun Chen, Xintian Shen, Yixuan Qian, Yong Liu

    Abstract: Structured pruning and quantization are promising approaches for reducing the inference time and memory footprint of neural networks. However, most existing methods require the original training dataset to fine-tune the model. This not only brings heavy resource consumption but also is not possible for applications with sensitive or proprietary data due to privacy and security concerns. Therefore,…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: ICCV2023

  50. arXiv:2308.06739  [pdf, other]

    cs.CV

    Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

    Authors: David Junhao Zhang, Mutian Xu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou

    Abstract: Despite the rapid advancement of unsupervised learning in visual representation, it requires training on large-scale datasets that demand costly data collection and pose additional challenges due to concerns regarding data privacy. Recently, synthetic images generated by text-to-image diffusion models have shown great potential for benefiting image recognition. Although promising, there has been…

    Submitted 13 August, 2023; originally announced August 2023.