Search | arXiv e-print repository

Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

Authors: Kai Chen, Yanze Li, Wenhua Zhang, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-Yan Yeung, Huchuan Lu, Xu Jia

Abstract: Large Vision-Language Models (LVLMs) have received widespread attention in advancing the interpretable self-driving. Existing evaluations of LVLMs primarily focus on the multi-faceted capabilities in natural circumstances, lacking automated and quantifiable assessment for self-driving, let alone the severe road corner cases. In this paper, we propose CODA-LM, the very first benchmark for the autom… ▽ More Large Vision-Language Models (LVLMs) have received widespread attention in advancing the interpretable self-driving. Existing evaluations of LVLMs primarily focus on the multi-faceted capabilities in natural circumstances, lacking automated and quantifiable assessment for self-driving, let alone the severe road corner cases. In this paper, we propose CODA-LM, the very first benchmark for the automatic evaluation of LVLMs for self-driving corner cases. We adopt a hierarchical data structure to prompt powerful LVLMs to analyze complex driving scenes and generate high-quality pre-annotation for human annotators, and for LVLM evaluation, we show that using the text-only large language models (LLMs) as judges reveals even better alignment with human preferences than the LVLM judges. Moreover, with CODA-LM, we build CODA-VLM, a new driving LVLM surpassing all the open-sourced counterparts on CODA-LM. Our CODA-VLM performs comparably with GPT-4V, even surpassing GPT-4V by +21.42% on the regional perception task. We hope CODA-LM can become the catalyst to promote interpretable self-driving empowered by LVLMs. △ Less

Submitted 26 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: Project Page: https://coda-dataset.github.io/coda-lm/

arXiv:2404.10253 [pdf, other]

Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiao**g Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu , et al. (16 additional authors not shown)

Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries t… ▽ More With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries to minimizes manual code modifications, our project tries to achieve both improvement of performance and consistency of the model code. By using a hierarchical grid system and an OpenMP-based offloading toolkit, our porting and parallelization effort covers over 80% of the code, and achieves a simulation speed of 340 SDPD (simulated days per day) for 5-km atmosphere, 265 SDPD for 3-km ocean, and 222 SDPD for a coupled model, thus making multi-year or even multi-decadal experiments at such high resolution possible. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 18 pages, 13 figures

arXiv:2404.09720 [pdf, ps, other]

A stability result for almost perfect matchings

Authors: Mingyang Guo, Hongliang Lu

Abstract: Let $n,k,s$ be three integers and $β$ be a sufficiently small positive number such that $k\geq 3$, $0<1/n\ll β\ll 1/k$ and $ks+k\leq n\leq (1+β)ks+k-2$. A $k$-graph is called non-trivial if it has no isolated vertex. In this paper, we determine the maximum number of edges in a non-trivial $k$-graph with $n$ vertices and matching number at most $s$. This result confirms a conjecture proposed by Fra… ▽ More Let $n,k,s$ be three integers and $β$ be a sufficiently small positive number such that $k\geq 3$, $0<1/n\ll β\ll 1/k$ and $ks+k\leq n\leq (1+β)ks+k-2$. A $k$-graph is called non-trivial if it has no isolated vertex. In this paper, we determine the maximum number of edges in a non-trivial $k$-graph with $n$ vertices and matching number at most $s$. This result confirms a conjecture proposed by Frankl (On non-trivial families without a perfect matching, \emph{European J. Combin.}, \textbf{84} (2020), 103044) for the case when $s$ is sufficiently large. △ Less

Submitted 15 April, 2024; originally announced April 2024.

MSC Class: 05C70

arXiv:2404.09219 [pdf, ps, other]

Observation of $D \to a_{0}(980)π$ in the decays $D^{0} \rightarrow π^{+}π^{-}η$ and $D^{+} \rightarrow π^{+}π^{0}η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: We report the first amplitude analysis of the decays $D^{0} \to π^{+} π^{-} η$ and $D^{+} \rightarrow π^{+}π^{0}η$ using a data sample taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 7.9 ${\rm fb}^{-1}$. The contribution from the process $D^{0(+)} \to a_{0}(980)^{+} π^{-(0)}$ is significantly larger than the… ▽ More We report the first amplitude analysis of the decays $D^{0} \to π^{+} π^{-} η$ and $D^{+} \rightarrow π^{+}π^{0}η$ using a data sample taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 7.9 ${\rm fb}^{-1}$. The contribution from the process $D^{0(+)} \to a_{0}(980)^{+} π^{-(0)}$ is significantly larger than the $D^{0(+)} \to a_{0}(980)^{-(0)} π^{+}$ contribution. The ratios $\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{+}π^{-})/\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{-}π^{+})$ and $\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{+}π^{0})/\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{0}π^{+})$ are measured to be $7.5^{+2.5}_{-0.8\,\mathrm{stat.}}\pm1.7_{\mathrm{syst.}}$ and $2.6\pm0.6_{\mathrm{stat.}}\pm0.3_{\mathrm{syst.}}$, respectively. The measured $D^{0}$ ratio disagrees with the theoretical predictions by orders of magnitudes, thus implying a substantial contribution from final-state interactions. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.07863 [pdf, other]

Backdoor Contrastive Learning via Bi-level Trigger Optimization

Authors: Weiyu Sun, Xinyu Zhang, Hao Lu, Yingcong Chen, Ting Wang, **ghui Chen, Lu Lin

Abstract: Contrastive Learning (CL) has attracted enormous attention due to its remarkable capability in unsupervised representation learning. However, recent works have revealed the vulnerability of CL to backdoor attacks: the feature extractor could be misled to embed backdoored data close to an attack target class, thus fooling the downstream predictor to misclassify it as the target. Existing attacks us… ▽ More Contrastive Learning (CL) has attracted enormous attention due to its remarkable capability in unsupervised representation learning. However, recent works have revealed the vulnerability of CL to backdoor attacks: the feature extractor could be misled to embed backdoored data close to an attack target class, thus fooling the downstream predictor to misclassify it as the target. Existing attacks usually adopt a fixed trigger pattern and poison the training set with trigger-injected data, ho** for the feature extractor to learn the association between trigger and target class. However, we find that such fixed trigger design fails to effectively associate trigger-injected data with target class in the embedding space due to special CL mechanisms, leading to a limited attack success rate (ASR). This phenomenon motivates us to find a better backdoor trigger design tailored for CL framework. In this paper, we propose a bi-level optimization approach to achieve this goal, where the inner optimization simulates the CL dynamics of a surrogate victim, and the outer optimization enforces the backdoor trigger to stay close to the target throughout the surrogate CL procedure. Extensive experiments show that our attack can achieve a higher attack success rate (e.g., $99\%$ ASR on ImageNet-100) with a very low poisoning rate ($1\%$). Besides, our attack can effectively evade existing state-of-the-art defenses. Code is available at: https://github.com/SWY666/SSL-backdoor-BLTO. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: Accepted by ICLR 2024

arXiv:2404.07855 [pdf, other]

Resolve Domain Conflicts for Generalizable Remote Physiological Measurement

Authors: Weiyu Sun, Xinyu Zhang, Hao Lu, Ying Chen, Yun Ge, Xiaolin Huang, Jie Yuan, Yingcong Chen

Abstract: Remote photoplethysmography (rPPG) technology has become increasingly popular due to its non-invasive monitoring of various physiological indicators, making it widely applicable in multimedia interaction, healthcare, and emotion analysis. Existing rPPG methods utilize multiple datasets for training to enhance the generalizability of models. However, they often overlook the underlying conflict issu… ▽ More Remote photoplethysmography (rPPG) technology has become increasingly popular due to its non-invasive monitoring of various physiological indicators, making it widely applicable in multimedia interaction, healthcare, and emotion analysis. Existing rPPG methods utilize multiple datasets for training to enhance the generalizability of models. However, they often overlook the underlying conflict issues across different datasets, such as (1) label conflict resulting from different phase delays between physiological signal labels and face videos at the instance level, and (2) attribute conflict stemming from distribution shifts caused by head movements, illumination changes, skin types, etc. To address this, we introduce the DOmain-HArmonious framework (DOHA). Specifically, we first propose a harmonious phase strategy to eliminate uncertain phase delays and preserve the temporal variation of physiological signals. Next, we design a harmonious hyperplane optimization that reduces irrelevant attribute shifts and encourages the model's optimization towards a global solution that fits more valid scenarios. Our experiments demonstrate that DOHA significantly improves the performance of existing methods under multiple protocols. Our code is available at https://github.com/SWY666/rPPG-DOHA. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: Accepted by ACM MM 2023

arXiv:2404.07790 [pdf, other]

VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing

Authors: Meng Yu, Te Cui, Haoyang Lu, Yufeng Yue

Abstract: Image dehazing poses significant challenges in environmental perception. Recent research mainly focus on deep learning-based methods with single modality, while they may result in severe information loss especially in dense-haze scenarios. The infrared image exhibits robustness to the haze, however, existing methods have primarily treated the infrared modality as auxiliary information, failing to… ▽ More Image dehazing poses significant challenges in environmental perception. Recent research mainly focus on deep learning-based methods with single modality, while they may result in severe information loss especially in dense-haze scenarios. The infrared image exhibits robustness to the haze, however, existing methods have primarily treated the infrared modality as auxiliary information, failing to fully explore its rich information in dehazing. To address this challenge, the key insight of this study is to design a visible-infrared fusion network for image dehazing. In particular, we propose a multi-scale Deep Structure Feature Extraction (DSFE) module, which incorporates the Channel-Pixel Attention Block (CPAB) to restore more spatial and marginal information within the deep structural features. Additionally, we introduce an inconsistency weighted fusion strategy to merge the two modalities by leveraging the more reliable information. To validate this, we construct a visible-infrared multimodal dataset called AirSim-VID based on the AirSim simulation platform. Extensive experiments performed on challenging real and simulated image datasets demonstrate that VIFNet can outperform many state-of-the-art competing methods. The code and dataset are available at https://github.com/mengyu212/VIFNet_dehazing. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07445 [pdf, other]

Multi-view Aggregation Network for Dichotomous Image Segmentation

Authors: Qian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu

Abstract: Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is how to balance the semantic dispersion of high-resolution targets in the small receptive field and the loss of high-precision details in the large receptive field. Existing methods rely on tedious mu… ▽ More Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is how to balance the semantic dispersion of high-resolution targets in the small receptive field and the loss of high-precision details in the large receptive field. Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement. Human visual system captures regions of interest by observing them from multiple views. Inspired by it, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet), which unifies the feature fusion of the distant view and close-up view into a single stream with one encoder-decoder structure. With the help of the proposed multi-view complementary localization and refinement modules, our approach established long-range, profound visual interactions across multiple views, allowing the features of the detailed close-up view to focus on highly slender structures.Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed. The source code and datasets will be publicly available at \href{https://github.com/qianyu-dlut/MVANet}{MVANet}. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 as Highlight

arXiv:2404.07436 [pdf, other]

Measurement of $e^{+}e^{-}\to ωη^{\prime}$ cross sections at $\sqrt{s}=$ 2.000 to 3.080 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (599 additional authors not shown)

Abstract: The Born cross sections for the process $e^{+}e^{-}\to ωη^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$σ$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be… ▽ More The Born cross sections for the process $e^{+}e^{-}\to ωη^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$σ$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be $Γ_{R}=(167\pm77\pm7)~\rm{MeV}$, where the first uncertainties are statistical and the second are systematic. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.07429 [pdf, other]

Neutral-current background induced by atmospheric neutrinos at large liquid-scintillator detectors: III. Quantitative calculations for reactor neutrinos

Authors: Jie Cheng, Min Li, Yu-Feng Li, Gao-Song Li, Hao-Qi Lu, Liang-Jian Wen

Abstract: Atmospheric neutrinos contribute significantly to irreducible backgrounds through their neutral-current (NC) interactions with $^{12}$C nuclei in liquid-scintillator detectors, impacting diffuse supernova neutrino background, nucleon decay, and reactor neutrinos. This paper extends our prior work by systematically studying the NC backgrounds towards the MeV region of reactor neutrinos. We employ c… ▽ More Atmospheric neutrinos contribute significantly to irreducible backgrounds through their neutral-current (NC) interactions with $^{12}$C nuclei in liquid-scintillator detectors, impacting diffuse supernova neutrino background, nucleon decay, and reactor neutrinos. This paper extends our prior work by systematically studying the NC backgrounds towards the MeV region of reactor neutrinos. We employ contemporary neutrino generator models from GENIE and NuWro for calculations, with a focus on predicting NC background in experimental searches for inverse-beta-decay signals below 100 MeV visible energy. We estimate the systematic uncertainty to our estimation of the NC background using various data-driven neutrino generator models, addressing factors such as the initial neutrino-nucleon NC interaction, the nuclear model, the final-state interaction model, the nucleus deexcitation, and the secondary interaction on final-state particles. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: 26 pages, 10 figures

arXiv:2404.07317 [pdf, other]

Building a controlled-NOT gate between polarization and frequency

Authors: Hsuan-Hao Lu, Joseph M. Lukens, Muneer Alshowkan, Brian T. Kirby, Nicholas A. Peters

Abstract: By harnessing multiple degrees of freedom (DoFs) within a single photon, controlled quantum unitaries, such as the two-qubit controlled-NOT (CNOT) gate, play a pivotal role in advancing quantum communication protocols like dense coding and entanglement distillation. In this work, we devise and realize a CNOT operation between polarization and frequency DoFs by exploiting directionally dependent el… ▽ More By harnessing multiple degrees of freedom (DoFs) within a single photon, controlled quantum unitaries, such as the two-qubit controlled-NOT (CNOT) gate, play a pivotal role in advancing quantum communication protocols like dense coding and entanglement distillation. In this work, we devise and realize a CNOT operation between polarization and frequency DoFs by exploiting directionally dependent electro-optic phase modulation within a fiber Sagnac loop. Alongside computational basis measurements, we validate the effectiveness of this operation through the synthesis of all four Bell states in a single photon, all with fidelities greater than 98%. This demonstration opens new avenues for manipulating hyperentanglement across these two crucial DoFs, marking a foundational step toward leveraging polarization-frequency resources in fiber networks for future quantum applications. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures

arXiv:2404.06745 [pdf, other]

From a fractional quantum anomalous Hall state to a smectic state with equal Hall conductance

Authors: Hongyu Lu, Han-Qing Wu, Bin-Bin Chen, Zi Yang Meng

Abstract: The recent developments in twisted MoTe$_2$ and rhombohedral multilayer graphene have generated widespread attention to the general features of fractional quantum anomalous Hall (FQAH) states, including their possible coexistence with and transition to various symmetry breaking charge ordered states. These attentions are pushing forward our knowledge of the relation between the topological order i… ▽ More The recent developments in twisted MoTe$_2$ and rhombohedral multilayer graphene have generated widespread attention to the general features of fractional quantum anomalous Hall (FQAH) states, including their possible coexistence with and transition to various symmetry breaking charge ordered states. These attentions are pushing forward our knowledge of the relation between the topological order in FQAH states and the Landau-type of symmetry breaking order such as the 1D smectic electronic liquid crystal and 2D charge-density-wave (CDW) solid. Although the transitions from topological states to symmetry breaking states with trivial topology have been discussed, the road from one topological ordered state to another with the same Hall conductance and broken translational symmetry has not been found. Here we show the intriguing evidence that the FQAH to FQAH Smectic (FQAHS) transition is robustly realizable in the archetypal correlated flat Chern-band model at filling $ν$ = 2/3. This transition is novel in that: i) the FQAHS acquires the same fractional Hall conductance as FQAH, which cannot be explained by mean-field band folding. The formation of smectic order can be viewed as perturbation around the transition point, and thus, do not destroy or change the original topology; ii) the charge excitation remains gapped across the transition although the neutral gap is closed at transition point; and iii) the transition is triggered by the softening of roton mode with the same wave vector as the smectic order. Our discovery opens countless new possibilities, both theoretical and experimental, in the fast-growing field of robust fractional Chern insulators. △ Less

Submitted 17 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06718 [pdf, other]

Measurement of the Born cross section for $e^{+}e^{-}\to ηh_c $ at center-of-mass energies between 4.1 and 4.6\,GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: We measure the Born cross section for the reaction $e^{+}e^{-} \rightarrow ηh_c$ from $\sqrt{s} = 4.129$ to $4.600$~GeV using data sets collected by the BESIII detector running at the BEPCII collider. A resonant structure in the cross section line shape near 4.200~GeV is observed with a statistical significance of 7$σ$. The parameters of this resonance are measured to be \MeasMass\ and \MeasWidth,… ▽ More We measure the Born cross section for the reaction $e^{+}e^{-} \rightarrow ηh_c$ from $\sqrt{s} = 4.129$ to $4.600$~GeV using data sets collected by the BESIII detector running at the BEPCII collider. A resonant structure in the cross section line shape near 4.200~GeV is observed with a statistical significance of 7$σ$. The parameters of this resonance are measured to be \MeasMass\ and \MeasWidth, where the first uncertainties are statistical and the second systematic. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06700 [pdf, other]

Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting

Authors: Hao Lu, Jiaqi Tang, Xinli Xu, Xu Cao, Yunpeng Zhang, Guoqing Wang, Dalong Du, Hao Chen, Yingcong Chen

Abstract: The emergence of Multi-Camera 3D Object Detection (MC3D-Det), facilitated by bird's-eye view (BEV) representation, signifies a notable progression in 3D object detection. Scaling MC3D-Det training effectively accommodates varied camera parameters and urban landscapes, paving the way for the MC3D-Det foundation model. However, the multi-view fusion stage of the MC3D-Det method relies on the ill-pos… ▽ More The emergence of Multi-Camera 3D Object Detection (MC3D-Det), facilitated by bird's-eye view (BEV) representation, signifies a notable progression in 3D object detection. Scaling MC3D-Det training effectively accommodates varied camera parameters and urban landscapes, paving the way for the MC3D-Det foundation model. However, the multi-view fusion stage of the MC3D-Det method relies on the ill-posed monocular perception during training rather than surround refinement ability, leading to what we term "surround refinement degradation". To this end, our study presents a weak-to-strong eliciting framework aimed at enhancing surround refinement while maintaining robust monocular perception. Specifically, our framework employs weakly tuned experts trained on distinct subsets, and each is inherently biased toward specific camera configurations and scenarios. These biased experts can learn the perception of monocular degeneration, which can help the multi-view fusion stage to enhance surround refinement abilities. Moreover, a composite distillation strategy is proposed to integrate the universal knowledge of 2D foundation models and task-specific information. Finally, for MC3D-Det joint training, the elaborate dataset merge strategy is designed to solve the problem of inconsistent camera numbers and camera parameters. We set up a multiple dataset joint training benchmark for MC3D-Det and adequately evaluated existing methods. Further, we demonstrate the proposed framework brings a generalized and significant boost over multiple baselines. Our code is at \url{https://github.com/EnVision-Research/Scale-BEV}. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06265 [pdf, other]

Spatial-Temporal Multi-level Association for Video Object Segmentation

Authors: Deshui Miao, Xin Li, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

Abstract: Existing semi-supervised video object segmentation methods either focus on temporal feature matching or spatial-temporal feature modeling. However, they do not address the issues of sufficient target interaction and efficient parallel processing simultaneously, thereby constraining the learning of dynamic, target-aware features. To tackle these limitations, this paper proposes a spatial-temporal m… ▽ More Existing semi-supervised video object segmentation methods either focus on temporal feature matching or spatial-temporal feature modeling. However, they do not address the issues of sufficient target interaction and efficient parallel processing simultaneously, thereby constraining the learning of dynamic, target-aware features. To tackle these limitations, this paper proposes a spatial-temporal multi-level association framework, which jointly associates reference frame, test frame, and object features to achieve sufficient interaction and parallel target ID association with a spatial-temporal memory bank for efficient video object segmentation. Specifically, we construct a spatial-temporal multi-level feature association module to learn better target-aware features, which formulates feature extraction and interaction as the efficient operations of object self-attention, reference object enhancement, and test reference correlation. In addition, we propose a spatial-temporal memory to assist feature association and temporal ID assignment and correlation. We evaluate the proposed method by conducting extensive experiments on numerous video object segmentation datasets, including DAVIS 2016/2017 val, DAVIS 2017 test-dev, and YouTube-VOS 2018/2019 val. The favorable performance against the state-of-the-art methods demonstrates the effectiveness of our approach. All source code and trained models will be made publicly available. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06114 [pdf, other]

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

Authors: Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xi** Hu

Abstract: With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, there is increasing attention on large-scale distributed deep learning. In contrast to traditional distributed deep learning, the large-scale scenario poses new challenges that include fault tolerance, scalability of algorithms and infrastructures, and heterogeneity in data sets, models, and resou… ▽ More With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, there is increasing attention on large-scale distributed deep learning. In contrast to traditional distributed deep learning, the large-scale scenario poses new challenges that include fault tolerance, scalability of algorithms and infrastructures, and heterogeneity in data sets, models, and resources. Due to intensive synchronization of models and sharing of data across GPUs and computing nodes during distributed training and inference processes, communication efficiency becomes the bottleneck for achieving high performance at a large scale. This article surveys the literature over the period of 2018-2023 on algorithms and technologies aimed at achieving efficient communication in large-scale distributed deep learning at various levels, including algorithms, frameworks, and infrastructures. Specifically, we first introduce efficient algorithms for model synchronization and communication data compression in the context of large-scale distributed training. Next, we introduce efficient strategies related to resource allocation and task scheduling for use in distributed training and inference. After that, we present the latest technologies pertaining to modern communication infrastructures used in distributed deep learning with a focus on examining the impact of the communication overhead in a large-scale and heterogeneous setting. Finally, we conduct a case study on the distributed training of large language models at a large scale to illustrate how to apply these technologies in real cases. This article aims to offer researchers a comprehensive understanding of the current landscape of large-scale distributed deep learning and to reveal promising future research directions toward communication-efficient solutions in this scope. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06012 [pdf, other]

Diffusion-Based Point Cloud Super-Resolution for mmWave Radar Data

Authors: Kai Luan, Chenghao Shi, Neng Wang, Yuwei Cheng, Huimin Lu, Xieyuanli Chen

Abstract: The millimeter-wave radar sensor maintains stable performance under adverse environmental conditions, making it a promising solution for all-weather perception tasks, such as outdoor mobile robotics. However, the radar point clouds are relatively sparse and contain massive ghost points, which greatly limits the development of mmWave radar technology. In this paper, we propose a novel point cloud s… ▽ More The millimeter-wave radar sensor maintains stable performance under adverse environmental conditions, making it a promising solution for all-weather perception tasks, such as outdoor mobile robotics. However, the radar point clouds are relatively sparse and contain massive ghost points, which greatly limits the development of mmWave radar technology. In this paper, we propose a novel point cloud super-resolution approach for 3D mmWave radar data, named Radar-diffusion. Our approach employs the diffusion model defined by mean-reverting stochastic differential equations(SDE). Using our proposed new objective function with supervision from corresponding LiDAR point clouds, our approach efficiently handles radar ghost points and enhances the sparse mmWave radar point clouds to dense LiDAR-like point clouds. We evaluate our approach on two different datasets, and the experimental results show that our method outperforms the state-of-the-art baseline methods in 3D radar super-resolution tasks. Furthermore, we demonstrate that our enhanced radar point cloud is capable of downstream radar point-based registration tasks. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Journal ref: Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2024

arXiv:2404.05973 [pdf, ps, other]

Search for the Rare Decays $D_s^+\to h^+(h^{0})e^+e^-$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (618 additional authors not shown)

Abstract: Using 7.33~fb$^{-1}$ of $e^{+}e^{-}$ collision data collected by the BESIII detector at center-of-mass energies in the range of $\sqrt{s}=4.128 - 4.226$~GeV, we search for the rare decays $D_{s}^+\to h^+(h^{0})e^{+}e^{-}$, where $h$ represents a kaon or pion. By requiring the $e^{+}e^{-}$ invariant mass to be consistent with a $φ(1020)$, $0.98<M(e^{+}e^{-})<1.04$ ~GeV/$c^2$, the decay… ▽ More Using 7.33~fb$^{-1}$ of $e^{+}e^{-}$ collision data collected by the BESIII detector at center-of-mass energies in the range of $\sqrt{s}=4.128 - 4.226$~GeV, we search for the rare decays $D_{s}^+\to h^+(h^{0})e^{+}e^{-}$, where $h$ represents a kaon or pion. By requiring the $e^{+}e^{-}$ invariant mass to be consistent with a $φ(1020)$, $0.98<M(e^{+}e^{-})<1.04$ ~GeV/$c^2$, the decay $D_s^+\toπ^+φ,φ\to e^{+}e^{-}$ is observed with a statistical significance of 7.8$σ$, and evidence for the decay $D_s^+\toρ^+φ,φ\to e^{+}e^{-}$ is found for the first time with a statistical significance of 4.4$σ$. The decay branching fractions are measured to be $\mathcal{B}(D_s^+\toπ^+φ, φ\to e^{+}e^{-} )=(1.17^{+0.23}_{-0.21}\pm0.03)\times 10^{-5}$, and $\mathcal{B}(D_s^+\toρ^+φ, φ\to e^{+}e^{-} )=(2.44^{+0.67}_{-0.62}\pm 0.16)\times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No significant signal for the three four-body decays of $D_{s}^{+}\to π^{+}π^{0}e^{+}e^{-},\ D_{s}^{+}\to K^{+}π^{0}e^{+}e^{-}$, and $D_{s}^{+}\to K_{S}^{0}π^{+}e^{+}e^{-}$ is observed. For $D_{s}^{+}\to π^{+}π^{0}e^{+}e^{-}$, the $φ$ mass region is vetoed to minimize the long-distance effects. The 90$\%$ confidence level upper limits set on the branching fractions of these decays are in the range of $(7.0-8.1)\times 10^{-5}$. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: 10 pages, 2 figures, 1 table

arXiv:2404.05781 [pdf, other]

Group-specific discriminant analysis reveals statistically validated sex differences in lateralization of brain functional network

Authors: Shuo Zhou, Junhao Luo, Yaya Jiang, Haolin Wang, Hai** Lu, Gaolang Gong

Abstract: Lateralization is a fundamental feature of the human brain, where sex differences have been observed. Conventional studies in neuroscience on sex-specific lateralization are typically conducted on univariate statistical comparisons between male and female groups. However, these analyses often lack effective validation of group specificity. Here, we formulate modeling sex differences in lateralizat… ▽ More Lateralization is a fundamental feature of the human brain, where sex differences have been observed. Conventional studies in neuroscience on sex-specific lateralization are typically conducted on univariate statistical comparisons between male and female groups. However, these analyses often lack effective validation of group specificity. Here, we formulate modeling sex differences in lateralization of functional networks as a dual-classification problem, consisting of first-order classification for left vs. right functional networks and second-order classification for male vs. female models. To capture sex-specific patterns, we develop the Group-Specific Discriminant Analysis (GSDA) for first-order classification. The evaluation on two public neuroimaging datasets demonstrates the efficacy of GSDA in learning sex-specific models from functional networks, achieving a significant improvement in group specificity over baseline methods. The major sex differences are in the strength of lateralization and the interactions within and between lobes. The GSDA-based method is generic in nature and can be adapted to other group-specific analyses such as handedness-specific or disease-specific analyses. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04996 [pdf, other]

Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM

Authors: **** Zhang, Tianyu Yan, Yang Liu, Huchuan Lu

Abstract: As an important pillar of underwater intelligence, Marine Animal Segmentation (MAS) involves segmenting animals within marine environments. Previous methods don't excel in extracting long-range contextual features and overlook the connectivity between discrete pixels. Recently, Segment Anything Model (SAM) offers a universal framework for general segmentation tasks. Unfortunately, trained with nat… ▽ More As an important pillar of underwater intelligence, Marine Animal Segmentation (MAS) involves segmenting animals within marine environments. Previous methods don't excel in extracting long-range contextual features and overlook the connectivity between discrete pixels. Recently, Segment Anything Model (SAM) offers a universal framework for general segmentation tasks. Unfortunately, trained with natural images, SAM does not obtain the prior knowledge from marine images. In addition, the single-position prompt of SAM is very insufficient for prior guidance. To address these issues, we propose a novel feature learning framework, named Dual-SAM for high-performance MAS. To this end, we first introduce a dual structure with SAM's paradigm to enhance feature learning of marine images. Then, we propose a Multi-level Coupled Prompt (MCP) strategy to instruct comprehensive underwater prior information, and enhance the multi-level features of SAM's encoder with adapters. Subsequently, we design a Dilated Fusion Attention Module (DFAM) to progressively integrate multi-level features from SAM's encoder. Finally, instead of directly predicting the masks of marine animals, we propose a Criss-Cross Connectivity Prediction (C$^3$P) paradigm to capture the inter-connectivity between discrete pixels. With dual decoders, it generates pseudo-labels and achieves mutual supervision for complementary feature representations, resulting in considerable improvements over previous techniques. Extensive experiments verify that our proposed method achieves state-of-the-art performances on five widely-used MAS datasets. The code is available at https://github.com/Drchip61/Dual_SAM. △ Less

Submitted 7 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 as Poster(Highlight)

arXiv:2404.04940 [pdf, other]

Fuzzy K-Means Clustering without Cluster Centroids

Authors: Han Lu, Fangfang Li, Quanxue Gao, Cheng Deng, Chris Ding, Qianqian Wang

Abstract: Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. However, the performance of popular Fuzzy K-Means algorithms is sensitive to the selection of initial cluster centroids and is also affected by noise when updating mean cluster centroids. To address these challenges, this paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on… ▽ More Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. However, the performance of popular Fuzzy K-Means algorithms is sensitive to the selection of initial cluster centroids and is also affected by noise when updating mean cluster centroids. To address these challenges, this paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids, obtaining membership matrices solely through distance matrix computation. This innovation enhances flexibility in distance measurement between sample points, thus improving the algorithm's performance and robustness. The paper also establishes theoretical connections between the proposed model and popular Fuzzy K-Means clustering techniques. Experimental results on several real datasets demonstrate the effectiveness of the algorithm. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04917 [pdf, ps, other]

Search for $η_c(2S)\to 2(π^+π^-)$ and improved measurement of $χ_{cJ}\to 2(π^+π^-)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: We search for the hadronic decay $η_c(2S)\to 2(π^+π^-)$ in the $ψ(3686)\toγη_c(2S)$ radiative decay using $(27.12\pm 0.14)\times 10^8$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. No significant signal is found, and the upper limit of $\mathcal{B}[ψ(3686)\toγη_c(2S)]\mathcal{B}[η_c(2S)\to 2(π^+π^-)]$ is determined to be $0.78\times 10^{-6}$ at the 90\% confidence level… ▽ More We search for the hadronic decay $η_c(2S)\to 2(π^+π^-)$ in the $ψ(3686)\toγη_c(2S)$ radiative decay using $(27.12\pm 0.14)\times 10^8$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. No significant signal is found, and the upper limit of $\mathcal{B}[ψ(3686)\toγη_c(2S)]\mathcal{B}[η_c(2S)\to 2(π^+π^-)]$ is determined to be $0.78\times 10^{-6}$ at the 90\% confidence level. Using $ψ(3686)\toγχ_{cJ}$ transitions, we also measure the branching fractions of $\mathcal{B}[χ_{cJ(J=0,1,2)}\to 2(π^+π^-)]$, which are $\mathcal{B}[χ_{c0}\to 2(π^+π^-)]=(2.127\pm 0.002~(\mathrm{stat.})\pm 0.101~(\mathrm{syst.}))$\%, $\mathcal{B}[χ_{c1}\to 2(π^+π^-)]=(0.685\pm 0.001~(\mathrm{stat.})\pm 0.031~\mathrm{syst.}))$\%, and $\mathcal{B}[χ_{c2}\to 2(π^+π^-)]=(1.153\pm 0.001~(\mathrm{stat.})\pm 0.063~(\mathrm{syst.}))$\%. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04797 [pdf, ps, other]

A characterization on $(g,f)$-parity orientations

Authors: Hongliang Lu, Xinxin Ma

Abstract: Let $G$ be a graph and $g,f:V(G)\to2^N$ be two set functions such that $g(v)\le f(v)$ and $g(v)\equiv f(v)\pmod 2$ for every $v\in V(G)$. An orientation $O$ of $G$ is called a $(g,f)$-parity orientation if $g(v)\le d^+_O(v)\le f(v)$ and $g(v)\equiv d^+_O(v)\pmod 2$ for every $v\in V(G)$. In this paper, we give a Tutte-type characterization for a graph to have a $(g,f)$-parity orientation. Let $G$ be a graph and $g,f:V(G)\to2^N$ be two set functions such that $g(v)\le f(v)$ and $g(v)\equiv f(v)\pmod 2$ for every $v\in V(G)$. An orientation $O$ of $G$ is called a $(g,f)$-parity orientation if $g(v)\le d^+_O(v)\le f(v)$ and $g(v)\equiv d^+_O(v)\pmod 2$ for every $v\in V(G)$. In this paper, we give a Tutte-type characterization for a graph to have a $(g,f)$-parity orientation. △ Less

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.04718 [pdf, other]

Interpretable Multimodal Learning for Cardiovascular Hemodynamics Assessment

Authors: Prasun C Tripathi, Sina Tabakhi, Mohammod N I Suvon, Lawrence Schöb, Samer Alabed, Andrew J Swift, Shuo Zhou, Hai** Lu

Abstract: Pulmonary Arterial Wedge Pressure (PAWP) is an essential cardiovascular hemodynamics marker to detect heart failure. In clinical practice, Right Heart Catheterization is considered a gold standard for assessing cardiac hemodynamics while non-invasive methods are often needed to screen high-risk patients from a large population. In this paper, we propose a multimodal learning pipeline to predict PA… ▽ More Pulmonary Arterial Wedge Pressure (PAWP) is an essential cardiovascular hemodynamics marker to detect heart failure. In clinical practice, Right Heart Catheterization is considered a gold standard for assessing cardiac hemodynamics while non-invasive methods are often needed to screen high-risk patients from a large population. In this paper, we propose a multimodal learning pipeline to predict PAWP marker. We utilize complementary information from Cardiac Magnetic Resonance Imaging (CMR) scans (short-axis and four-chamber) and Electronic Health Records (EHRs). We extract spatio-temporal features from CMR scans using tensor-based learning. We propose a graph attention network to select important EHR features for prediction, where we model subjects as graph nodes and feature relationships as graph edges using the attention mechanism. We design four feature fusion strategies: early, intermediate, late, and hybrid fusion. With a linear classifier and linear fusion strategies, our pipeline is interpretable. We validate our pipeline on a large dataset of $2,641$ subjects from our ASPIRE registry. The comparative study against state-of-the-art methods confirms the superiority of our pipeline. The decision curve analysis further validates that our pipeline can be applied to screen a large population. The code is available at https://github.com/prasunc/hemodynamics. △ Less

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.04640 [pdf, other]

Search for di-photon decays of an axion-like particle in radiative J/ψdecays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko , et al. (604 additional authors not shown)

Abstract: We search for the di-photon decay of a light pseudoscalar axion-like particle, $a$, in radiative $J/ψ$ decays, using 10 billion $J/ψ$ events collected with the BESIII detector. We find no evidence of a signal and set upper limits at the $95\%$ confidence level on the product branching fraction $\mathcal{B}(J/ψ\to γa) \times \mathcal{B}(a \to γγ)$ and the axion-like particle photon coupling constan… ▽ More We search for the di-photon decay of a light pseudoscalar axion-like particle, $a$, in radiative $J/ψ$ decays, using 10 billion $J/ψ$ events collected with the BESIII detector. We find no evidence of a signal and set upper limits at the $95\%$ confidence level on the product branching fraction $\mathcal{B}(J/ψ\to γa) \times \mathcal{B}(a \to γγ)$ and the axion-like particle photon coupling constant $g_{a γγ}$ in the ranges of $(3.7-48.5) \times 10^{-8}$ and $(2.2 -101.8)\times 10^{-4}$ GeV$^{-1}$, respectively, for $0.18 \le m_a \le 2.85$ GeV/$c^2$. These are the most stringent limits to date in this mass region. △ Less

Submitted 3 July, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: 9 pages, 5 figures, To be published in Phys. Rev. D (Letter)

Report number: BESIII Analysis Memo - 671

arXiv:2404.03286 [pdf, other]

A complex node of the cosmic web associated with the massive galaxy cluster MACS J0600.1-2008

Authors: Lukas J. Furtak, Adi Zitrin, Johan P. Richard, Dominique Eckert, Jack Sayers, Harald Ebeling, Seiji Fujimoto, Nicolas Laporte, David Lagattuta, Marceau Limousin, Guillaume Mahler, Ashish K. Meena, Felipe Andrade-Santos, Brenda L. Frye, Anton M. Koekemoer, Kotaro Kohno, Daniel Espada, Harry Lu, Richard Massey, Anna Niemiec

Abstract: MACS J0600.1-2008 (MACS0600) is an X-ray luminous, massive galaxy cluster at $z_{\mathrm{d}}=0.43$, studied previously as part of the REionization LensIng Cluster Survey (RELICS) and ALMA Lensing Cluster Survey (ALCS) projects which revealed a complex, bimodal mass distribution and an intriguing high-redshift object behind it. Here, we report on the results of an extended strong-lensing (SL) analy… ▽ More MACS J0600.1-2008 (MACS0600) is an X-ray luminous, massive galaxy cluster at $z_{\mathrm{d}}=0.43$, studied previously as part of the REionization LensIng Cluster Survey (RELICS) and ALMA Lensing Cluster Survey (ALCS) projects which revealed a complex, bimodal mass distribution and an intriguing high-redshift object behind it. Here, we report on the results of an extended strong-lensing (SL) analysis of this system. Using new JWST and ground-based Gemini-N and Keck data, we obtain 13 new spectroscopic redshifts of multiply imaged galaxies and identify 12 new photometric multiple-image systems and candidates, including two multiply imaged $z\sim7$ objects. Taking advantage of the larger areal coverage, our analysis reveals a new bimodal, massive SL structure adjacent to the cluster which we measure spectroscopically to lie at the same redshift and whose existence was implied by previous SL-modeling analyses. While based in part on photometric systems identified in ground-based imaging requiring further verification, our extended SL model suggests that the cluster may have the second-largest critical area and effective Einstein radius observed to date, $A_{\mathrm{crit}}\simeq2.16\,\mathrm{arcmin}^2$ and $θ_{\mathrm{E}}=49.7''\pm5.0''$ for a source at $z_{\mathrm{s}}=2$, enclosing a total mass of $M(<θ_{\mathrm{E}})=(4.7\pm0.7)\times10^{14}\,\mathrm{M}_{\odot}$. Yet another, probably related massive cluster structure, discovered in X-rays $5'$ (1.7 Mpc) further north, suggests that MACS0600 is in fact part of an even larger filamentary structure. This discovery adds to several recent detections of massive structures around SL galaxy clusters and establishes MACS0600 as a prime target for future high-redshift surveys with JWST. △ Less

Submitted 7 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Submitted to MNRAS. Comments welcome!

arXiv:2404.03217 [pdf, other]

Evidence of the $h_c\to K_S^0 K^+π^-+c.c.$ decay

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Based on $(2.712\pm0.014)\times10^9$ $ψ(3686)$ events collected by the BESIII collaboration, evidence of the hadronic decay $h_c\to K_S^0K^+π^-+c.c.$ is found with a significance of $4.3σ$ in the $ψ(3686)\toπ^0 h_c$ process. The branching fraction of $h_c\to K_S^0 K^+π^- +c.c.$ is measured to be $(7.3\pm0.8\pm1.8)\times10^{-4}$, where the first and second uncertainties are statistical and systemat… ▽ More Based on $(2.712\pm0.014)\times10^9$ $ψ(3686)$ events collected by the BESIII collaboration, evidence of the hadronic decay $h_c\to K_S^0K^+π^-+c.c.$ is found with a significance of $4.3σ$ in the $ψ(3686)\toπ^0 h_c$ process. The branching fraction of $h_c\to K_S^0 K^+π^- +c.c.$ is measured to be $(7.3\pm0.8\pm1.8)\times10^{-4}$, where the first and second uncertainties are statistical and systematic, respectively. Combining with the exclusive decay width of $η_c\to K\bar{K}π$, our result indicates inconsistencies with both pQCD and NRQCD predictions. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.02977 [pdf, other]

Superradiant Instability of Charged Extremal Black Holes in Einstein-Born-Infeld Gravity

Authors: Ze-Hua Wu, H. Lu

Abstract: We study charged scalar perturbations of charged extremal black holes in Einstein-Born-Infeld theory. Our numerical results indicate that these black holes all suffer from superradiant instability by the unstable quasi-bound states, regardless how small the coupling constant is. We therefore provide a new example that the superradiant stability of the Reissner-Nordström black hole is a fine-tuned… ▽ More We study charged scalar perturbations of charged extremal black holes in Einstein-Born-Infeld theory. Our numerical results indicate that these black holes all suffer from superradiant instability by the unstable quasi-bound states, regardless how small the coupling constant is. We therefore provide a new example that the superradiant stability of the Reissner-Nordström black hole is a fine-tuned result, as in the case when it is embedded in the STU supergravity model. The work is also motivated by the weak gravity conjecture since at the linear coupling constant level, the theory belongs to a subsect of four-derivative corrections in the effective field theory. Our results appear to support the notion that the black holes do decay when gravity is weaker by the correction, but the decaying halftime requires nonlinear effects and cannot be seen at the level of linear coupling constant. The full nonlinear effects also indicate that the black holes can decay even when gravity is stronger. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: Latex, 23 pages, 4 figures with 7 panels

arXiv:2404.02544 [pdf, other]

Semi-Supervised Unconstrained Head Pose Estimation in the Wild

Authors: Huayi Zhou, Fei Jiang, Hongtao Lu

Abstract: Existing head pose estimation datasets are either composed of numerous samples by non-realistic synthesis or lab collection, or limited images by labor-intensive annotating. This makes deep supervised learning based solutions compromised due to the reliance on generous labeled data. To alleviate it, we propose the first semi-supervised unconstrained head pose estimation (SemiUHPE) method, which ca… ▽ More Existing head pose estimation datasets are either composed of numerous samples by non-realistic synthesis or lab collection, or limited images by labor-intensive annotating. This makes deep supervised learning based solutions compromised due to the reliance on generous labeled data. To alleviate it, we propose the first semi-supervised unconstrained head pose estimation (SemiUHPE) method, which can leverage a large amount of unlabeled wild head images. Specifically, we follow the recent semi-supervised rotation regression, and focus on the diverse and complex head pose domain. Firstly, we claim that the aspect-ratio invariant crop** of heads is superior to the previous landmark-based affine alignment, which does not fit unlabeled natural heads or practical applications where landmarks are often unavailable. Then, instead of using an empirically fixed threshold to filter out pseudo labels, we propose the dynamic entropy-based filtering by updating thresholds for adaptively removing unlabeled outliers. Moreover, we revisit the design of weak-strong augmentations, and further exploit its superiority by devising two novel head-oriented strong augmentations named pose-irrelevant cut-occlusion and pose-altering rotation consistency. Extensive experiments show that SemiUHPE can surpass SOTAs with remarkable improvements on public benchmarks under both front-range and full-range. Our code is released in \url{https://github.com/hnuzhy/SemiUHPE}. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: 14 pages. Semi-Supervised Unconstrained Head Pose Estimation

arXiv:2404.02427 [pdf]

doi 10.1063/5.0097518

In-situ tunable giant electrical anisotropy in a grating gated AlGaN/GaN two-dimensional electron gas

Authors: Ting-Ting Wang, Sining Dong, Chong Li, Wen-Cheng Yue, Yang-Yang Lyu, Chen-Guang Wang, Chang-Kun Zeng, Zixiong Yuan, Wei Zhu, Zhi-Li Xiao, Xiaoli Lu, Bin Liu, Hai Lu, Hua-Bing Wang, Peiheng Wu, Wai-Kwong Kwok, Yong-Lei Wang

Abstract: Materials with in-plane electrical anisotropy have great potential for designing artificial synaptic devices. However, natural materials with strong intrinsic in-plane electrical anisotropy are rare. We introduce a simple strategy to produce extremely large electrical anisotropy via grating gating of a semiconductor two-dimensional electron gas (2DEG) of AlGaN/GaN. We show that periodically modula… ▽ More Materials with in-plane electrical anisotropy have great potential for designing artificial synaptic devices. However, natural materials with strong intrinsic in-plane electrical anisotropy are rare. We introduce a simple strategy to produce extremely large electrical anisotropy via grating gating of a semiconductor two-dimensional electron gas (2DEG) of AlGaN/GaN. We show that periodically modulated electric potential in the 2DEG induces in-plane electrical anisotropy, which is significantly enhanced in a magnetic field, leading to an ultra large electrical anisotropy. This is induced by a giant positive magnetoresistance and a giant negative magnetoresistance under two orthogonally oriented in-plane current flows, respectively. This giant electrical anisotropy is in-situ tunable by tailoring both the grating gate voltage and the magnetic field. Our semiconductor device with controllable giant electrical anisotropy will stimulate new device applications, such as multi-terminal memtransistors and bionic synapses. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Journal ref: Appl. Phys. Lett. 121, 092101 (2022)

arXiv:2404.02227 [pdf, other]

OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising

Authors: Haichao Zhang, Yi Xu, Hongsheng Lu, Takayuki Shimizu, Yun Fu

Abstract: Trajectory prediction is fundamental in computer vision and autonomous driving, particularly for understanding pedestrian behavior and enabling proactive decision-making. Existing approaches in this field often assume precise and complete observational data, neglecting the challenges associated with out-of-view objects and the noise inherent in sensor data due to limited camera range, physical obs… ▽ More Trajectory prediction is fundamental in computer vision and autonomous driving, particularly for understanding pedestrian behavior and enabling proactive decision-making. Existing approaches in this field often assume precise and complete observational data, neglecting the challenges associated with out-of-view objects and the noise inherent in sensor data due to limited camera range, physical obstructions, and the absence of ground truth for denoised sensor data. Such oversights are critical safety concerns, as they can result in missing essential, non-visible objects. To bridge this gap, we present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique. Our approach denoises noisy sensor observations in an unsupervised manner and precisely maps sensor-based trajectories of out-of-sight objects into visual trajectories. This method has demonstrated state-of-the-art performance in out-of-sight noisy sensor trajectory denoising and prediction on the Vi-Fi and JRDB datasets. By enhancing trajectory prediction accuracy and addressing the challenges of out-of-sight objects, our work significantly contributes to improving the safety and reliability of autonomous driving in complex environments. Our work represents the first initiative towards Out-Of-Sight Trajectory prediction (OOSTraj), setting a new benchmark for future research. The code is available at \url{https://github.com/Hai-chao-Zhang/OOSTraj}. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR)

arXiv:2404.02033 [pdf, other]

Search for $C$-even states decaying to $D_{s}^{\pm}D_{s}^{*\mp}$ with masses between $4.08$ and $4.32$ $\rm GeV/{\it c}^{2}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Six $C$-even states, denoted as $X$, with quantum numbers $J^{PC}=0^{-+}$, $1^{\pm+}$, or $2^{\pm+}$, are searched for via the $e^+e^-\toγD_{s}^{\pm}D_{s}^{*\mp}$ process using $(1667.39\pm8.84)~\mathrm{pb}^{-1}$ of $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII storage ring at center-of-mass energy of $\sqrt{s}=(4681.92\pm0.30)~\mathrm{MeV}$. No statistically s… ▽ More Six $C$-even states, denoted as $X$, with quantum numbers $J^{PC}=0^{-+}$, $1^{\pm+}$, or $2^{\pm+}$, are searched for via the $e^+e^-\toγD_{s}^{\pm}D_{s}^{*\mp}$ process using $(1667.39\pm8.84)~\mathrm{pb}^{-1}$ of $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII storage ring at center-of-mass energy of $\sqrt{s}=(4681.92\pm0.30)~\mathrm{MeV}$. No statistically significant signal is observed in the mass range from $4.08$ to $4.32~\mathrm{GeV}/c^{2}$. The upper limits of $σ[e^+e^-\toγX]\cdot \mathcal{B}[X \to D_{s}^{\pm}D_{s}^{*\mp}]$ at a $90\%$ confidence level are determined. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01687 [pdf, other]

Search for a sub-eV sterile neutrino using Daya Bay's full dataset

Authors: F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding, Y. Y. Ding , et al. (176 additional authors not shown)

Abstract: This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor \anue candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis… ▽ More This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor \anue candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements of several important systematic uncertainties. No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both Feldman-Cousins and CLs methods. Light sterile neutrino mixing with $\sin^2 2θ_{14} \gtrsim 0.01$ can be excluded at 95\% confidence level in the region of $0.01$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.1 $ eV$^2$. This result represents the world-leading constraints in the region of $2 \times 10^{-4}$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.2 $ eV$^2$. △ Less

Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures, 1 table

arXiv:2404.01587 [pdf, other]

TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation

Authors: Yehui Shen, Mingmin Liu, Huimin Lu, Xieyuanli Chen

Abstract: Visual place recognition (VPR) plays a pivotal role in autonomous exploration and navigation of mobile robots within complex outdoor environments. While cost-effective and easily deployed, camera sensors are sensitive to lighting and weather changes, and even slight image alterations can greatly affect VPR efficiency and precision. Existing methods overcome this by exploiting powerful yet large ne… ▽ More Visual place recognition (VPR) plays a pivotal role in autonomous exploration and navigation of mobile robots within complex outdoor environments. While cost-effective and easily deployed, camera sensors are sensitive to lighting and weather changes, and even slight image alterations can greatly affect VPR efficiency and precision. Existing methods overcome this by exploiting powerful yet large networks, leading to significant consumption of computational resources. In this paper, we propose a high-performance teacher and lightweight student distillation framework called TSCM. It exploits our devised cross-metric knowledge distillation to narrow the performance gap between the teacher and student models, maintaining superior performance while enabling minimal computational load during deployment. We conduct comprehensive evaluations on large-scale datasets, namely Pittsburgh30k and Pittsburgh250k. Experimental results demonstrate the superiority of our method over baseline models in terms of recognition accuracy and model parameter efficiency. Moreover, our ablation studies show that the proposed knowledge distillation technique surpasses other counterparts. The code of our method has been released at https://github.com/nubot-nudt/TSCM. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00327 [pdf, other]

YNetr: Dual-Encoder architecture on Plain Scan Liver Tumors (PSLT)

Authors: Wen Sheng, Zhong Zheng, Jiajun Liu, Han Lu, Hanyuan Zhang, Zhengyong Jiang, Zhihong Zhang, Dao** Zhu

Abstract: Background: Liver tumors are abnormal growths in the liver that can be either benign or malignant, with liver cancer being a significant health concern worldwide. However, there is no dataset for plain scan segmentation of liver tumors, nor any related algorithms. To fill this gap, we propose Plain Scan Liver Tumors(PSLT) and YNetr. Methods: A collection of 40 liver tumor plain scan segmentation d… ▽ More Background: Liver tumors are abnormal growths in the liver that can be either benign or malignant, with liver cancer being a significant health concern worldwide. However, there is no dataset for plain scan segmentation of liver tumors, nor any related algorithms. To fill this gap, we propose Plain Scan Liver Tumors(PSLT) and YNetr. Methods: A collection of 40 liver tumor plain scan segmentation datasets was assembled and annotated. Concurrently, we utilized Dice coefficient as the metric for assessing the segmentation outcomes produced by YNetr, having advantage of capturing different frequency information. Results: The YNetr model achieved a Dice coefficient of 62.63% on the PSLT dataset, surpassing the other publicly available model by an accuracy margin of 1.22%. Comparative evaluations were conducted against a range of models including UNet 3+, XNet, UNetr, Swin UNetr, Trans-BTS, COTr, nnUNetv2 (2D), nnUNetv2 (3D fullres), MedNext (2D) and MedNext(3D fullres). Conclusions: We not only proposed a dataset named PSLT(Plain Scan Liver Tumors), but also explored a structure called YNetr that utilizes wavelet transform to extract different frequency information, which having the SOTA in PSLT by experiments. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: 15 pages

arXiv:2403.19256 [pdf, other]

Measurement of absolute branching fractions of $D_s^+$ hadronic decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (632 additional authors not shown)

Abstract: Using $e^+ e^-$ collision data collected at the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of $7.33~{\rm fb}^{-1}$, we determine the absolute branching fractions of fifteen hadronic $D_s^{+}$ decays with a double-tag technique. In particular, we make precise measurements of the branching fractions… ▽ More Using $e^+ e^-$ collision data collected at the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of $7.33~{\rm fb}^{-1}$, we determine the absolute branching fractions of fifteen hadronic $D_s^{+}$ decays with a double-tag technique. In particular, we make precise measurements of the branching fractions $\mathcal{B}(D_s^+ \to K^+ K^- π^+)=(5.49 \pm 0.04 \pm 0.07)\%$, $\mathcal{B}(D_s^+ \to K_S^0 K^+)=(1.50 \pm 0.01 \pm 0.01)\%$ and $\mathcal{B}(D_s^+ \to K^+ K^- π^+ π^0)=(5.50 \pm 0.05 \pm 0.11)\%$, where the first uncertainties are statistical and the second ones are systematic. The \emph{CP} asymmetries in these decays are also measured and all are found to be compatible with zero. △ Less

Submitted 30 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19213 [pdf, other]

Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting

Authors: Weihao Jiang, Zhaozhi Xie, Yuxiang Lu, Longjie Qi, **gyong Cai, Hiroyuki Uchiyama, Bin Chen, Yue Ding, Hongtao Lu

Abstract: Mask-guided matting networks have achieved significant improvements and have shown great potential in practical applications in recent years. However, simply learning matting representation from synthetic and lack-of-real-world-diversity matting data, these approaches tend to overfit low-level details in wrong regions, lack generalization to objects with complex structures and real-world scenes su… ▽ More Mask-guided matting networks have achieved significant improvements and have shown great potential in practical applications in recent years. However, simply learning matting representation from synthetic and lack-of-real-world-diversity matting data, these approaches tend to overfit low-level details in wrong regions, lack generalization to objects with complex structures and real-world scenes such as shadows, as well as suffer from interference of background lines or textures. To address these challenges, in this paper, we propose a novel auxiliary learning framework for mask-guided matting models, incorporating three auxiliary tasks: semantic segmentation, edge detection, and background line detection besides matting, to learn different and effective representations from different types of data and annotations. Our framework and model introduce the following key aspects: (1) to learn real-world adaptive semantic representation for objects with diverse and complex structures under real-world scenes, we introduce extra semantic segmentation and edge detection tasks on more diverse real-world data with segmentation annotations; (2) to avoid overfitting on low-level details, we propose a module to utilize the inconsistency between learned segmentation and matting representations to regularize detail refinement; (3) we propose a novel background line detection task into our auxiliary learning framework, to suppress interference of background lines or textures. In addition, we propose a high-quality matting benchmark, Plant-Mat, to evaluate matting methods on complex structures. Extensively quantitative and qualitative results show that our approach outperforms state-of-the-art mask-guided methods. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19091 [pdf, other]

Observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (600 additional authors not shown)

Abstract: By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the \text{BESIII} detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fra… ▽ More By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the \text{BESIII} detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fractions are measured to be $\mathcal{B}(D^0\rightarrow {K}_1(1270)^-(\to K^0_Sπ^-π^0)e^+ν_e)=(1.69^{+0.53}_{-0.46}\pm0.15)\times10^{-4}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0(\to K^0_Sπ^+π^-)e^+ν_e)=(1.47^{+0.45}_{-0.40}\pm0.20)\times10^{-4}$ with statistical significance of 5.4$σ$ and 5.6$σ$, respectively. When combined with measurements of the $K_1(1270)\to K^+π^-π$ decays, the absolute branching fractions are determined to be $\mathcal{B}(D^0\to K_1(1270)^-e^+ν_e)=(1.05^{+0.33}_{-0.28}\pm0.12\pm0.12)\times10^{-3}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0e^+ν_e)=(1.29^{+0.40}_{-0.35}\pm0.18\pm0.15)\times10^{-3}$. The first and second uncertainties are statistical and systematic, respectively, and the third uncertainties originate from the assumed branching fractions of the $K_1(1270)\to Kππ$ decays. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: 19pages

arXiv:2403.18886 [pdf, other]

Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning

Authors: Huiyi Wang, Haodong Lu, Lina Yao, Dong Gong

Abstract: Continual learning (CL) aims to continually accumulate knowledge from a non-stationary data stream without catastrophic forgetting of learned knowledge, requiring a balance between stability and adaptability. Relying on the generalizable representation in pre-trained models (PTMs), PTM-based CL methods perform effective continual adaptation on downstream tasks by adding learnable adapters or promp… ▽ More Continual learning (CL) aims to continually accumulate knowledge from a non-stationary data stream without catastrophic forgetting of learned knowledge, requiring a balance between stability and adaptability. Relying on the generalizable representation in pre-trained models (PTMs), PTM-based CL methods perform effective continual adaptation on downstream tasks by adding learnable adapters or prompts upon the frozen PTMs. However, many existing PTM-based CL methods use restricted adaptation on a fixed set of these modules to avoid forgetting, suffering from limited CL ability. Periodically adding task-specific modules results in linear model growth rate and impaired knowledge reuse. We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel approach to enhance the control of stability-plasticity balance in PTM-based CL. SEMA automatically decides to reuse or add adapter modules on demand in CL, depending on whether significant distribution shift that cannot be handled is detected at different representation levels. We design modular adapter consisting of a functional adapter and a representation descriptor. The representation descriptors are trained as a distribution shift indicator and used to trigger self-expansion signals. For better composing the adapters, an expandable weighting router is learned jointly for mixture of adapter outputs. SEMA enables better knowledge reuse and sub-linear expansion rate. Extensive experiments demonstrate the effectiveness of the proposed self-expansion method, achieving state-of-the-art performance compared to PTM-based CL methods without memory rehearsal. △ Less

Submitted 9 June, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18383 [pdf, other]

Generative Multi-modal Models are Good Class-Incremental Learners

Authors: Xusheng Cao, Haori Lu, Linlan Huang, Xialei Liu, Ming-Ming Cheng

Abstract: In class-incremental learning (CIL) scenarios, the phenomenon of catastrophic forgetting caused by the classifier's bias towards the current task has long posed a significant challenge. It is mainly caused by the characteristic of discriminative models. With the growing popularity of the generative multi-modal models, we would explore replacing discriminative models with generative ones for CIL. H… ▽ More In class-incremental learning (CIL) scenarios, the phenomenon of catastrophic forgetting caused by the classifier's bias towards the current task has long posed a significant challenge. It is mainly caused by the characteristic of discriminative models. With the growing popularity of the generative multi-modal models, we would explore replacing discriminative models with generative ones for CIL. However, transitioning from discriminative to generative models requires addressing two key challenges. The primary challenge lies in transferring the generated textual information into the classification of distinct categories. Additionally, it requires formulating the task of CIL within a generative framework. To this end, we propose a novel generative multi-modal model (GMM) framework for class-incremental learning. Our approach directly generates labels for images using an adapted generative model. After obtaining the detailed text, we use a text encoder to extract text features and employ feature matching to determine the most similar label as the classification prediction. In the conventional CIL settings, we achieve significantly better results in long-sequence task scenarios. Under the Few-shot CIL setting, we have improved by at least 14\% accuracy over all the current state-of-the-art methods with significantly less forgetting. Our code is available at \url{https://github.com/DoubleClass/GMM}. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024

arXiv:2403.17651 [pdf, other]

Exploring Dynamic Transformer for Efficient Object Tracking

Authors: Jiawen Zhu, Xin Chen, Haiwen Diao, Shuai Li, Jun-Yan He, Chenyang Li, Bin Luo, Dong Wang, Huchuan Lu

Abstract: The speed-precision trade-off is a critical problem for visual object tracking which usually requires low latency and deployment on constrained resources. Existing solutions for efficient tracking mainly focus on adopting light-weight backbones or modules, which nevertheless come at the cost of a sacrifice in precision. In this paper, inspired by dynamic network routing, we propose DyTrack, a dyna… ▽ More The speed-precision trade-off is a critical problem for visual object tracking which usually requires low latency and deployment on constrained resources. Existing solutions for efficient tracking mainly focus on adopting light-weight backbones or modules, which nevertheless come at the cost of a sacrifice in precision. In this paper, inspired by dynamic network routing, we propose DyTrack, a dynamic transformer framework for efficient tracking. Real-world tracking scenarios exhibit diverse levels of complexity. We argue that a simple network is sufficient for easy frames in video sequences, while more computation could be assigned to difficult ones. DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget. Thus, it can achieve higher performance with the same running speed. We formulate instance-specific tracking as a sequential decision problem and attach terminating branches to intermediate layers of the entire model. Especially, to fully utilize the computations, we introduce the feature recycling mechanism to reuse the outputs of predecessors. Furthermore, a target-aware self-distillation strategy is designed to enhance the discriminating capabilities of early predictions by effectively mimicking the representation pattern of the deep model. Extensive experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model. For instance, DyTrack obtains 64.9% AUC on LaSOT with a speed of 256 fps. △ Less

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.16811 [pdf, ps, other]

Cross section measurement of $e^+e^-\to ηψ(2S)$ and search for $e^+e^-\toη\tilde{X}(3872)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass en… ▽ More The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass energies, upper limits at the 90\% confidence level on the cross section for $e^+e^-\toηψ(2S)$ and on the product of the $e^+e^-\toη\tilde{X}(3872)$ cross section with the branching fraction of $\tilde{X}(3872)\toπ^+π^- J/ψ$ are reported. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16128 [pdf, other]

Enhancing Video Transformers for Action Understanding with VLM-aided Training

Authors: Hui Lu, Hu Jian, Ronald Poppe, Albert Ali Salah

Abstract: Owing to their ability to extract relevant spatio-temporal video embeddings, Vision Transformers (ViTs) are currently the best performing models in video action understanding. However, their generalization over domains or datasets is somewhat limited. In contrast, Visual Language Models (VLMs) have demonstrated exceptional generalization performance, but are currently unable to process videos. Con… ▽ More Owing to their ability to extract relevant spatio-temporal video embeddings, Vision Transformers (ViTs) are currently the best performing models in video action understanding. However, their generalization over domains or datasets is somewhat limited. In contrast, Visual Language Models (VLMs) have demonstrated exceptional generalization performance, but are currently unable to process videos. Consequently, they cannot extract spatio-temporal patterns that are crucial for action understanding. In this paper, we propose the Four-tiered Prompts (FTP) framework that takes advantage of the complementary strengths of ViTs and VLMs. We retain ViTs' strong spatio-temporal representation ability but improve the visual encodings to be more comprehensive and general by aligning them with VLM outputs. The FTP framework adds four feature processors that focus on specific aspects of human action in videos: action category, action components, action description, and context information. The VLMs are only employed during training, and inference incurs a minimal computation cost. Our approach consistently yields state-of-the-art performance. For instance, we achieve remarkable top-1 accuracy of 93.8% on Kinetics-400 and 83.4% on Something-Something V2, surpassing VideoMAEv2 by 2.8% and 2.6%, respectively. △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.15789 [pdf, other]

In-Context Matting

Authors: He Guo, Zixuan Ye, Zhiguo Cao, Hao Lu

Abstract: We introduce in-context matting, a novel task setting of image matting. Given a reference image of a certain foreground and guided priors such as points, scribbles, and masks, in-context matting enables automatic alpha estimation on a batch of target images of the same foreground category, without additional auxiliary input. This setting marries good performance in auxiliary input-based matting an… ▽ More We introduce in-context matting, a novel task setting of image matting. Given a reference image of a certain foreground and guided priors such as points, scribbles, and masks, in-context matting enables automatic alpha estimation on a batch of target images of the same foreground category, without additional auxiliary input. This setting marries good performance in auxiliary input-based matting and ease of use in automatic matting, which finds a good trade-off between customization and automation. To overcome the key challenge of accurate foreground matching, we introduce IconMatting, an in-context matting model built upon a pre-trained text-to-image diffusion model. Conditioned on inter- and intra-similarity matching, IconMatting can make full use of reference context to generate accurate target alpha mattes. To benchmark the task, we also introduce a novel testing dataset ICM-$57$, covering 57 groups of real-world images. Quantitative and qualitative results on the ICM-57 testing set show that IconMatting rivals the accuracy of trimap-based matting while retaining the automation level akin to automatic matting. Code is available at https://github.com/tiny-smart/in-context-matting △ Less

Submitted 23 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024. Code is available at https://github.com/tiny-smart/in-context-matting

arXiv:2403.14998 [pdf, other]

Precise measurement of the $e^+e^-\to D_s^+D_s^-$ cross sections at center-of-mass energies from threshold to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel… ▽ More Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel analysis and model tests, which are critical to understand vector charmonium-like states with masses between 4 and 5~GeV. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures, published to PRL

arXiv:2403.14862 [pdf, other]

The Power of Linear Programming in Sponsored Listings Ranking: Evidence from Field Experiments

Authors: Haihao Lu, Luyang Zhang

Abstract: Sponsored listing is one of the major revenue sources for many prominent online marketplaces, such as Amazon, Walmart, and Alibaba. When consumers visit a marketplace's webpage for a specific item, in addition to that item, the marketplace might also display a ranked listing of sponsored items from various third-party sellers. These sellers are charged an advertisement fee if a user purchases any… ▽ More Sponsored listing is one of the major revenue sources for many prominent online marketplaces, such as Amazon, Walmart, and Alibaba. When consumers visit a marketplace's webpage for a specific item, in addition to that item, the marketplace might also display a ranked listing of sponsored items from various third-party sellers. These sellers are charged an advertisement fee if a user purchases any of the sponsored items from this listing. Determining how to rank these sponsored items for each incoming visit is a crucial challenge for online marketplaces, a problem known as sponsored listings ranking (SLR). The major difficulty of SLR lies in balancing the trade-off between maximizing the overall revenue and recommending high-quality and relevant ranked listings. While a more relevant ranking may result in more purchases and consumer engagement, the marketplace also needs to take account of the potential revenue when making ranking decisions. Due to the latency requirement and historical reasons, many online marketplaces use score-based ranking algorithms for SLR optimization. Alternatively, recent research also discusses obtaining the ranking by solving linear programming (LP). In this paper, we collaborate with a leading online global marketplace and conduct a series of field experiments to compare the performance of the score-based ranking algorithms and the LP-based algorithms. The field experiment lasted for $19$ days, which included $329.3$ million visits in total. We observed that the LP-based approach improved all major metrics by $1.80\%$ of revenue, $1.55\%$ of purchase, and $1.39\%$ of the gross merchandise value (GMV), compared to an extremely-tuned score-based algorithm that was previously used in production by the marketplace. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.14535 [pdf, other]

First-Order Methods for Linear Programming

Authors: Haihao Lu

Abstract: Linear programming is the seminal optimization problem that has spawned and grown into today's rich and diverse optimization modeling and algorithmic landscape. This article provides an overview of the recent development of first-order methods for solving large-scale linear programming. Linear programming is the seminal optimization problem that has spawned and grown into today's rich and diverse optimization modeling and algorithmic landscape. This article provides an overview of the recent development of first-order methods for solving large-scale linear programming. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.13658 [pdf, other]

Multimodal Variational Autoencoder for Low-cost Cardiac Hemodynamics Instability Detection

Authors: Mohammod N. I. Suvon, Prasun C. Tripathi, Wenrui Fan, Shuo Zhou, Xianyuan Liu, Samer Alabed, Venet Osmani, Andrew J. Swift, Chen Chen, Hai** Lu

Abstract: Recent advancements in non-invasive detection of cardiac hemodynamic instability (CHDI) primarily focus on applying machine learning techniques to a single data modality, e.g. cardiac magnetic resonance imaging (MRI). Despite their potential, these approaches often fall short especially when the size of labeled patient data is limited, a common challenge in the medical domain. Furthermore, only a… ▽ More Recent advancements in non-invasive detection of cardiac hemodynamic instability (CHDI) primarily focus on applying machine learning techniques to a single data modality, e.g. cardiac magnetic resonance imaging (MRI). Despite their potential, these approaches often fall short especially when the size of labeled patient data is limited, a common challenge in the medical domain. Furthermore, only a few studies have explored multimodal methods to study CHDI, which mostly rely on costly modalities such as cardiac MRI and echocardiogram. In response to these limitations, we propose a novel multimodal variational autoencoder ($\text{CardioVAE}_\text{X,G}$) to integrate low-cost chest X-ray (CXR) and electrocardiogram (ECG) modalities with pre-training on a large unlabeled dataset. Specifically, $\text{CardioVAE}_\text{X,G}$ introduces a novel tri-stream pre-training strategy to learn both shared and modality-specific features, thus enabling fine-tuning with both unimodal and multimodal datasets. We pre-train $\text{CardioVAE}_\text{X,G}$ on a large, unlabeled dataset of $50,982$ subjects from a subset of MIMIC database and then fine-tune the pre-trained model on a labeled dataset of $795$ subjects from the ASPIRE registry. Comprehensive evaluations against existing methods show that $\text{CardioVAE}_\text{X,G}$ offers promising performance (AUROC $=0.79$ and Accuracy $=0.77$), representing a significant step forward in non-invasive prediction of CHDI. Our model also excels in producing fine interpretations of predictions directly associated with clinical features, thereby supporting clinical decision-making. △ Less

Submitted 20 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13437 [pdf, other]

Search for $ΔS=2$ nonleptonic hyperon decays $Ω^-\toΣ^{0}π^{-}$ and $Ω^-\to nK^{-}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be… ▽ More Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be $\mathcal{B}(Ω^-\toΣ^{0}π^-) < 5.4\times 10^{-4}$ and $\mathcal{B}(Ω^-\to nK^{-}) < 2.4\times 10^{-4}$ at the $90\%$ confidence level. △ Less

Submitted 14 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.12408 [pdf, other]

MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation

Authors: Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong

Abstract: There have been emerging research interest and advances in speech-to-speech translation (S2ST), translating utterances from one language to another. This work proposes Multitask Speech Language Model (MSLM), which is a decoder-only speech language model trained in a multitask setting. Without reliance on text training data, our model is able to support multilingual S2ST with speaker style preserve… ▽ More There have been emerging research interest and advances in speech-to-speech translation (S2ST), translating utterances from one language to another. This work proposes Multitask Speech Language Model (MSLM), which is a decoder-only speech language model trained in a multitask setting. Without reliance on text training data, our model is able to support multilingual S2ST with speaker style preserved. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Showing 101–150 of 3,138 results for author: Lu, H