Search | arXiv e-print repository

arXiv:2403.10921 [pdf, ps, other]

Simultaneously Transmitting and Reflecting Reconfigurable Intelligent Surfaces Empowered Cooperative Rate Splitting with User Relaying

Authors: Kangchun Zhao, Yijie Mao, Yuanming Shi

Abstract: In this work, we unveil the advantages of synergizing cooperative rate splitting (CRS) with user relaying and simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR RIS). Specifically, we propose a novel STAR RIS-assisted CRS transmission framework, featuring six unique transmission modes that leverage various combinations of the relaying protocols (including full duplex-FD and half duplex-HD) and the STAR RIS configuration protocols (including energy splitting-ES, mode switching-MS, and time splitting-TS). With the objective of maximizing the minimum user rate, we then propose a unified successive convex approximation (SCA)-based alternative optimization (AO) algorithm to jointly optimize the transmit active beamforming, common rate allocation, STAR RIS passive beamforming, as well as time allocation (for HD or TS protocols) subject to the transmit power constraint at the base station (BS) and the law of energy conservation at the STAR RIS. To alleviate the computational burden, we further propose a low-complexity algorithm that incorporates a closed-form passive beamforming design. Numerical results show that our proposed framework significantly enhances user fairness compared with conventional CRS schemes without STAR RIS or other STAR RIS empowered multiple access schemes. Moreover, the proposed low-complexity algorithm dramatically reduces the computational complexity while achieving very close performance to the AO method.

Submitted 13 April, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10877 [pdf, ps, other]

Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (637 additional authors not shown)

Abstract: We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate in addition to the dominant $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is determined to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$.

Submitted 16 March, 2024; originally announced March 2024.

Comments: 9 pages, 3 figures

arXiv:2403.10521 [pdf, other]

P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors

Authors: Zhou Jiang, Zhenxin Zhu, Pengfei Li, Huan-ang Gao, Tianyuan Yuan, Yongliang Shi, Hang Zhao, Hao Zhao

Abstract: Autonomous vehicles are gradually entering city roads today, with the help of high-definition maps (HDMaps). However, the reliance on HDMaps prevents autonomous vehicles from stepping into regions without this expensive digital infrastructure. This fact drives many researchers to study online HDMap generation algorithms, but the performance of these algorithms at far regions is still unsatisfying. We present P-MapNet, in which the letter P highlights the fact that we focus on incorporating map priors to improve model performance. Specifically, we exploit priors in both SDMap and HDMap. On one hand, we extract weakly aligned SDMap from OpenStreetMap, and encode it as an additional conditioning branch. Despite the misalignment challenge, our attention-based architecture adaptively attends to relevant SDMap skeletons and significantly improves performance. On the other hand, we exploit a masked autoencoder to capture the prior distribution of HDMap, which can serve as a refinement module to mitigate occlusions and artifacts. We benchmark on the nuScenes and Argoverse2 datasets. Through comprehensive experiments, we show that: (1) our SDMap prior can improve online map generation performance, using both rasterized (by up to $+18.73$ $\rm mIoU$) and vectorized (by up to $+8.50$ $\rm mAP$) output representations; (2) our HDMap prior can improve map perceptual metrics by up to $6.34\%$; (3) P-MapNet can be switched into different inference modes that cover different regions of the accuracy-efficiency trade-off landscape; (4) P-MapNet is a far-seeing solution that brings larger improvements on longer ranges. Codes and models are publicly available at https://jike5.github.io/P-MapNet.

Submitted 29 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: Code: https://jike5.github.io/P-MapNet

arXiv:2403.10060 [pdf, ps, other]

Global rigidity of smooth ${\mathbb Z}\ltimes_λ{\mathbb R}$-actions on ${\mathbb T}^2$

Authors: Changguang Dong, Yi Shi

Abstract: For $λ>1$, we consider the locally free ${\mathbb Z}\ltimes_λ{\mathbb R}$ actions on ${\mathbb T}^2$. We show that, if the action is $C^r$ with $r\geq2$, then it is $C^{r-ε}$-conjugate to an affine action generated by a hyperbolic automorphism and a linear translation flow along the expanding eigen-direction of the automorphism. In contrast, there exists a $C^{1+α}$-action which is semi-conjugate, but not topologically conjugate, to an affine action.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09174 [pdf, other]

Properties of a Fading AGN from SDSS-IV MaNGA

Authors: Hao Mo, Yan-Mei Chen, Zhi-Yun Zhang, Alexei Moiseev, Dmitry Bizyaev, Yong Shi, Qiu-Sheng Gu, Min Bao, Xiao Cao, Song-Lin Li

Abstract: We identify a fading AGN SDSS J220141.64+115124.3 from the internal Product Launch-11 (MPL-11) in Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey. The central region with a projected radius of $\sim$2.4 kpc is characterized by LINER-like line ratios while the outskirts extended to $\sim$15 kpc show Seyfert-like line ratios. The [OIII]$λ$5007 luminosity of the Seyfert regions is a factor of 37 (2) higher than the LINER regions without (with) dust attenuation correction, suggesting that the AGN activity decreased at least $\sim$8 $\times$ 10$^3$ yrs ($\sim$2.4 kpc/light-speed) ago. We model the emission line spectra in the central region with double Gaussian components (a narrow core and a broad wing) and analyze the properties of each component. The narrow core component mostly co-rotates with the stellar disc, whereas the broad wing component with a median of the velocity dispersion $\sim$300 km s$^{-1}$ is related to a wind outflow. The kinematic position angle (PA) of the ionized gas shows a $\sim$20° twist from the galaxy center to 1.5 effective radius. The median of the PA difference between the gas and stellar components is as large as $\sim$50° within 0.4 effective radius. The tidal feature in DESI image and star-gas misalignment suggest this galaxy is a merger remnant. Combining all these observational results as well as publicly available X-ray and MIR luminosities, we confirm this is a fading AGN: the merger process kick-started the central engine to quasar phase which ionized gas composed of tidal debris, and now the activity of the central black hole decreases. The discontinuity in [OIII]$λ$5007 flux and EQW maps is due to multiple AGN outbursts triggered by merger remnant gas inflows.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted for publication in MNRAS. 12 pages, 10 figures, 1 table
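
The abstract above equates the fading timescale with the light-travel time across the $\sim$2.4 kpc LINER-like region. As a rough check of that arithmetic, here is a minimal Python sketch; the function name `light_travel_time_yr` is hypothetical and the unit constants are standard values supplied here, not taken from the listing. It gives about 7.8 × 10^3 yr, consistent with the quoted $\sim$8 $\times$ 10$^3$ yrs.

```python
# Light-travel time across the ~2.4 kpc region quoted in the abstract.
# Constants are standard unit definitions (an assumption of this sketch,
# not values given in the listing).
PARSEC_M = 3.0856775814913673e16   # metres per parsec
C_M_PER_S = 299_792_458.0          # speed of light in m/s
JULIAN_YEAR_S = 365.25 * 86_400.0  # seconds per Julian year


def light_travel_time_yr(distance_kpc: float) -> float:
    """Return the light-travel time in years for a distance in kiloparsecs."""
    distance_m = distance_kpc * 1e3 * PARSEC_M
    return distance_m / C_M_PER_S / JULIAN_YEAR_S


if __name__ == "__main__":
    print(f"{light_travel_time_yr(2.4):.0f} yr")
```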

arXiv:2403.09095 [pdf, other]

Exploring Hilbert-Space Fragmentation on a Superconducting Processor

Authors: Yong-Yi Wang, Yun-Hao Shi, Zheng-Hang Sun, Chi-Tong Chen, Zheng-An Wang, Kui Zhao, Hao-Tian Liu, Wei-Guo Ma, Ziting Wang, Hao Li, Jia-Chi Zhang, Yu Liu, Cheng-Lin Deng, Tian-Ming Li, Yang He, Zheng-He Liu, Zhen-Yu Peng, Xiaohui Song, Guangming Xue, Haifeng Yu, Kaixuan Huang, Zhongcheng Xiang, Dongning Zheng, Kai Xu, Heng Fan

Abstract: Isolated interacting quantum systems generally thermalize, yet there are several counterexamples for the breakdown of ergodicity, such as many-body localization and quantum scars. Recently, ergodicity breaking has been observed in systems subjected to linear potentials, termed Stark many-body localization. This phenomenon is closely associated with Hilbert-space fragmentation, characterized by a strong dependence of dynamics on initial conditions. Here, we experimentally explore initial-state dependent dynamics using a ladder-type superconducting processor with up to 24 qubits, which enables precise control of the qubit frequency and initial state preparation. In systems with linear potentials, we observe distinct non-equilibrium dynamics for initial states with the same quantum numbers and energy, but with varying domain wall numbers. This distinction becomes increasingly pronounced as the system size grows, in contrast with disordered interacting systems. Our results provide convincing experimental evidence of the fragmentation in Stark systems, enriching our understanding of the weak breakdown of ergodicity.

Submitted 14 March, 2024; originally announced March 2024.

Comments: main text: 7 pages, 4 figures; supplementary: 13 pages, 14 figures

arXiv:2403.08996 [pdf, other]

Ventilation and Temperature Control for Energy-efficient and Healthy Buildings: A Differentiable PDE Approach

Authors: Yuexin Bian, Xiaohan Fu, Rajesh K. Gupta, Yuanyuan Shi

Abstract: In this paper, we introduce a novel framework for building learning and control, focusing on ventilation and thermal management to enhance energy efficiency. We validate the performance of the proposed framework in system model learning via two case studies: a synthetic study focusing on the joint learning of temperature and CO2 fields, and an application to a real-world dataset for CO2 field learning. For building control, we demonstrate that the proposed framework can optimize the control actions and significantly reduce the energy cost while maintaining a comfortable and healthy indoor environment. When compared to existing traditional methods, an optimization-based method with ODE models and reinforcement learning, our approach can significantly reduce the energy consumption while guaranteeing all the safety-critical air quality and control constraints. Promising future research directions involve validating and improving the proposed PDE models through accurate estimation of airflow fields within indoor environments. Additionally, incorporating uncertainty modeling into the PDE framework for HVAC control presents an opportunity to enhance the efficiency and reliability of building HVAC system management.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08946 [pdf, other]

Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

Authors: Xuansheng Wu, Haiyan Zhao, Yaochen Zhu, Yucheng Shi, Fan Yang, Tianming Liu, Xiaoming Zhai, Wenlin Yao, Jundong Li, Mengnan Du, Ninghao Liu

Abstract: Explainable AI (XAI) refers to techniques that provide human-understandable insights into the workings of AI models. Recently, the focus of XAI is being extended towards Large Language Models (LLMs) which are often criticized for their lack of transparency. This extension calls for a significant transformation in XAI methodologies for two reasons. First, many existing XAI methods cannot be directly applied to LLMs due to their complexity and advanced capabilities. Second, as LLMs are increasingly deployed across diverse industry applications, the role of XAI shifts from merely opening the "black box" to actively enhancing the productivity and applicability of LLMs in real-world settings. Meanwhile, unlike traditional machine learning models that are passive recipients of XAI insights, the distinct abilities of LLMs can reciprocally enhance XAI. Therefore, in this paper, we introduce Usable XAI in the context of LLMs by analyzing (1) how XAI can benefit LLMs and AI systems, and (2) how LLMs can contribute to the advancement of XAI. We introduce 10 strategies, introducing the key techniques for each and discussing their associated challenges. We also provide case studies to demonstrate how to obtain and leverage explanations. The code used in this paper can be found at: https://github.com/JacksonWuxs/UsableXAI_LLM.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 38 pages, 4 figures

arXiv:2403.08902 [pdf, other]

Envision3D: One Image to 3D with Anchor Views Interpolation

Authors: Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan

Abstract: We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it is still challenging for diffusion models to generate dense multi-view consistent images, which is crucial for the quality of 3D content extraction. To address this issue, we propose a novel cascade diffusion framework, which decomposes the challenging dense views generation task into two tractable stages, namely anchor views generation and anchor views interpolation. In the first stage, we train the image diffusion model to generate globally consistent anchor views conditioning on image-normal pairs. Subsequently, leveraging our video diffusion model fine-tuned on consecutive multi-view images, we conduct interpolation on the previous anchor views to generate extra dense views. This framework yields dense, multi-view consistent images, providing comprehensive 3D information. To further enhance the overall generation quality, we introduce a coarse-to-fine sampling strategy for the reconstruction algorithm to robustly extract textured meshes from the generated dense images. Extensive experiments demonstrate that our method is capable of generating high-quality 3D content in terms of texture and geometry, surpassing previous image-to-3D baseline methods.

Submitted 13 March, 2024; originally announced March 2024.

Comments: GitHub repository: https://github.com/PKU-YuanGroup/Envision3D

arXiv:2403.07637 [pdf]

Discovery of a Magnetic Topological Semimetal Eu$_3$In$_2$As$_4$ with a Single Pair of Weyl Points

Authors: Ke Jia, Jingyu Yao, Xiaobo He, Yupeng Li, Junze Deng, Ming Yang, Junfeng Wang, Zengwei Zhu, Cuixiang Wang, Dayu Yan, Hai L. Feng, Jie Shen, Yongkang Luo, Zhijun Wang, Youguo Shi

Abstract: Magnetic Weyl semimetal (MWS) is a unique topological state with open surface Fermi arc states and other exotic transport phenomena. However, most reported MWSs show multiple pairs of Weyl points and complicated Fermi surfaces, which increases the difficulty of the investigation into the intrinsic chiral transport property. In this work, we successfully synthesized a soft magnetic Weyl semimetal Eu$_3$In$_2$As$_4$ with a single pair of Weyl points under magnetic fields. The Shubnikov-de Haas (SdH) oscillation with a single frequency, as well as a linear Hall resistance with the same carrier density, is observed up to 50 Tesla, indicating a single pair of Weyl points around the Fermi level with a massless fermion ($m^* = 0.121 m_0$, $π$ Berry phase). Such a single pair of Weyl points is further confirmed by the density functional theory calculations. The magnetic ordering and band topology can be easily tuned by the external magnetic field. The field-induced MWS Eu$_3$In$_2$As$_4$ with a single pair of Weyl points is a good platform to detect chiral transport properties, including possible quantum anomalous Hall effect.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07392 [pdf, other]

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

Authors: Chunlong Xia, Xinliang Wang, Feng Lv, Xin Hao, Yifeng Shi

Abstract: Although Vision Transformer (ViT) has achieved significant success in computer vision, it does not perform well in dense prediction tasks due to the lack of inner-patch information interaction and the limited diversity of feature scale. Most existing studies are devoted to designing vision-specific transformers to solve the above problems, which introduce additional pre-training costs. Therefore, we present a plain, pre-training-free, and feature-enhanced ViT backbone with Convolutional Multi-scale feature interaction, named ViT-CoMer, which facilitates bidirectional interaction between CNN and transformer. Compared to the state-of-the-art, ViT-CoMer has the following advantages: (1) We inject spatial pyramid multi-receptive field convolutional features into the ViT architecture, which effectively alleviates the problems of limited local information interaction and single-feature representation in ViT. (2) We propose a simple and efficient CNN-Transformer bidirectional fusion interaction module that performs multi-scale fusion across hierarchical features, which is beneficial for handling dense prediction tasks. (3) We evaluate the performance of ViT-CoMer across various dense prediction tasks, different frameworks, and multiple advanced pre-training. Notably, our ViT-CoMer-L achieves 64.3% AP on COCO val2017 without extra training data, and 62.1% mIoU on ADE20K val, both of which are comparable to state-of-the-art methods. We hope ViT-CoMer can serve as a new backbone for dense prediction tasks to facilitate future research. The code will be released at https://github.com/Traffic-X/ViT-CoMer.

Submitted 27 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: CVPR2024

arXiv:2403.06766 [pdf, other]

Determination of the number of $ψ(3686)$ events taken at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$.

Submitted 28 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.
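
The totals quoted in the abstract above can be reproduced from the three per-run values. The short Python sketch below does that arithmetic; the variable and function names are hypothetical. It also shows that the quoted total uncertainty matches a linear sum of the three uncertainties rather than a quadrature sum. Reading that as fully correlated systematics is an assumption of this sketch; the abstract does not state how the uncertainties were combined.

```python
import math

# (events, uncertainty) in units of 10^6, copied from the abstract.
RUNS = {
    "2009": (107.7, 0.6),
    "2012": (345.4, 2.6),
    "2021": (2259.3, 11.1),
}


def combine(runs):
    """Return (total, linear-sum uncertainty, quadrature-sum uncertainty)."""
    total = sum(n for n, _ in runs.values())
    linear = sum(u for _, u in runs.values())
    quadrature = math.sqrt(sum(u * u for _, u in runs.values()))
    return total, linear, quadrature


if __name__ == "__main__":
    total, linear, quadrature = combine(RUNS)
    print(f"total      = {total:.1f} x 10^6")
    print(f"linear     = {linear:.1f} x 10^6")   # matches the quoted 14.3
    print(f"quadrature = {quadrature:.1f} x 10^6")
```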

arXiv:2403.06085 [pdf, other]

van Hove Singularity-Driven Emergence of Multiple Flat Bands in Kagome Superconductors

Authors: Hailan Luo, Lin Zhao, Zhen Zhao, Haitao Yang, Yun-Peng Huang, Hongxiong Liu, Yuhao Gu, Feng Jin, Hao Chen, Taimin Miao, Chaohui Yin, Chengmin Shen, Xiaolin Ren, Bo Liang, Yingjie Shu, Yiwen Chen, Fengfeng Zhang, Feng Yang, Shenjin Zhang, Qinjun Peng, Hanqing Mao, Guodong Liu, Jiangping Hu, Youguo Shi, Zuyan Xu, et al. (5 additional authors not shown)

Abstract: The newly discovered Kagome superconductors AV$_3$Sb$_5$ (A=K, Rb and Cs) continue to bring surprises in generating unusual phenomena and physical properties, including anomalous Hall effect, unconventional charge density wave, electronic nematicity and time-reversal symmetry breaking. Here we report an unexpected emergence of multiple flat bands in the AV$_3$Sb$_5$ superconductors. By performing high-resolution angle-resolved photoemission (ARPES) measurements, we observed four branches of flat bands that span the entire momentum space. The appearance of the flat bands is not anticipated from the band structure calculations and cannot be accounted for by the known mechanisms of flat band generation. It is intimately related to the evolution of van Hove singularities. This is the first observation of such an emergence of multiple flat bands in solid materials. Our findings provide new insights into the underlying mechanism that governs the unusual behaviors in the Kagome superconductors. They also provide a new pathway for producing flat bands and set a platform to study flat-band-related physics.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 20 pages, 4 figures

arXiv:2403.05584 [pdf, other]

doi 10.1145/3613904.3642747

Time2Stop: Adaptive and Explainable Human-AI Loop for Smartphone Overuse Intervention

Authors: Adiba Orzikulova, Han Xiao, Zhipeng Li, Yukang Yan, Yuntao Wang, Yuanchun Shi, Marzyeh Ghassemi, Sung-Ju Lee, Anind K Dey, Xuhai "Orson" Xu

Abstract: Despite a rich history of investigating smartphone overuse intervention techniques, AI-based just-in-time adaptive intervention (JITAI) methods for overuse reduction are lacking. We develop Time2Stop, an intelligent, adaptive, and explainable JITAI system that leverages machine learning to identify optimal intervention timings, introduces interventions with transparent AI explanations, and collects user feedback to establish a human-AI loop and adapt the intervention model over time. We conducted an 8-week field experiment (N=71) to evaluate the effectiveness of both the adaptation and explanation aspects of Time2Stop. Our results indicate that our adaptive models significantly outperform the baseline methods on intervention accuracy (>32.8\% relatively) and receptivity (>8.0\%). In addition, incorporating explanations further enhances the effectiveness by 53.8\% and 11.4\% on accuracy and receptivity, respectively. Moreover, Time2Stop significantly reduces overuse, decreasing app visit frequency by 7.0$\sim$8.9\%. Our subjective data also echoed these quantitative measures. Participants preferred the adaptive interventions and rated the system highly on intervention time accuracy, effectiveness, and level of trust. We envision our work can inspire future research on JITAI systems with a human-AI loop to evolve with users.

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2403.05416 [pdf, other]

SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection

Authors: Yahao Lu, Yupei Lin, Han Wu, Xiaoyu Xian, Yukai Shi, Liang Lin

Abstract: Single-frame infrared small target (SIRST) detection aims to recognize small targets from cluttered backgrounds. Recently, convolutional neural networks have achieved significant advantages in general object detection. With the development of Transformer, the scale of SIRST models is constantly increasing. Due to the limited training samples, performance has not been improved accordingly. The quality, quantity, and diversity of the infrared dataset are critical to the detection of small targets. To highlight this issue, we propose a negative sample augmentation method in this paper. Specifically, a negative augmentation approach is proposed to generate massive negatives for self-supervised learning. Firstly, we apply a sequential noise modeling technique to generate realistic infrared data. Secondly, we fuse the extracted noise with the original data to facilitate diversity and fidelity in the generated data. Lastly, we propose a negative augmentation strategy to enrich diversity as well as maintain semantic invariance. The proposed algorithm produces a synthetic SIRST-5K dataset, which contains massive pseudo-data and corresponding labels. With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed. Compared with other state-of-the-art (SOTA) methods, our method achieves outstanding performance in terms of probability of detection (Pd), false-alarm rate (Fa), and intersection over union (IoU).

Submitted 8 March, 2024; originally announced March 2024.

Comments: We address the quality, quantity, and diversity of the infrared data in SIRST, the dataset is available at: https://github.com/luy0222/SIRST-5K

arXiv:2403.04976 [pdf, other]

Towards Data-center Level Carbon Modeling and Optimization for Deep Learning Inference

Authors: Shixin Ji, Zhuoping Yang, Xingzhen Chen, Jingtong Hu, Yiyu Shi, Alex K. Jones, Peipei Zhou

Abstract: Recently, the increasing need for computing resources has led to the prosperity of data centers, which poses environmental challenges and calls for improvements in data center provisioning strategies. In this work, we show a comprehensive analysis based on profiling a variety of deep-learning inference applications on different generations of GPU servers. Our analysis reveals several critical factors which can largely affect the design space of provisioning strategies, including the hardware embodied cost estimation, application-specific features, and the distribution of carbon cost each year, which prior works have omitted. Based on the observations, we further present a first-order modeling and optimization tool for data center provisioning and scheduling and highlight the importance of environmental impacts from data center management.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 12 pages, 9 figures

arXiv:2403.04566 [pdf, ps, other]

Pullback of arithmetic theta series and its modularity for unitary Shimura curves

Authors: Qiao He, Yousheng Shi, Tonghai Yang

Abstract: This paper is a complement of the modularity result of Bruinier, Howard, Kudla, Rapoport and Yang (BHKRY) for the special case $U(1,1)$ not considered there. The main idea is to embed a $U(1, 1)$ Shimura curve into many $U(n-1, 1)$ Shimura varieties for big $n$, and prove a precise pullback formula of the generating series of arithmetic divisors. Afterwards, we use the modularity result of BHKRY together with the existence of classical theta series that do not vanish at any given point in the upper half plane to prove the modularity result on $U(1, 1)$ Shimura curves.

Submitted 7 March, 2024; originally announced March 2024.

MSC Class: 11G15; 11F11; 11F30

arXiv:2403.04531 [pdf, ps, other]

Anatomy-Guided Surface Diffusion Model for Alzheimer's Disease Normative Modeling

Authors: Jianwei Zhang, Yonggang Shi

Abstract: Normative modeling has emerged as a pivotal approach for characterizing heterogeneity and individual variance in neurodegenerative diseases, notably Alzheimer's disease (AD). One of the challenges of cortical normative modeling is the anatomical structure mismatch due to folding pattern variability. Traditionally, registration is applied to address this issue and recently many studies have utilized deep generative models to generate anatomically aligned samples for analyzing disease progression; however, these models are predominantly applied to volume-based data, which often falls short in capturing intricate morphological changes on the brain cortex. As an alternative, surface-based analysis has been proven to be more sensitive in disease modeling such as AD, yet, like volume-based data, it also suffers from the mismatch problem. To address these limitations, we propose a novel generative normative modeling framework by transferring the conditional diffusion generative model to the spherical non-Euclidean domain. Additionally, this approach generates normal feature map distributions by explicitly conditioning on individual anatomical segmentation to ensure better geometrical alignment which helps to reduce anatomical variance between subjects in analysis.
We find that our model can generate samples that are better anatomically aligned than registered reference data and through ablation study and normative assessment experiments, the samples are able to better measure individual differences from the normal distribution and increase sensitivity in differentiating cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer's disease (AD) patients.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.04233 [pdf, other]

DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning

Authors: Xingwei Qu, Yiming Liang, Yucheng Wang, Tianyu Zheng, Tommy Yue, Lei Ma, Stephen W. Huang, Jiajun Zhang, Yinan Shi, Chenghua Lin, Jie Fu, Ge Zhang

Abstract: It has long been assumed that the sheer number of parameters in large language models (LLMs) drives in-context learning (ICL) capabilities, enabling remarkable performance improvements by leveraging task-specific demonstrations. Challenging this hypothesis, we introduce DEEP-ICL, a novel task Definition Enriched ExPert Ensembling methodology for ICL. DEEP-ICL explicitly extracts task definitions from given demonstrations and generates responses through learning task-specific examples. We argue that improvement from ICL does not directly rely on model size, but essentially stems from understanding task definitions and task-guided learning. Inspired by this, DEEP-ICL combines two 3B models with distinct roles (one for concluding task definitions and the other for learning task demonstrations) and achieves comparable performance to LLaMA2-13B. Furthermore, our framework outperforms conventional ICL by overcoming pretraining sequence length limitations, by supporting unlimited demonstrations. We contend that DEEP-ICL presents a novel alternative for achieving efficient few-shot learning, extending beyond the conventional ICL.

Submitted 16 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03578 [pdf, other]

Causal Disentanglement for Regulating Social Influence Bias in Social Recommendation

Authors: Li Wang, Min Xu, Quangui Zhang, Yunxiao Shi, Qiang Wu

Abstract: Social recommendation systems face the problem of social influence bias, which can lead to an overemphasis on recommending items that friends have interacted with. Addressing this problem is crucial, and existing methods often rely on techniques such as weight adjustment or leveraging unbiased data to eliminate this bias. However, we argue that not all biases are detrimental, i.e., some items recommended by friends may align with the user's interests. Blindly eliminating such biases could undermine these positive effects, potentially diminishing recommendation accuracy. In this paper, we propose a Causal Disentanglement-based framework for Regulating Social influence Bias in social recommendation, named CDRSB, to improve recommendation performance. From the perspective of causal inference, we find that the user social network could be regarded as a confounder between the user and item embeddings (treatment) and ratings (outcome). Due to the presence of this social network confounder, two paths exist from user and item embeddings to ratings: a non-causal social influence path and a causal interest path. Building upon this insight, we propose a disentangled encoder that focuses on disentangling user and item embeddings into interest and social influence embeddings. Mutual information-based objectives are designed to enhance the distinctiveness of these disentangled embeddings, eliminating redundant information.
Additionally, a regulatory decoder that employs a weight calculation module to dynamically learn the weights of social influence embeddings for effectively regulating social influence bias has been designed. Experimental results on four large-scale real-world datasets Ciao, Epinions, Dianping, and Douban book demonstrate the effectiveness of CDRSB compared to state-of-the-art baselines.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03500 [pdf, other]

Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

Abstract: Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 11 pages, 3 figures

arXiv:2403.03310 [pdf, other]

Graph Learning for Parameter Prediction of Quantum Approximate Optimization Algorithm

Authors: Zhiding Liang, Gang Liu, Zheyuan Liu, Jinglei Cheng, Tianyi Hao, Kecheng Liu, Hang Ren, Zhixin Song, Ji Liu, Fanny Ye, Yiyu Shi

Abstract: In recent years, quantum computing has emerged as a transformative force in the field of combinatorial optimization, offering novel approaches to tackling complex problems that have long challenged classical computational methods. Among these, the Quantum Approximate Optimization Algorithm (QAOA) stands out for its potential to efficiently solve the Max-Cut problem, a quintessential example of combinatorial optimization. However, practical application faces challenges due to current limitations on quantum computational resource. Our work optimizes QAOA initialization, using Graph Neural Networks (GNN) as a warm-start technique. This sacrifices affordable computational resource on classical computer to reduce quantum computational resource overhead, enhancing QAOA's effectiveness. Experiments with various GNN architectures demonstrate the adaptability and stability of our framework, highlighting the synergy between quantum algorithms and machine learning. Our findings show GNN's potential in improving QAOA performance, opening new avenues for hybrid quantum-classical approaches in quantum computing and contributing to practical applications.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.01761 [pdf, other]

Observation of $ψ(3686)\to 3φ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (645 additional authors not shown)

Abstract: Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.00727 [pdf, ps, other]

Moduli of sheaves on fourfolds as derived Lagrangian intersections

Authors: Nachiketa Adhikari, Yun Shi

Abstract: We show that any $(-2)$-shifted symplectic derived scheme $\textbf{X}$ (of finite type over an algebraically closed field of characteristic zero) is locally equivalent to the derived intersection of two Lagrangian morphisms to a $(-1)$-shifted symplectic derived scheme which is the $(-1)$-shifted cotangent stack of a smooth classical scheme. This leads to the possibility of the following viewpoint that is, at least to us, new: any $n$-shifted symplectic derived scheme can be obtained, locally, by repeated derived Lagrangian intersections in a smooth classical scheme. We also give a separate proof of our main result in the case where the local Darboux atlas cdga for $\textbf{X}$ has an even number of generators in degree $(-1)$; in this case we strengthen the result by showing that $\textbf{X}$ is in fact locally equivalent to the derived critical locus of a shifted function, which we've been told is a folklore result in the field. We indicate the implications of this for derived moduli stacks of sheaves on Calabi-Yau fourfolds by spelling out the case when the fourfold is $\mathbb{C}^4$.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 24 pages. Comments are welcome

arXiv:2403.00249 [pdf, other]

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

Authors: Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

Abstract: In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment. However, in most existing methods, the reconstruction targets for MIM lack high-level semantics, and text is not sufficiently involved in masked modeling. These two drawbacks limit the effect of MIM in facilitating cross-modal semantic alignment. In this work, we propose a semantics-enhanced cross-modal MIM framework (SemMIM) for vision-language representation learning. Specifically, to provide more semantically meaningful supervision for MIM, we propose a local semantics enhancing approach, which harvests high-level semantics from global image features via self-supervised agreement learning and transfers them to local patch encodings by sharing the encoding space. Moreover, to achieve deep involvement of text during the entire MIM process, we propose a text-guided masking strategy and devise an efficient way of injecting textual information in both masked modeling and reconstruction target acquisition. Experimental results validate that our method improves the effectiveness of the MIM task in facilitating cross-modal semantic alignment. Compared to previous VLP models with similar model size and data scale, our SemMIM model achieves state-of-the-art or competitive performance on multiple downstream vision-language tasks.

Submitted 29 February, 2024; originally announced March 2024.

Comments: Accepted to LREC-COLING 2024

arXiv:2403.00041 [pdf, other]

Global and Local Prompts Cooperation via Optimal Transport for Federated Learning

Authors: Hongxia Li, Wei Huang, Jingya Wang, Ye Shi

Abstract: Prompt learning in pretrained visual-language models has shown remarkable flexibility across various downstream tasks. Leveraging its inherent lightweight nature, recent research attempted to integrate the powerful pretrained models into federated learning frameworks to simultaneously reduce communication costs and promote local training on insufficient data. Despite these efforts, current federated prompt learning methods lack specialized designs to systematically address severe data heterogeneities, e.g., data distribution with both label and feature shifts involved. To address this challenge, we present Federated Prompts Cooperation via Optimal Transport (FedOTP), which introduces efficient collaborative prompt learning strategies to capture diverse category traits on a per-client basis. Specifically, for each client, we learn a global prompt to extract consensus knowledge among clients, and a local prompt to capture client-specific category characteristics. Unbalanced Optimal Transport is then employed to align local visual features with these prompts, striking a balance between global consensus and local personalization. By relaxing one of the equality constraints, FedOTP enables prompts to focus solely on the core regions of image patches. Extensive experiments on datasets with various types of heterogeneities have demonstrated that our FedOTP outperforms the state-of-the-art methods.

Submitted 3 April, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.18996 [pdf]

Metasurface spectrometers beyond resolution-sensitivity constraints

Authors: Feng Tang, Jingjun Wu, Tom Albrow-Owen, Hanxiao Cui, Fujia Chen, Yaqi Shi, Lan Zou, Jun Chen, Xuhan Guo, Yijun Sun, Jikui Luo, Bingfeng Ju, Jing Huang, Shuangli Liu, Bo Li, Liming Yang, Eric Anthony Munro, Wanguo Zheng, Hannah J. Joyce, Hongsheng Chen, Lufeng Che, Shurong Dong, Tawfique Hasan, Xin Ye, Yihao Yang, et al. (1 additional author not shown)

Abstract: Optical spectroscopy plays an essential role across scientific research and industry for non-contact materials analysis [1-3], increasingly through in-situ or portable platforms [4-6]. However, when considering low-light-level applications, conventional spectrometer designs necessitate a compromise between their resolution and sensitivity [7,8], especially as device and detector dimensions are scaled down. Here, we report on a miniaturizable spectrometer platform where light throughput onto the detector is instead enhanced as the resolution is increased. This planar, CMOS-compatible platform is based around metasurface encoders designed to exhibit photonic bound states in the continuum [9], where operational range can be altered or extended simply through adjusting geometric parameters. This system can enhance photon collection efficiency by up to two orders of magnitude versus conventional designs; we demonstrate this sensitivity advantage through ultra-low-intensity fluorescent and astrophotonic spectroscopy. This work represents a step forward for the practical utility of spectrometers, affording a route to integrated, chip-based devices that maintain high resolution and SNR without requiring prohibitively long integration times.

Submitted 1 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.18933 [pdf, other]

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Authors: Tony C. W. Mok, Zi Li, Yunhao Bai, Jianpeng Zhang, Wei Liu, Yan-Jie Zhou, Ke Yan, Dakai Jin, Yu Shi, Xiaoli Yin, Le Lu, Ling Zhang

Abstract: Establishing dense anatomical correspondence across distinct imaging modalities is a foundational yet challenging procedure for numerous medical image analysis studies and image-guided radiotherapy. Existing multi-modality image registration algorithms rely on statistical-based similarity measures or local structural image representations. However, the former is sensitive to locally varying noise, while the latter is not discriminative enough to cope with complex anatomical structures in multimodal scans, causing ambiguity in determining the anatomical correspondence across scans with different modalities. In this paper, we propose a modality-agnostic structural representation learning method, which leverages Deep Neighbourhood Self-similarity (DNS) and anatomy-aware contrastive learning to learn discriminative and contrast-invariance deep structural image representations (DSIR) without the need for anatomical delineations or pre-aligned training images. We evaluate our method on multiphase CT, abdomen MR-CT, and brain MR T1w-T2w registration. Comprehensive results demonstrate that our method is superior to the conventional local structural representation and statistical-based similarity measures in terms of discriminability and accuracy.

Submitted 31 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR2024

arXiv:2402.18833 [pdf, ps, other]

doi 10.1103/PhysRevMaterials.7.094004

Layer-dependent Raman spectroscopy of ultrathin Ta$_2$Pd$_3$Te$_5$

Authors: Zhenyu Sun, Zhaopeng Guo, Dayu Yan, Peng Cheng, Lan Chen, Youguo Shi, Yuan Huang, Zhijun Wang, Kehui Wu, Baojie Feng

Abstract: Two-dimensional topological insulators (2DTIs) or quantum spin Hall insulators are attracting increasing attention due to their potential applications in next-generation spintronic devices. Despite their promising prospects, realizable 2DTIs are still limited. Recently, Ta$_2$Pd$_3$Te$_5$, a semiconducting van der Waals material, has shown spectroscopic evidence of quantum spin Hall states. However, controlled preparation of few- to monolayer samples, a crucial step in realizing quantum spin Hall devices, has not yet been achieved. In this work, we fabricated few- to monolayer Ta$_2$Pd$_3$Te$_5$ and performed systematic thickness- and temperature-dependent Raman spectroscopy measurements. Our results demonstrate that Raman spectra can provide valuable information to determine the thickness of Ta$_2$Pd$_3$Te$_5$ thin flakes. Moreover, our angle-resolved polarized Raman (ARPR) spectroscopy measurements show that the intensities of the Raman peaks are strongly anisotropic due to the quasi-one-dimensional atomic structure, providing a straightforward method to determine its crystalline orientation. Our findings may stimulate further efforts to realize quantum devices based on few or monolayer Ta$_2$Pd$_3$Te$_5$.

Submitted 28 February, 2024; originally announced February 2024.

Journal ref: Phys. Rev. Materials 7, 094004 (2023)

arXiv:2402.18720 [pdf, other]

Small and Large Dust Cavities in Disks around mid-M Stars in Taurus

Authors: Yangfan Shi, Feng Long, Gregory J. Herczeg, Daniel Harsono, Yao Liu, Paola Pinilla, Enrico Ragusa, Doug Johnstone, Xue-Ning Bai, Ilaria Pascucci, Carlo F. Manara, Gijs D. Mulders, Lucas A. Cieza

Abstract: High-angular resolution imaging by ALMA has revealed the near-universality and diversity of substructures in protoplanetary disks. However, disks around M-type pre-main-sequence stars are still poorly sampled, despite the prevalence of M-dwarfs in the galaxy. Here we present high-resolution (~50 mas, 8 au) ALMA Band 6 observations of six disks around mid-M stars in Taurus. We detect dust continuum emission in all six disks, 12CO in five disks, and 13CO line in two disks. The size ratios between gas and dust disks range from 1.6 to 5.1. The ratio of about 5 for 2M0436 and 2M0450 indicates efficient dust radial drift. Four disks show rings and cavities and two disks are smooth. The cavity sizes occupy a wide range: 60 au for 2M0412, and ~10 au for 2M0434, 2M0436 and 2M0508. Detailed visibility modeling indicates that small cavities of 1.7 and 5.7 au may hide in the two smooth disks 2M0450 and CIDA 12. We perform radiative transfer fitting of the infrared SEDs to constrain the cavity sizes, finding that micron-sized dust grains may have smaller cavities than millimeter grains. Planet-disk interactions are the preferred explanation to produce the large 60 au cavity, while other physics could be responsible for the three ~10 au cavities under current observations and theories. Currently, disks around mid-to-late M stars in Taurus show a higher detection frequency of cavities than earlier type stars, although a more complete sample is needed to evaluate any dependence of substructure on stellar mass.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 25 pages, 6 figures. Accepted for publication in ApJ

arXiv:2402.18577 [pdf, other]

Motion Guided Token Compression for Efficient Masked Video Modeling

Authors: Yukun Feng, Yangming Shi, Fengze Liu, Tan Yan

Abstract: Recent developments in Transformers have achieved notable strides in enhancing video comprehension. Nonetheless, the O($N^2$) computation complexity associated with attention mechanisms presents substantial computational hurdles when dealing with the high dimensionality of videos. This challenge becomes particularly pronounced when striving to increase the frames per second (FPS) to enhance the motion capturing capabilities. Such a pursuit is likely to introduce redundancy and exacerbate the existing computational limitations. In this paper, we initiate by showcasing the enhanced performance achieved through an escalation in the FPS rate. Additionally, we present a novel approach, Motion Guided Token Compression (MGTC), to empower Transformer models to utilize a smaller yet more representative set of tokens for comprehensive video representation. Consequently, this yields substantial reductions in computational burden and remains seamlessly adaptable to increased FPS rates. Specifically, we draw inspiration from video compression algorithms and scrutinize the variance between patches in consecutive video frames across the temporal dimension. The tokens exhibiting a disparity below a predetermined threshold are then masked. Notably, this masking strategy effectively addresses video redundancy while conserving essential information.
Our experiments, conducted on widely examined video recognition datasets, Kinetics-400, UCF101 and HMDB51, demonstrate that elevating the FPS rate results in a significant top-1 accuracy score improvement of over 1.6, 1.6 and 4.0. By implementing MGTC with the masking ratio of 25%, we further augment accuracy by 0.1 and simultaneously reduce computational costs by over 31% on Kinetics-400. Even within a fixed computational budget, higher FPS rates paired with MGTC sustain performance gains when compared to lower FPS settings.
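
The masking rule described in this abstract (mask tokens whose change between consecutive frames falls below a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `motion_guided_mask`, the flat list-of-floats patch layout, and the use of mean absolute difference as the disparity measure are all assumptions made for illustration.

```python
def motion_guided_mask(prev_patches, curr_patches, threshold):
    """Return one boolean per patch; True means the token is masked (dropped).

    Hypothetical helper: patches are flat lists of pixel values, and the
    disparity measure (mean absolute difference) is an assumed choice.
    """
    mask = []
    for prev, curr in zip(prev_patches, curr_patches):
        # Disparity between co-located patches of two consecutive frames.
        disparity = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
        # Low disparity means little motion, so the token is treated as redundant.
        mask.append(disparity < threshold)
    return mask


# Three patches per frame: static, nearly static, and strongly changing.
prev_frame = [[0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0]]
curr_frame = [[0.0, 0.0, 0.0, 0.0], [0.5, 0.6, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0]]

mask = motion_guided_mask(prev_frame, curr_frame, threshold=0.05)
kept = [i for i, masked in enumerate(mask) if not masked]
print(mask, kept)
```

In this toy input only the third patch changes enough to be kept, so a downstream model would attend to one token instead of three.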

Submitted 10 January, 2024; originally announced February 2024.

arXiv:2402.18432 [pdf, other]

Phase transitions of Fe$_2$O$_3$ under laser shock compression

Authors: A. Amouretti, C. Crépisson, S. Azadi, D. Cabaret, T. Campbell, D. A. Chin, B. Colin, G. R. Collins, L. Crandall, G. Fiquet, A. Forte, T. Gawne, F. Guyot, P. Heighway, H. Lee, D. McGonegle, B. Nagler, J. Pintor, D. Polsin, G. Rousse, Y. Shi, E. Smith, J. S. Wark, S. M. Vinko, M. Harmand

Abstract: We present in-situ x-ray diffraction and velocity measurements of Fe$_2$O$_3$ under laser shock compression at pressures between 38-116 GPa. None of the phases reported by static compression studies were observed. Instead, we observed an isostructural phase transition from $α$-Fe$_2$O$_3$ to a new $α^\prime$-Fe$_2$O$_3$ phase at a pressure of 50-62 GPa. The $α^\prime$-Fe$_2$O$_3$ phase differs from $α$-Fe$_2$O$_3$ by an 11% volume drop and a different unit cell compressibility. We further observed a two-wave structure in the velocity profile, which can be related to an intermediate regime where both $α$ and $α^\prime$ phases coexist. Density functional theory calculations with a Hubbard parameter indicate that the observed unit cell volume drop can be associated with a spin transition following a magnetic collapse.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 7 pages, 5 figures

arXiv:2402.18411 [pdf, other]

Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

Authors: Bin Li, Ye Shi, Qian Yu, Jingya Wang

Abstract: Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images sharing the same category across diverse domains without relying on labeled data. Prior approaches have typically decomposed the UCIR problem into two distinct tasks: intra-domain representation learning and cross-domain feature alignment. However, these segregated strategies overlook the potential synergies between these tasks. This paper introduces ProtoOT, a novel Optimal Transport formulation explicitly tailored for UCIR, which integrates intra-domain feature representation learning and cross-domain alignment into a unified framework. ProtoOT leverages the strengths of the K-means clustering method to effectively manage distribution imbalances inherent in UCIR. By utilizing K-means for generating initial prototypes and approximating class marginal distributions, we modify the constraints in Optimal Transport accordingly, significantly enhancing its performance in UCIR scenarios. Furthermore, we incorporate contrastive learning into the ProtoOT framework to further improve representation learning. This encourages local semantic consistency among features with similar semantics, while also explicitly enforcing separation between features and unmatched prototypes, thereby enhancing global discriminativeness. ProtoOT surpasses existing state-of-the-art methods by a notable margin across benchmark datasets. Notably, on DomainNet, ProtoOT achieves an average P@200 enhancement of 24.44%, and on Office-Home, it demonstrates a P@15 improvement of 12.12%. Code is available at https://github.com/HCVLAB/ProtoOT.

Submitted 24 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted by AAAI2024

arXiv:2402.18311 [pdf, other]

Escaping Local Optima in Global Placement

Authors: Ke Xue, Xi Lin, Yunqi Shi, Shixiong Kai, Siyuan Xu, Chao Qian

Abstract: Placement is crucial in the physical design, as it greatly affects power, performance, and area metrics. Recent advancements in analytical methods, such as DREAMPlace, have demonstrated impressive performance in global placement. However, DREAMPlace has some limitations, e.g., may not guarantee legalizable placements under the same settings, leading to fragile and unpredictable results. This paper highlights the main issue as being stuck in local optima, and proposes a hybrid optimization framework to efficiently escape the local optima, by perturbing the placement result iteratively. The proposed framework achieves significant improvements compared to state-of-the-art methods on two popular benchmarks.

Submitted 28 February, 2024; originally announced February 2024.

Comments: Work-in-Progress (WIP) poster of DAC 2024

arXiv:2402.18172 [pdf, other]

NiteDR: Nighttime Image De-Raining with Cross-View Sensor Cooperative Learning for Dynamic Driving Scenes

Authors: Cidan Shi, Lihuang Fang, Han Wu, Xiaoyu Xian, Yukai Shi, Liang Lin

Abstract: In real-world environments, outdoor imaging systems are often affected by disturbances such as rain degradation. Especially, in nighttime driving scenes, insufficient and uneven lighting shrouds the scenes in darkness, resulting in degradation of both the image quality and visibility. Particularly, in the field of autonomous driving, the visual perception ability of RGB sensors experiences a sharp decline in such harsh scenarios. Additionally, driving assistance systems suffer from reduced capabilities in capturing and discerning the surrounding environment, posing a threat to driving safety. Single-view information captured by single-modal sensors cannot comprehensively depict the entire scene. To address these challenges, we developed an image de-raining framework tailored for rainy nighttime driving scenes. It aims to remove rain artifacts, enrich scene representation, and restore useful information. Specifically, we introduce cooperative learning between visible and infrared images captured by different sensors. By cross-view fusion of these multi-source data, the scene within the images gains richer texture details and enhanced contrast. We constructed an information cleaning module called CleanNet as the first stage of our framework. Moreover, we designed an information fusion module called FusionNet as the second stage to fuse the clean visible images with infrared images. Using this stage-by-stage learning strategy, we obtain de-rained fusion images with higher quality and better visual perception. Extensive experiments demonstrate the effectiveness of our proposed Cross-View Cooperative Learning (CVCL) in adverse driving scenarios in low-light rainy environments. The proposed approach addresses the gap in the utilization of existing rain removal algorithms in specific low-light conditions.

Submitted 7 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.18070 [pdf, other]

A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing

Authors: Limin Jiang, Yi Shi, Haiqin Hu, Qingyu Deng, Siyi Xu, Yintao Liu, Feng Yuan, Si Wang, Yihao Shen, Fangfang Ye, Shan Cao, Zhiyuan Jiang

Abstract: Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and more recently, graphic processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and consecutive character of WBP. Furthermore, the large amount of data in WBPs cannot be processed quickly in symmetric multiprocessors (SMPs) due to the unpredictability of memory latency. To address this issue, we propose a hierarchical dataflow-driven architecture to accelerate WBP. A pack-and-ship approach is presented under a non-uniform memory access (NUMA) architecture to allow the subordinate tiles to operate in a bundled access and execute manner. We also propose a multi-level dataflow model and the related scheduling scheme to manage and allocate the heterogeneous hardware resources. Experiment results demonstrate that our prototype achieves $2\times$ and $2.3\times$ speedup in terms of normalized throughput and single-tile clock cycles compared with GPU and DSP counterparts in several critical WBP benchmarks. Additionally, a link-level throughput of $288$ Mbps can be achieved with a $45$-core configuration.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 7 pages, 7 figures, conference

arXiv:2402.17152 [pdf, other]

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

Authors: Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi

Abstract: Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite being trained on huge volume of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compute. Inspired by success achieved by Transformers in language and vision domains, we revisit fundamental design choices in recommendation systems. We reformulate recommendation problems as sequential transduction tasks within a generative modeling framework ("Generative Recommenders"), and propose a new architecture, HSTU, designed for high cardinality, non-stationary streaming recommendation data. HSTU outperforms baselines over synthetic and public datasets by up to 65.8% in NDCG, and is 5.3x to 15.2x faster than FlashAttention2-based Transformers on 8192 length sequences. HSTU-based Generative Recommenders, with 1.5 trillion parameters, improve metrics in online A/B tests by 12.4% and have been deployed on multiple surfaces of a large internet platform with billions of users. More importantly, the model quality of Generative Recommenders empirically scales as a power-law of training compute across three orders of magnitude, up to GPT-3/LLaMa-2 scale, which reduces carbon footprint needed for future model developments, and further paves the way for the first foundational models in recommendations.

Submitted 5 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 26 pages, 13 figures. ICML'24. Code available at https://github.com/facebookresearch/generative-recommenders

arXiv:2402.16769 [pdf, other]

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

Authors: Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

Abstract: In video-text retrieval, most existing methods adopt the dual-encoder architecture for fast retrieval, which employs two individual encoders to extract global latent representations for videos and texts. However, they face challenges in capturing fine-grained semantic concepts. In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and combines the strengths of latent and lexicon representations for video-text retrieval. Specifically, we map videos and texts into a pre-defined lexicon space, where each dimension corresponds to a semantic concept. A two-stage semantics grounding approach is proposed to activate semantically relevant dimensions and suppress irrelevant dimensions. The learned lexicon representations can thus reflect fine-grained semantics of videos and texts. Furthermore, to leverage the complementarity between latent and lexicon representations, we propose a unified learning scheme to facilitate mutual learning via structure sharing and self-distillation. Experimental results show our UNIFY framework largely outperforms previous video-text retrieval methods, with 4.8% and 8.2% Recall@1 improvement on MSR-VTT and DiDeMo respectively.

Submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted to LREC-COLING 2024

arXiv:2402.15968 [pdf, other]

CoDream: Exchanging dreams instead of models for federated aggregation with heterogeneous models

Authors: Abhishek Singh, Gauri Gupta, Ritvik Kapila, Yichuan Shi, Alex Dang, Sheshank Shankar, Mohammed Ehab, Ramesh Raskar

Abstract: Federated Learning (FL) enables collaborative optimization of machine learning models across decentralized data by aggregating model parameters. Our approach extends this concept by aggregating "knowledge" derived from models, instead of model parameters. We present a novel framework called CoDream, where clients collaboratively optimize randomly initialized data using federated optimization in the input data space, similar to how randomly initialized model parameters are optimized in FL. Our key insight is that jointly optimizing this data can effectively capture the properties of the global data distribution. Sharing knowledge in data space offers numerous benefits: (1) model-agnostic collaborative learning, i.e., different clients can have different model architectures; (2) communication that is independent of the model size, eliminating scalability concerns with model parameters; (3) compatibility with secure aggregation, thus preserving the privacy benefits of federated learning; (4) allowing of adaptive optimization of knowledge shared for personalized learning. We empirically validate CoDream on standard FL tasks, demonstrating competitive performance despite not sharing model parameters. Our code: https://mitmedialab.github.io/codream.github.io/

Submitted 27 February, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

Comments: 16 pages, 12 figures, 5 tables

arXiv:2402.15391 [pdf, other]

Genie: Generative Interactive Environments

Authors: Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel

Abstract: We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

Submitted 23 February, 2024; originally announced February 2024.

Comments: https://sites.google.com/corp/view/genie-2024/

arXiv:2402.14905 [pdf, other]

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Authors: Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra

Abstract: This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to prevailing belief emphasizing the pivotal role of data and parameter quantity in determining model quality, our investigation underscores the significance of model architecture for sub-billion scale LLMs. Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention mechanisms, we establish a strong baseline network denoted as MobileLLM, which attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M state-of-the-art models. Additionally, we propose an immediate block-wise weight-sharing approach with no increase in model size and only marginal latency overhead. The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy enhancement of 0.7%/0.8% than MobileLLM 125M/350M. Moreover, MobileLLM model family shows significant improvements compared to previous sub-billion models on chat benchmarks, and demonstrates close correctness to LLaMA-v2 7B in API calling tasks, highlighting the capability of small models for common on-device use cases.

Submitted 26 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: ICML 2024. Code is available at https://github.com/facebookresearch/MobileLLM

arXiv:2402.14518 [pdf, other]

Bulk Boundary Paradox in the Surface Reconstructed Magnetic Weyl Semimetal NdAlSi

Authors: Cong Li, Jianfeng Zhang, Hongxiong Liu, Wanyu Chen, Guowei Liu, Hanbin Deng, Craig Polley, Balasubramanian Thiagarajan, Timur Kim, Jiaxin Yin, Youguo Shi, Tao Xiang, Oscar Tjernberg

Abstract: The bulk boundary correspondence in the context of Weyl semimetals is a fundamental topological principle that establishes a connection between the bulk properties of the material and the emergence of specific surface states. In Weyl semimetals, the bulk boundary correspondence is manifested by the presence of surface Fermi arcs connecting pairs of Weyl nodes with opposite chirality. Here we demonstrate that this bulk boundary correspondence is challenged in the case of the surface selectively reconstructed noncentrosymmetric magnetic Weyl semimetal NdAlSi. By comparing angle-resolved photoemission spectroscopy measurements with surface projected density functional theory calculations and scanning tunneling microscope measurements, the existence of surface selective spontaneous reconstruction is demonstrated. The surface reconstruction in NdAlSi not only leads to the reconstruction of the surface Fermi arcs, but also generates new surface Fermi arcs that do not connect corresponding Weyl nodes. This observation challenges the conventional view of the bulk boundary correspondence in Weyl semimetals.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 18 pages, 4 figures

arXiv:2402.14447 [pdf, other]

Observation of Surface State Suppression in Magnetic Weyl Semimetal NdAlSi

Authors: Cong Li, Jianfeng Zhang, Hongxiong Liu, Wanyu Chen, Timur Kim, Youguo Shi, Tao Xiang, Oscar Tjernberg

Abstract: Understanding and mastering the control of surface states in topological materials are crucial steps for the development of future electronic devices and technologies that leverage the unique properties of these materials. Here, using angle-resolved photoemission spectroscopy, we provide a good case study of this by visualizing the electronic structure of a magnetic Weyl semimetal on both flat and uneven surfaces. Our observations reveal that the preparation of an uneven sample surface can effectively suppress all surface states in the Weyl semimetal NdAlSi, including topological surface Fermi arcs. This results in the observation of pure bulk states devoid of any Fermi energy shift. This discovery not only opens up a new avenue to directly study the pure bulk states of a Weyl semimetal using low photon energies in ARPES but also provides key insights into the control of the surface states in topological materials.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 13 pages, 4 figures. arXiv admin note: text overlap with arXiv:2303.17302

arXiv:2402.14436 [pdf]

Structural and resistivity properties of Fe$_{1-x}$Co${_x}$Se single crystals grown by the molten salt method

Authors: Qiaoyu Wang, Mingwei Ma, Binbin Ruan, Menghu Zhou, Yadong Gu, Qingsong Yang, Lewei Chen, Yunqing Shi, Junkun Yi, Genfu Chen, Zhian Ren

Abstract: A series of tetragonal Fe$_{1-x}$Co${_x}$Se single crystals with a complete Co doping range (0$\leq$x$\leq$0.52) up to its solid solubility limit in FeSe have been grown by an eutectic AlCl${_3}$/KCl molten salt method. The typical lateral size of as-grown Fe$_{1-x}$Co${_x}$Se single crystals is 1$-$5 mm. The chemical composition and homogeneity of the crystals was examined by both inductively coupled plasma atomic emission spectroscopy and energy dispersive spectrometer. X-ray diffraction analysis demonstrates that the crystal lattice parameters $a$ and $c$ are both linearly decreased with increasing Co doping level x. In the whole doping range, all the samples show metallic behaviour in contrast to a metal insulator transition of Cu-doped FeSe according to the resistivity measurements.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.13510 [pdf, other]

SealD-NeRF: Interactive Pixel-Level Editing for Dynamic Scenes by Neural Radiance Fields

Authors: Zhentao Huang, Yukun Shi, Neil Bruce, Minglun Gong

Abstract: The widespread adoption of implicit neural representations, especially Neural Radiance Fields (NeRF), highlights a growing need for editing capabilities in implicit 3D models, essential for tasks like scene post-processing and 3D content creation. Despite previous efforts in NeRF editing, challenges remain due to limitations in editing flexibility and quality. The key issue is developing a neural representation that supports local edits for real-time updates. Current NeRF editing methods, offering pixel-level adjustments or detailed geometry and color modifications, are mostly limited to static scenes. This paper introduces SealD-NeRF, an extension of Seal-3D for pixel-level editing in dynamic settings, specifically targeting the D-NeRF network. It allows for consistent edits across sequences by mapping editing actions to a specific timeframe, freezing the deformation network responsible for dynamic scene representation, and using a teacher-student approach to integrate changes.

Submitted 20 February, 2024; originally announced February 2024.

MSC Class: 68T45

arXiv:2402.13076 [pdf, other]

Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition

Authors: Yang Li, Yuan Shangguan, Yuhao Wang, Liangzhen Lai, Ernie Chang, Changsheng Zhao, Yangyang Shi, Vikas Chandra

Abstract: Power consumption plays an important role in on-device streaming speech recognition, as it has a direct impact on the user experience. This study delves into how weight parameters in speech recognition models influence the overall power consumption of these models. We discovered that the impact of weight parameters on power consumption varies, influenced by factors including how often they are invoked and their placement in memory. Armed with this insight, we developed design guidelines aimed at optimizing on-device speech recognition models. These guidelines focus on minimizing power use without substantially affecting accuracy. Our method, which employs targeted compression based on the varying sensitivities of weight parameters, demonstrates superior performance compared to state-of-the-art compression methods. It achieves a reduction in energy usage of up to 47% while maintaining similar model accuracy and improving the real-time factor.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.13061 [pdf, other]

Toward Fairness via Maximum Mean Discrepancy Regularization on Logits Space

Authors: Hao-Wei Chung, Ching-Hao Chiu, Yu-Jen Chen, Yiyu Shi, Tsung-Yi Ho

Abstract: Fairness has become increasingly pivotal in machine learning for high-risk applications such as machine learning in healthcare and facial recognition. However, we see the deficiency in the previous logits space constraint methods. Therefore, we propose a novel framework, Logits-MMD, that achieves the fairness condition by imposing constraints on output logits with Maximum Mean Discrepancy. Moreover, quantitative analysis and experimental results show that our framework has a better property that outperforms previous methods and achieves state-of-the-art on two facial recognition datasets and one animal dataset. Finally, we show experimental results and demonstrate that our debias approach achieves the fairness condition effectively.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12419 [pdf, other]

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

Authors: Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao, Yiyu Shi, Rongrong Ji

Abstract: Existing methods for fine-tuning sparse LLMs often suffer from resource-intensive requirements and high retraining costs. Additionally, many fine-tuning methods often rely on approximations or heuristic optimization strategies, which may lead to suboptimal solutions. To address these issues, we propose an efficient and fast framework for fine-tuning sparse LLMs based on minimizing reconstruction error. Our approach involves sampling a small dataset for calibration and utilizing backpropagation to iteratively optimize block-wise reconstruction error, on a block-by-block basis, aiming for optimal solutions. Extensive experiments on various benchmarks consistently demonstrate the superiority of our method over other baselines. For instance, on the Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a perplexity of 16.88, surpassing the state-of-the-art DSnoT with a perplexity of 75.14. Moreover, with a structured sparsity ratio of 26\%, EBFT achieves a perplexity of 16.27, outperforming LoRA (perplexity 16.44). Furthermore, the fine-tuning process of EBFT for LlamaV1-7B only takes approximately 30 minutes, and the entire framework can be executed on a single 16GB GPU. The source code is available at https://github.com/sunggo/EBFT.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11303 [pdf, other]

FViT: A Focal Vision Transformer with Gabor Filter

Authors: Yulong Shi, Mingwei Sun, Yongshuai Wang, Rui Wang, Hui Sun, Zengqiang Chen

Abstract: Vision transformers have achieved encouraging progress in various computer vision tasks. A common belief is that this is attributed to the competence of self-attention in modeling the global dependencies among feature tokens. Unfortunately, self-attention still faces some challenges in dense prediction tasks, such as the high computational complexity and absence of desirable inductive bias. To address these issues, we revisit the potential benefits of integrating vision transformer with Gabor filter, and propose a Learnable Gabor Filter (LGF) by using convolution. As an alternative to self-attention, we employ LGF to simulate the response of simple cells in the biological visual system to input images, prompting models to focus on discriminative feature representations of targets from various scales and orientations. Additionally, we design a Bionic Focal Vision (BFV) block based on the LGF. This block draws inspiration from neuroscience and introduces a Multi-Path Feed Forward Network (MPFFN) to emulate the working way of biological visual cortex processing information in parallel. Furthermore, we develop a unified and efficient pyramid backbone network family called Focal Vision Transformers (FViTs) by stacking BFV blocks. Experimental results show that FViTs exhibit highly competitive performance in various vision tasks. Especially in terms of computational efficiency and scalability, FViTs show significant advantages compared with other counterparts. Code is available at https://github.com/nkusyl/FViT

Submitted 26 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2402.11207 [pdf, ps, other]

Search for the production of deuterons and antideuterons in $e^+e^-$ annihilation at center-of-mass energies between 4.13 and 4.70 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (593 additional authors not shown)

Abstract: Using a data sample of $e^+e^-$ collision data corresponding to an integrated luminosity of 19 fb$^{-1}$ collected with the BESIII detector at the BEPCII collider, we search for the production of deuterons and antideuterons via $e^+e^-\to pp\pi^-\bar{d}+c.c.$ for the first time at center-of-mass energies between 4.13 and 4.70 GeV. No significant signal is observed and the upper limit of the $e^+e^-\to pp\pi^-\bar{d}+c.c.$ cross section is determined to be from 9.0 to 145 fb depending on the center-of-mass energy at the $90\%$ confidence level.

Submitted 17 February, 2024; originally announced February 2024.

Showing 201–250 of 3,048 results for author: Shi, Y