Search | arXiv e-print repository

Demonstration of neutron identification in neutrino interactions in the MicroBooNE liquid argon time projection chamber

Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book , et al. (165 additional authors not shown)

Abstract: A significant challenge in measurements of neutrino oscillations is reconstructing the incoming neutrino energies. While modern fully-active tracking calorimeters such as liquid argon time projection chambers in principle allow the measurement of all final state particles above some detection threshold, undetected neutrons remain a considerable source of missing energy with little to no data const… ▽ More A significant challenge in measurements of neutrino oscillations is reconstructing the incoming neutrino energies. While modern fully-active tracking calorimeters such as liquid argon time projection chambers in principle allow the measurement of all final state particles above some detection threshold, undetected neutrons remain a considerable source of missing energy with little to no data constraining their production rates and kinematics. We present the first demonstration of tagging neutrino-induced neutrons in liquid argon time projection chambers using secondary protons emitted from neutron-argon interactions in the MicroBooNE detector. We describe the method developed to identify neutrino-induced neutrons and demonstrate its performance using neutrons produced in muon-neutrino charged current interactions. The method is validated using a small subset of MicroBooNE's total dataset. The selection yields a sample with $60\%$ of selected tracks corresponding to neutron-induced secondary protons. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Report number: FERMILAB-PUB-24-0301

arXiv:2406.10536 [pdf, other]

doi 10.1016/j.scib.2024.06.011

Universal materials model of deep-learning density functional theory Hamiltonian

Authors: Yuxiang Wang, Yang Li, Zechen Tang, He Li, Zilong Yuan, Honggeng Tao, Nianlong Zou, Ting Bao, Xinghao Liang, Zezhou Chen, Shanghua Xu, Ce Bian, Zhiming Xu, Chong Wang, Chen Si, Wenhui Duan, Yong Xu

Abstract: Realizing large materials models has emerged as a critical endeavor for materials research in the new era of artificial intelligence, but how to achieve this fantastic and challenging objective remains elusive. Here, we propose a feasible pathway to address this paramount pursuit by develo** universal materials models of deep-learning density functional theory Hamiltonian (DeepH), enabling compu… ▽ More Realizing large materials models has emerged as a critical endeavor for materials research in the new era of artificial intelligence, but how to achieve this fantastic and challenging objective remains elusive. Here, we propose a feasible pathway to address this paramount pursuit by develo** universal materials models of deep-learning density functional theory Hamiltonian (DeepH), enabling computational modeling of the complicated structure-property relationship of materials in general. By constructing a large materials database and substantially improving the DeepH method, we obtain a universal materials model of DeepH capable of handling diverse elemental compositions and material structures, achieving remarkable accuracy in predicting material properties. We further showcase a promising application of fine-tuning universal materials models for enhancing specific materials models. This work not only demonstrates the concept of DeepH's universal materials model but also lays the groundwork for develo** large materials models, opening up significant opportunities for advancing artificial intelligence-driven materials discovery. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10494 [pdf, other]

Detection and Utilization of Reflections in LiDAR Scans Through Plane Optimization and Plane SLAM

Authors: Yinjie Li, Xiting Zhao, Sören Schwertfeger

Abstract: In LiDAR sensing, glass, mirrors and other material often cause inconsistent data readings, because the laser beams may report the distance of the glass, the distance of the object behind the glass or the distance to a reflected object. This causes problems in robotics and 3D reconstruction, especially with respect to localization, map** and thus navigation. With dual-return LiDARs and other met… ▽ More In LiDAR sensing, glass, mirrors and other material often cause inconsistent data readings, because the laser beams may report the distance of the glass, the distance of the object behind the glass or the distance to a reflected object. This causes problems in robotics and 3D reconstruction, especially with respect to localization, map** and thus navigation. With dual-return LiDARs and other methods, one can detect the glass plane and classify the points in a single scan. In this work we go one step further and construct a global, optimized map of reflective planes, in order to then classify all LiDAR readings at the end. As our experiments will show, this approach provides superior classification accuracy compared to the single scan approach. The code and data for this work are available as open source online. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10298 [pdf]

Enhancing Resilience of Power Systems against Typhoon Threats: A Hybrid Data-Model Driven Approach

Authors: Yang Li

Abstract: This chapter addresses the increasing vulnerability of coastal regions to typhoons and the consequent power outages, emphasizing the critical role of power transmission systems in disaster resilience. It introduces a framework for assessing and enhancing the resilience of these systems against typhoon impacts. The approach integrates a hybrid-driven model for system failure analysis and resilience… ▽ More This chapter addresses the increasing vulnerability of coastal regions to typhoons and the consequent power outages, emphasizing the critical role of power transmission systems in disaster resilience. It introduces a framework for assessing and enhancing the resilience of these systems against typhoon impacts. The approach integrates a hybrid-driven model for system failure analysis and resilience assessment, employing both data-driven and model-driven techniques. It includes a unique method to identify system vulnerabilities and optimal strategies for resilience enhancement, considering cost-effectiveness. The efficacy of this method is demonstrated through simulations on the IEEE RTS-79 system under realistic typhoon scenarios, showcasing its potential to guide planners in making informed decisions for disaster resilience. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted in 'Design for Reliability and Resilience of Power Systems,' Elsevier

arXiv:2406.10262 [pdf, other]

Fast solution to the fair ranking problem using the Sinkhorn algorithm

Authors: Yuki Uehara, Shunnosuke Ikeda, Naoki Nishimura, Koya Ohashi, Yilin Li, Jie Yang, Deddy Jobson, Xingxia Zha, Takeshi Matsumoto, Noriyoshi Sukegawa, Yuichi Takano

Abstract: In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate activity of item providers. Saito and Joachims (2022) devised an impac… ▽ More In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate activity of item providers. Saito and Joachims (2022) devised an impact-based fair ranking method for maximizing the Nash social welfare based on fair division; however, this method, which requires solving a large-scale constrained nonlinear optimization problem, is very difficult to apply to practical-scale recommender systems. We thus propose a fast solution to the impact-based fair ranking problem. We first transform the fair ranking problem into an unconstrained optimization problem and then design a gradient ascent method that repeatedly executes the Sinkhorn algorithm. Experimental results demonstrate that our algorithm provides fair rankings of high quality and is about 1000 times faster than application of commercial optimization software. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.10236 [pdf, other]

Lightening Anything in Medical Images

Authors: Ben Fei, Yixuan Li, Weidong Yang, Hengjun Gao, **gyi Xu, Lipeng Ma, Yatian Yang, **hong Zhou

Abstract: The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, the existence of suboptimal imaging quality, as indicated by irregular illumination or imbalanced intensity, presents significant obstacles in automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous… ▽ More The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, the existence of suboptimal imaging quality, as indicated by irregular illumination or imbalanced intensity, presents significant obstacles in automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous paired images, presenting challenges in data collection and training costs, all while lacking the ability to generalize effectively. Here, we introduce a pioneering training-free Diffusion Model for Universal Medical Image Enhancement, named UniMIE. UniMIE demonstrates its unsupervised enhancement capabilities across various medical image modalities without the need for any fine-tuning. It accomplishes this by relying solely on a single pre-trained model from ImageNet. We conduct a comprehensive evaluation on 13 imaging modalities and over 15 medical types, demonstrating better qualities, robustness, and accuracy than other modality-specific and data-inefficient models. By delivering high-quality enhancement and corresponding accuracy downstream tasks across a wide range of tasks, UniMIE exhibits considerable potential to accelerate the advancement of diagnostic tools and customized treatment plans. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: 23 pages, 6 figures

arXiv:2406.10123 [pdf, other]

Improving neutrino energy estimation of charged-current interaction events with recurrent neural networks in MicroBooNE

Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book , et al. (164 additional authors not shown)

Abstract: We present a deep learning-based method for estimating the neutrino energy of charged-current neutrino-argon interactions. We employ a recurrent neural network (RNN) architecture for neutrino energy estimation in the MicroBooNE experiment, utilizing liquid argon time projection chamber (LArTPC) detector technology. Traditional energy estimation approaches in LArTPCs, which largely rely on reconstr… ▽ More We present a deep learning-based method for estimating the neutrino energy of charged-current neutrino-argon interactions. We employ a recurrent neural network (RNN) architecture for neutrino energy estimation in the MicroBooNE experiment, utilizing liquid argon time projection chamber (LArTPC) detector technology. Traditional energy estimation approaches in LArTPCs, which largely rely on reconstructing and summing visible energies, often experience sizable biases and resolution smearing because of the complex nature of neutrino interactions and the detector response. The estimation of neutrino energy can be improved after considering the kinematics information of reconstructed final-state particles. Utilizing kinematic information of reconstructed particles, the deep learning-based approach shows improved resolution and reduced bias for the muon neutrino Monte Carlo simulation sample compared to the traditional approach. In order to address the common concern about the effectiveness of this method on experimental data, the RNN-based energy estimator is further examined and validated with dedicated data-simulation consistency tests using MicroBooNE data. We also assess its potential impact on a neutrino oscillation study after accounting for all statistical and systematic uncertainties and show that it enhances physics sensitivity. This method has good potential to improve the performance of other physics analyses. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Report number: FERMILAB-PUB-24-0287

arXiv:2406.10100 [pdf, other]

SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

Authors: Junwei Luo, Zhen Pang, Yongjun Zhang, Tingzhu Wang, Linlin Wang, Bo Dang, Jiangwei Lao, Jian Wang, **gdong Chen, Yihua Tan, Yansheng Li

Abstract: Remote Sensing Large Multi-Modal Models (RSLMMs) are develo** rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension. However, due to the limitations of existing datasets, RSLMMs have shortcomings in understanding the rich semantic relations among objects in complex remote sensing scenes. To unlock RSLMMs' complex comprehension ability, we propose a large-sca… ▽ More Remote Sensing Large Multi-Modal Models (RSLMMs) are develo** rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension. However, due to the limitations of existing datasets, RSLMMs have shortcomings in understanding the rich semantic relations among objects in complex remote sensing scenes. To unlock RSLMMs' complex comprehension ability, we propose a large-scale instruction tuning dataset FIT-RS, containing 1,800,851 instruction samples. FIT-RS covers common interpretation tasks and innovatively introduces several complex comprehension tasks of escalating difficulty, ranging from relation reasoning to image-level scene graph generation. Based on FIT-RS, we build the FIT-RSFG benchmark. Furthermore, we establish a new benchmark to evaluate the fine-grained relation comprehension capabilities of LMMs, named FIT-RSRC. Based on combined instruction data, we propose SkySenseGPT, which achieves outstanding performance on both public datasets and FIT-RSFG, surpassing existing RSLMMs. We hope the FIT-RS dataset can enhance the relation comprehension capability of RSLMMs and provide a large-scale fine-grained data source for the remote sensing community. The dataset will be available at https://github.com/Luo-Z13/SkySenseGPT △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 30 pages, 5 figures, 19 tables, dataset and code see https://github.com/Luo-Z13/SkySenseGPT

arXiv:2406.09838 [pdf, other]

Vision-Language Models Meet Meteorology: Develo** Models for Extreme Weather Events Detection with Heatmaps

Authors: Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Zixuan Yuan, Bing Zhu, Junwei Liang

Abstract: Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, the… ▽ More Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, thereby introducing a more precise and automated solution. Leveraging Vision-Language Models (VLM) to simultaneously process visual and textual data, we offer an effective aid to enhance the analysis process of weather heatmaps. Our initial assessment of general-purpose VLMs (e.g., GPT-4-Vision) on EWED revealed poor performance, characterized by low accuracy and frequent hallucinations due to inadequate color differentiation and insufficient meteorological knowledge. To address these challenges, we introduce ClimateIQA, the first meteorological VQA dataset, which includes 8,760 wind gust heatmaps and 254,040 question-answer pairs covering four question types, both generated from the latest climate reanalysis data. We also propose Sparse Position and Outline Tracking (SPOT), an innovative technique that leverages OpenCV and K-Means clustering to capture and depict color contours in heatmaps, providing ClimateIQA with more accurate color spatial location information. Finally, we present Climate-Zoo, the first meteorological VLM collection, which adapts VLMs to meteorological applications using the ClimateIQA dataset. Experiment results demonstrate that models from Climate-Zoo substantially outperform state-of-the-art general VLMs, achieving an accuracy increase from 0% to over 90% in EWED verification. The datasets and models in this study are publicly available for future climate science research: https://github.com/AlexJJJChen/Climate-Zoo. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09695 [pdf, other]

Machine learning-based Near-field Emitter Localization via Grouped Hybrid Analog and Digital Massive MIMO Receive Array

Authors: Yifan Li, Feng Shu, Jiatong Bai, Cunhua Pan, Yongpeng Wu, Yaoliang Song, Jiangzhou Wang

Abstract: A fully-digital massive MIMO receive array is promising to meet the high-resolution requirement of near-field (NF) emitter localization, but it also results in the significantly increasing of hardware costs and algorithm complexity. In order to meet the future demand for green communication while maintaining high performance, the grouped hybrid analog and digital (HAD) structure is proposed for NF… ▽ More A fully-digital massive MIMO receive array is promising to meet the high-resolution requirement of near-field (NF) emitter localization, but it also results in the significantly increasing of hardware costs and algorithm complexity. In order to meet the future demand for green communication while maintaining high performance, the grouped hybrid analog and digital (HAD) structure is proposed for NF DOA estimation, which divides the large-scale receive array into small-scale groups and each group contains several subarrays. Thus the NF direction-of-arrival (DOA) estimation problem is viewed as far-field (FF) within each group, and some existing methods such as MUSIC, Root-MUSIC, ESPRIT, etc., can be adopted. Then by angle calibration, a candidate position set is generated. To eliminate the phase ambiguity arising from the HAD structure and obtain the emitter position, two low-complexity clustering-based methods, minimum sample distance clustering (MSDC) and range scatter diagram (RSD) - angle scatter diagram (ASD)-based DBSCAN (RSD-ASD-DBSCAN), are proposed based on the distribution features of samples in the candidate position set. Then to further improve the localization accuracy, a model-driven regression network (RegNet) is designed, which consists of a multi-layer neural network (MLNN) for false solution elimination and a perceptron for angle fusion. Finally, the Cramer-Rao lower bound (CRLB) of NF emitter localization for the proposed grouped HAD structure is also derived. The simulation results show that the proposed methods can achieve CRLB at different SNR regions, the RegNet has great performance advantages at low SNR regions and the clustering-based methods have much lower complexity. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09694 [pdf, other]

An Efficient Approach to Regression Problems with Tensor Neural Networks

Authors: Yongxin Li

Abstract: This paper introduces a tensor neural network (TNN) to address nonparametric regression problems. Characterized by its distinct sub-network structure, the TNN effectively facilitates variable separation, thereby enhancing the approximation of complex, unknown functions. Our comparative analysis reveals that the TNN outperforms conventional Feed-Forward Networks (FFN) and Radial Basis Function Netw… ▽ More This paper introduces a tensor neural network (TNN) to address nonparametric regression problems. Characterized by its distinct sub-network structure, the TNN effectively facilitates variable separation, thereby enhancing the approximation of complex, unknown functions. Our comparative analysis reveals that the TNN outperforms conventional Feed-Forward Networks (FFN) and Radial Basis Function Networks (RBN) in terms of both approximation accuracy and generalization potential, despite a similar scale of parameters. A key innovation of our approach is the integration of statistical regression and numerical integration within the TNN framework. This integration allows for the efficient computation of high-dimensional integrals associated with the regression function. The implications of this advancement extend to a broader range of applications, particularly in scenarios demanding precise high-dimensional data analysis and prediction. △ Less

Submitted 13 June, 2024; originally announced June 2024.

MSC Class: 62J02; 68T05

arXiv:2406.09544 [pdf, other]

No more gap-shifting: Stochastic many-body-theory based TDHF for accurate theory of polymethine cyanine dyes

Authors: Nadine C. Bradbury, Barry Y. Li, Tucker Allen, Justin R. Caram, Daniel Neuhauser

Abstract: We introduce an individually fitted screened-exchange interaction for the time-dependent Hartree-Fock (TDHF) method and show that it resolves the missing binding energies in polymethine organic dye molecules compared to time-dependent density functional theory (TDDFT). The interaction kernel, which can be thought as a dielectric function, is generated by stochastic fitting to the screened-Coulomb… ▽ More We introduce an individually fitted screened-exchange interaction for the time-dependent Hartree-Fock (TDHF) method and show that it resolves the missing binding energies in polymethine organic dye molecules compared to time-dependent density functional theory (TDDFT). The interaction kernel, which can be thought as a dielectric function, is generated by stochastic fitting to the screened-Coulomb interaction of many-body perturbation theory (MBPT), specific to each system. We test our method on the flavylium (Flav) and indocyanine green (ICG) dye families with a modifiable length of the polymethine bridge, leading to excitations ranging from the visible to short-wave infrared (SWIR). Our approach validates earlier observations on the importance of inclusion of medium range exchange for the exciton binding energy. Our resulting method, TDHF@$v_W$, also achieves a mean absolute error on par with MBPT at a computational cost on par with local-functional TDDFT. △ Less

Submitted 19 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures

arXiv:2406.09475 [pdf, other]

Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

Abstract: Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the… ▽ More Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching faction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09423 [pdf, other]

MSz: An Efficient Parallel Algorithm for Correcting Morse-Smale Segmentations in Error-Bounded Lossy Compressors

Authors: Yuxiao Li, Xin Liang, Bei Wang, Yongfeng Qiu, Lin Yan, Hanqi Guo

Abstract: This research explores a novel paradigm for preserving topological segmentations in existing error-bounded lossy compressors. Today's lossy compressors rarely consider preserving topologies such as Morse-Smale complexes, and the discrepancies in topology between original and decompressed datasets could potentially result in erroneous interpretations or even incorrect scientific conclusions. In thi… ▽ More This research explores a novel paradigm for preserving topological segmentations in existing error-bounded lossy compressors. Today's lossy compressors rarely consider preserving topologies such as Morse-Smale complexes, and the discrepancies in topology between original and decompressed datasets could potentially result in erroneous interpretations or even incorrect scientific conclusions. In this paper, we focus on preserving Morse-Smale segmentations in 2D/3D piecewise linear scalar fields, targeting the precise reconstruction of minimum/maximum labels induced by the integral line of each vertex. The key is to derive a series of edits during compression time; the edits are applied to the decompressed data, leading to an accurate reconstruction of segmentations while kee** the error within the prescribed error bound. To this end, we developed a workflow to fix extrema and integral lines alternatively until convergence within finite iterations; we accelerate each workflow component with shared-memory/GPU parallelism to make the performance practical for coupling with compressors. We demonstrate use cases with fluid dynamics, ocean, and cosmology application datasets with a significant acceleration with an NVIDIA A100 GPU. △ Less

Submitted 5 April, 2024; originally announced June 2024.

arXiv:2406.09410 [pdf, other]

Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach

Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

Abstract: Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting intelligent understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it necessary to holistically conduct SGG in large-size very-high-reso… ▽ More Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting intelligent understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it necessary to holistically conduct SGG in large-size very-high-resolution (VHR) SAI. However, the lack of SGG datasets with large-size VHR SAI has constrained the advancement of SGG in SAI. Due to the complexity of large-size VHR SAI, mining triplets <subject, relationship, object> in large-size VHR SAI heavily relies on long-range contextual reasoning. Consequently, SGG models designed for small-size natural imagery are not directly applicable to large-size VHR SAI. To address the scarcity of datasets, this paper constructs a large-scale dataset for SGG in large-size VHR SAI with image sizes ranging from 512 x 768 to 27,860 x 31,096 pixels, named RSG, encompassing over 210,000 objects and more than 400,000 triplets. To realize SGG in large-size VHR SAI, we propose a context-aware cascade cognition (CAC) framework to understand SAI at three levels: object detection (OBD), pair pruning and relationship prediction. As a fundamental prerequisite for SGG in large-size SAI, a holistic multi-class object detection network (HOD-Net) that can flexibly integrate multi-scale contexts is proposed. With the consideration that there exist a huge amount of object pairs in large-size SAI but only a minority of object pairs contain meaningful relationships, we design a pair proposal generation (PPG) network via adversarial reconstruction to select high-value pairs. Furthermore, a relationship prediction network with context-aware messaging (RPCM) is proposed to predict the relationship types of these pairs. △ Less

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: This paper releases a SAI-oriented SGG toolkit with about 30 OBD methods and 10 SGG methods, and develops a benchmark based on RSG where our HOD-Net and RPCM significantly outperform the state-of-the-art methods in both OBD and SGG tasks. The RSG dataset and SAI-oriented toolkit will be made publicly available at https://linlin-dev.github.io/project/RSG

arXiv:2406.09400 [pdf, other]

Yo'LLaVA: Your Personalized Language and Vision Assistant

Authors: Thao Nguyen, Haotian Liu, Yuheng Li, Mu Cai, Utkarsh Ojha, Yong Jae Lee

Abstract: Large Multimodal Models (LMMs) have shown remarkable capabilities across a variety of tasks (e.g., image captioning, visual question answering). While broad, their knowledge remains generic (e.g., recognizing a dog), and they are unable to handle personalized subjects (e.g., recognizing a user's pet dog). Human reasoning, in contrast, typically operates within the context of specific subjects in o… ▽ More Large Multimodal Models (LMMs) have shown remarkable capabilities across a variety of tasks (e.g., image captioning, visual question answering). While broad, their knowledge remains generic (e.g., recognizing a dog), and they are unable to handle personalized subjects (e.g., recognizing a user's pet dog). Human reasoning, in contrast, typically operates within the context of specific subjects in our surroundings. For example, one might ask, "What should I buy for my dog's birthday?"; as opposed to a generic inquiry about "What should I buy for a dog's birthday?". Similarly, when looking at a friend's image, the interest lies in seeing their activities (e.g., "my friend is holding a cat"), rather than merely observing generic human actions (e.g., "a man is holding a cat"). In this paper, we introduce the novel task of personalizing LMMs, so that they can have conversations about a specific subject. We propose Yo'LLaVA, which learns to embed a personalized subject into a set of latent tokens given a handful of example images of the subject. Our qualitative and quantitative analyses reveal that Yo'LLaVA can learn the concept more efficiently using fewer tokens and more effectively encode the visual attributes compared to strong prompting baselines (e.g., LLaVA). △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Project page: https://thaoshibe.github.io/YoLLaVA

arXiv:2406.09383 [pdf, other]

Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

Authors: Yiming Li, Zhiheng Li, Nuo Chen, Moonjun Gong, Zonglin Lyu, Zehong Wang, Peili Jiang, Chen Feng

Abstract: Large-scale datasets have fueled recent advancements in AI-based autonomous vehicle research. However, these datasets are usually collected from a single vehicle's one-time pass of a certain location, lacking multiagent interactions or repeated traversals of the same place. Such information could lead to transformative enhancements in autonomous vehicles' perception, prediction, and planning capab… ▽ More Large-scale datasets have fueled recent advancements in AI-based autonomous vehicle research. However, these datasets are usually collected from a single vehicle's one-time pass of a certain location, lacking multiagent interactions or repeated traversals of the same place. Such information could lead to transformative enhancements in autonomous vehicles' perception, prediction, and planning capabilities. To bridge this gap, in collaboration with the self-driving company May Mobility, we present the MARS dataset which unifies scenarios that enable MultiAgent, multitraveRSal, and multimodal autonomous vehicle research. More specifically, MARS is collected with a fleet of autonomous vehicles driving within a certain geographical area. Each vehicle has its own route and different vehicles may appear at nearby locations. Each vehicle is equipped with a LiDAR and surround-view RGB cameras. We curate two subsets in MARS: one facilitates collaborative driving with multiple vehicles simultaneously present at the same location, and the other enables memory retrospection through asynchronous traversals of the same location by multiple vehicles. We conduct experiments in place recognition and neural reconstruction. More importantly, MARS introduces new research opportunities and challenges such as multitraversal 3D reconstruction, multiagent perception, and unsupervised object discovery. Our data and codes can be found at https://ai4ce.github.io/MARS/. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted by CVPR 2024

arXiv:2406.09274 [pdf, other]

Doubled Shapiro steps in a dynamic axion insulator Josephson junction

Authors: Yu-Hang Li, Ziqian Zhou, Ran Cheng, Hua Jiang, X. C. Xie

Abstract: Dynamic axion insulators feature a time-dependent axion field that can be induced by antiferromagnetic resonance. Here, we show that a Josephson junction incorporating this dynamic axion insulator between two superconductors exhibits a striking doubled Shapiro steps wherein all odd steps are completely suppressed in the jointly presence of a DC bias and a static magnetic field. The resistively shu… ▽ More Dynamic axion insulators feature a time-dependent axion field that can be induced by antiferromagnetic resonance. Here, we show that a Josephson junction incorporating this dynamic axion insulator between two superconductors exhibits a striking doubled Shapiro steps wherein all odd steps are completely suppressed in the jointly presence of a DC bias and a static magnetic field. The resistively shunted junction simulation confirms that these doubled Shapiro steps originate from the distinctive axion electrodynamics driven by the antiferromagnetic resonance, which thus not only furnishes a hallmark to identify the dynamic axion insulator but also provides a method to evaluate its mass term. Furthermore, the experimentally feasible differential conductance is also determined. Our work holds significant importance in condensed matter physics and materials science for understanding the dynamic axion insulator, paving the way for its further exploration and applications. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09238 [pdf, other]

Near-Field Multiuser Communications based on Sparse Arrays

Authors: Kangjian Chen, Chenhao Qi, Geoffrey Ye Li, Octavia A. Dobre

Abstract: This paper considers near-field multiuser communications based on sparse arrays (SAs). First, for the uniform SAs (USAs), we analyze the beam gains of channel steering vectors, which shows that increasing the antenna spacings can effectively improve the spatial resolution of the antenna arrays to enhance the sum rate of multiuser communications. Then, we investigate nonuniform SAs (NSAs) to mitiga… ▽ More This paper considers near-field multiuser communications based on sparse arrays (SAs). First, for the uniform SAs (USAs), we analyze the beam gains of channel steering vectors, which shows that increasing the antenna spacings can effectively improve the spatial resolution of the antenna arrays to enhance the sum rate of multiuser communications. Then, we investigate nonuniform SAs (NSAs) to mitigate the high multiuser interference from the grating lobes of the USAs. To maximize the sum rate of near-field multiuser communications, we optimize the antenna positions of the NSAs, where a successive convex approximation-based antenna position optimization algorithm is proposed. Moreover, we find that the channels of both the USAs and the NSAs show uniform sparsity in the defined surrogate distance-angle (SD-A) domain. Based on the channel sparsity, an on-grid SD-A-domain orthogonal matching pursuit (SDA-OMP) algorithm is developed to estimate multiuser channels. To further improve the resolution of the SDA-OMP, we also design an off-grid SD-A-domain iterative super-resolution channel estimation algorithm. Simulation results demonstrate the superior performance of the proposed methods. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09187 [pdf, other]

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Authors: Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li

Abstract: The rapid advancement of large language models (LLMs) has catalyzed the deployment of LLM-powered agents across numerous applications, raising new concerns regarding their safety and trustworthiness. Existing methods for enhancing the safety of LLMs are not directly transferable to LLM-powered agents due to their diverse objectives and output modalities. In this paper, we propose GuardAgent, the f… ▽ More The rapid advancement of large language models (LLMs) has catalyzed the deployment of LLM-powered agents across numerous applications, raising new concerns regarding their safety and trustworthiness. Existing methods for enhancing the safety of LLMs are not directly transferable to LLM-powered agents due to their diverse objectives and output modalities. In this paper, we propose GuardAgent, the first LLM agent as a guardrail to other LLM agents. Specifically, GuardAgent oversees a target LLM agent by checking whether its inputs/outputs satisfy a set of given guard requests defined by the users. GuardAgent comprises two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing the code by calling APIs or using external engines. In both steps, an LLM is utilized as the core reasoning component, supplemented by in-context demonstrations retrieved from a memory module. Such knowledge-enabled reasoning allows GuardAgent to understand various textual guard requests and accurately "translate" them into executable code that provides reliable guardrails. Furthermore, GuardAgent is equipped with an extendable toolbox containing functions and APIs and requires no additional LLM training, which underscores its generalization capabilities and low operational overhead. Additionally, we propose two novel benchmarks: an EICU-AC benchmark for assessing privacy-related access control for healthcare agents and a Mind2Web-SC benchmark for safety evaluation for web agents. We show the effectiveness of GuardAgent on these two benchmarks with 98.7% and 90.0% accuracy in moderating invalid inputs and outputs for the two types of agents, respectively. We also show that GuardAgent is able to define novel functions in adaption to emergent LLM agents and guard requests, which underscores its strong generalization capabilities. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09057 [pdf, ps, other]

The $q$-Schur algebras in type $D$, I, fundamental multiplication formulas

Authors: Jie Du, Yiqiang Li, Zhaozhao Zhao

Abstract: By embedding the Hecke algebra $\check H_q$ of type $D$ into the Hecke algebra $H_{q,1}$ of type $B$ with unequal parameters $(q,1)$, the $q$-Schur algebras $S^κ_q(n,r)$ of type $D$ is naturally defined as the endomorphism algebra of the tensor space with the $\check H_q$-action restricted from the $H_{q,1}$-action that defines the $(q,1)$-Schur algebra $S^\jmath_{q,1}(n,r)$ of type $B$. We invest… ▽ More By embedding the Hecke algebra $\check H_q$ of type $D$ into the Hecke algebra $H_{q,1}$ of type $B$ with unequal parameters $(q,1)$, the $q$-Schur algebras $S^κ_q(n,r)$ of type $D$ is naturally defined as the endomorphism algebra of the tensor space with the $\check H_q$-action restricted from the $H_{q,1}$-action that defines the $(q,1)$-Schur algebra $S^\jmath_{q,1}(n,r)$ of type $B$. We investigate the algebras $S^\jmath_{q,1}(n,r)$ and $S^κ_q(n,r)$ both algebraically and geometrically and describe their standard bases, dimension formulas and weight idempotents. Most importantly, we use the geometrically derived two sets of the fundamental multiplication formulas in $S^\jmath_{q,1}(n,r)$ to derive multi-sets (9 sets in total!) of the fundamental multiplication formulas in $S^κ_q(n,r)$. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 33 pages. Comments welcome

MSC Class: 16T20; 17B37; 20C08; 20C33; 20G43

arXiv:2406.09044 [pdf, other]

MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

Authors: Hanqing Wang, Zeguan Xiao, Yixia Li, Shuo Wang, Guanhua Chen, Yun Chen

Abstract: Efficient finetuning of large language models (LLMs) aims to adapt the LLMs with reduced computation and memory cost. Previous LoRA-based approaches initialize the low-rank matrices with gaussian distribution and zero values, while kee** the original weight matrices frozen. However, the trainable model parameters optimized in an unguided subspace might have interference with the well-learned sub… ▽ More Efficient finetuning of large language models (LLMs) aims to adapt the LLMs with reduced computation and memory cost. Previous LoRA-based approaches initialize the low-rank matrices with gaussian distribution and zero values, while kee** the original weight matrices frozen. However, the trainable model parameters optimized in an unguided subspace might have interference with the well-learned subspace of the pretrained weight matrix. In this paper, we propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix while kee** the principle singular components frozen. It is observed that the minor matrix corresponds to the noisy or long-tail information, while the principle matrix contains important knowledge. The MiLoRA initializes the low-rank matrices within a subspace that is orthogonal to the principle matrix, thus the pretrained knowledge is expected to be well preserved. During finetuning, MiLoRA makes the most use of the less-optimized subspace for learning the finetuning dataset. Extensive experiments on commonsense reasoning, math reasoning and instruction following benchmarks present the superior performance of our method. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08935 [pdf, ps, other]

doi 10.3847/1538-3881/ad47c4

Dense Outflowing Molecular Gas in Massive Star-forming Regions

Authors: Yani Xu, Junzhi Wang, Shu Liu, Juan Li, Yuqiang LI, Rui Luo, Chao Ou, Siqi Zheng, Yijia Liu

Abstract: Dense outflowing gas, traced by transitions of molecules with large dipole moment, is important for understanding mass loss and feedback of massive star formation. HCN 3-2 and HCO$^+$ 3-2 are good tracers of dense outflowing molecular gas, which are closely related to active star formation. In this study, we present on-the-fly (OTF) map** observations of HCN 3-2 and HCO$^+$ 3-2 toward a sample o… ▽ More Dense outflowing gas, traced by transitions of molecules with large dipole moment, is important for understanding mass loss and feedback of massive star formation. HCN 3-2 and HCO$^+$ 3-2 are good tracers of dense outflowing molecular gas, which are closely related to active star formation. In this study, we present on-the-fly (OTF) map** observations of HCN 3-2 and HCO$^+$ 3-2 toward a sample of 33 massive star-forming regions using the 10-m Submillimeter Telescope (SMT). With the spatial distribution of line wings of HCO$^+$ 3-2 and HCN 3-2, outflows are detected in 25 sources, resulting in a detection rate of 76$\%$. The optically thin H$^{13}$CN and H$^{13}$CO$^+$ 3-2 lines are used to identify line wings as outflows and estimate core mass. The mass $M_{out}$, momentum $P_{out}$, kinetic energy $E_{K}$, force $F_{out}$ and mass loss rate $\dot M_{out}$ of outflow and core mass, are obtained for each source. A sublinear tight correlation is found between the mass of dense molecular outflow and core mass, with an index of $\sim$ 0.8 and a correlation coefficient of 0.88. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 22 pages, 5 figures, 4 tables, accepted in AJ

Report number: ad47c4

arXiv:2406.08877 [pdf, other]

EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

Authors: Yuan-Ming Li, Wei-** Huang, An-Lan Wang, Ling-An Zeng, **g-Ke Meng, Wei-Shi Zheng

Abstract: We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal bound… ▽ More We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal boundaries are provided to localize single action videos along with sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement--including technical keypoint verification, natural language comments on action execution, and action quality scores. Combining all of these, EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding across dimensions of "what", "when", and "how well". To facilitate research on egocentric and exocentric full-body action understanding, we construct benchmarks on a suite of tasks (i.e., action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification), together with detailed analysis. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 33 pages, 9 figures

arXiv:2406.08875 [pdf, other]

doi 10.1145/3658230

NICER: A New and Improved Consumed Endurance and Recovery Metric to Quantify Muscle Fatigue of Mid-Air Interactions

Authors: Yi Li, Benjamin Tag, Shaozhang Dai, Robert Crowther, Tim Dwyer, Pourang Irani, Barrett Ens

Abstract: Natural gestures are crucial for mid-air interaction, but predicting and managing muscle fatigue is challenging. Existing torque-based models are limited in their ability to model above-shoulder interactions and to account for fatigue recovery. We introduce a new hybrid model, NICER, which combines a torque-based approach with a new term derived from the empirical measurement of muscle contraction… ▽ More Natural gestures are crucial for mid-air interaction, but predicting and managing muscle fatigue is challenging. Existing torque-based models are limited in their ability to model above-shoulder interactions and to account for fatigue recovery. We introduce a new hybrid model, NICER, which combines a torque-based approach with a new term derived from the empirical measurement of muscle contraction and a recovery factor to account for decreasing fatigue during rest. We evaluated NICER in a mid-air selection task using two interaction methods with different degrees of perceived fatigue. Results show that NICER can accurately model above-shoulder interactions as well as reflect fatigue recovery during rest periods. Moreover, both interaction methods show a stronger correlation with subjective fatigue measurement (r = 0.978/0.976) than a previous model, Cumulative Fatigue (r = 0.966/ 0.923), confirming that NICER is a powerful analytical tool to predict fatigue across a variety of gesture-based interactive applications. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08830 [pdf, other]

Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning

Authors: Dingwen Zhang, Yan Li, De Cheng, Nannan Wang, Junwei Han

Abstract: To facilitate the evolution of edge intelligence in ever-changing environments, we study on-device incremental learning constrained in limited computation resource in this paper. Current on-device training methods just focus on efficient training without considering the catastrophic forgetting, preventing the model getting stronger when continually exploring the world. To solve this problem, a dir… ▽ More To facilitate the evolution of edge intelligence in ever-changing environments, we study on-device incremental learning constrained in limited computation resource in this paper. Current on-device training methods just focus on efficient training without considering the catastrophic forgetting, preventing the model getting stronger when continually exploring the world. To solve this problem, a direct solution is to involve the existing incremental learning mechanisms into the on-device training framework. Unfortunately, such a manner cannot work well as those mechanisms usually introduce large additional computational cost to the network optimization process, which would inevitably exceed the memory capacity of the edge devices. To address this issue, this paper makes an early effort to propose a simple but effective edge-friendly incremental learning framework. Based on an empirical study on the knowledge intensity of the kernel elements of the neural network, we find that the center kernel is the key for maximizing the knowledge intensity for learning new data, while freezing the other kernel elements would get a good balance on the model's capacity for overcoming catastrophic forgetting. Upon this finding, we further design a center-sensitive kernel optimization framework to largely alleviate the cost of the gradient computation and back-propagation. Besides, a dynamic channel element selection strategy is also proposed to facilitate a sparse orthogonal gradient projection for further reducing the optimization complexity, upon the knowledge explored from the new task data. Extensive experiments validate our method is efficient and effective, e.g., our method achieves average accuracy boost of 38.08% with even less memory and approximate computation compared to existing on-device training methods, indicating its significant potential for on-device incremental learning. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08790 [pdf, ps, other]

Direct generation of multi-photon hyperentanglement

Authors: Peng Zhao, Jia-Wei Ying, Meng-Ying Yang, Wei Zhong, Ming-Ming Du, Shu-Ting Shen, Yun-Xi Li, An-Lei Zhang, Lan Zhou, Yu-Bo Sheng

Abstract: Multi-photon hyperentangement is of fundamental importance in optical quantum information processing. Existing theory and experiment producing multi-photon hyperentangled states have until now relied on the outcome post-selection, a procedure where only the measurement results corresponding to the desired state are considered. Such approach severely limits the usefulness of the resulting hyperenta… ▽ More Multi-photon hyperentangement is of fundamental importance in optical quantum information processing. Existing theory and experiment producing multi-photon hyperentangled states have until now relied on the outcome post-selection, a procedure where only the measurement results corresponding to the desired state are considered. Such approach severely limits the usefulness of the resulting hyperentangled states. We present the protocols of direct production of three- and four-photon hyperentanglement and extend the approach to an arbitrary number of photons through a straightforward cascade of spontaneous parametric down-conversion (SPDC) sources. The generated multi-photon hyperentangled states are encoded in polarization-spatial modes and polarization-time bin degrees of freedom, respectively. Numerical calculation shows that if the average photon number $μ$ is set to 1, the down conversion efficiency is $7.6*10^{-6}$ and the repetition frequency of the laser is $10^9$ Hz, the number of the generation of three-photon and four-photon hyperentanglement after cascading can reach about $5.78*10^{-2}$ and $4.44*10^{-7}$ pairs per second, respectively. By eliminating the constraints of outcome post-selection, our protocols may represent important progresses for multi-photon hyperentangement generation and providing a pivotal role in future multi-party and high-capacity communication networks. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes… ▽ More In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes of astrophysical $γ$-ray background while large amount of dark matter. By analyzing more than 700 days observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.08567 [pdf, other]

Highly-entangled stationary states from strong symmetries

Authors: Yahui Li, Frank Pollmann, Nicholas Read, Pablo Sala

Abstract: We find that the presence of strong non-Abelian conserved quantities can lead to highly entangled stationary states even for unital quantum channels. We derive exact expressions for the bipartite logarithmic negativity, Rényi negativities, and operator space entanglement for stationary states restricted to one symmetric subspace, with focus on the trivial subspace. We prove that these apply to ope… ▽ More We find that the presence of strong non-Abelian conserved quantities can lead to highly entangled stationary states even for unital quantum channels. We derive exact expressions for the bipartite logarithmic negativity, Rényi negativities, and operator space entanglement for stationary states restricted to one symmetric subspace, with focus on the trivial subspace. We prove that these apply to open quantum evolutions whose commutants, characterizing all strongly conserved quantities, correspond to either the universal envelo** algebra of a Lie algebra or to the Read-Saleur commutants. The latter provides an example of quantum fragmentation, whose dimension is exponentially large in system size. We find a general upper bound for all these quantities given by the logarithm of the dimension of the commutant on the smaller bipartition of the chain. As Abelian examples, we show that strong U($1$) symmetries and classical fragmentation lead to separable stationary states in any symmetric subspace. In contrast, for non-Abelian SU$(N)$ symmetries, both logarithmic and Rényi negativities scale logarithmically with system size. Finally, we prove that while Rényi negativities with $n>2$ scale logarithmically with system size, the logarithmic negativity (as well as generalized Rényi negativities with $n<2$) exhibits a volume law scaling for the Read-Saleur commutants. Our derivations rely on the commutant possessing a Hopf algebra structure in the limit of infinitely large systems, and hence also apply to finite groups and quantum groups. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 Pages, 7 figures. Comments are welcome

arXiv:2406.08547 [pdf, other]

AGN Feedback in Quiescent Galaxies at Cosmic Noon Traced by Ionized Gas Emission

Authors: Letizia Bugiani, Sirio Belli, Minjung Park, Rebecca L. Davies, J. Trevor Mendel, Benjamin D. Johnson, Amir H. Khoram, Chloë Benton, Andrea Cimatti, Charlie Conroy, Razieh Emami, Joel Leja, Yijia Li, Gabriel Maheson, Elijah P. Mathews, Rohan P. Naidu, Erica J. Nelson, Sandro Tacchella, Bryan A. Terrazas, Rainer Weinberger

Abstract: We analyze ionized gas emission lines in deep rest-frame optical spectra of 16 quiescent galaxies at redshift $1.7<z<3.5$ observed with JWST/NIRSpec by the Blue Jay survey. Robust detection of emission lines in $75\%$ of the sample indicates the presence of ongoing ionizing sources in this passive population. The H$α$ line luminosities confirm that the population is quiescent, with star formation… ▽ More We analyze ionized gas emission lines in deep rest-frame optical spectra of 16 quiescent galaxies at redshift $1.7<z<3.5$ observed with JWST/NIRSpec by the Blue Jay survey. Robust detection of emission lines in $75\%$ of the sample indicates the presence of ongoing ionizing sources in this passive population. The H$α$ line luminosities confirm that the population is quiescent, with star formation rates that are at least ten times lower than the main sequence of star formation. The quiescent sample is clearly separate from the star-forming population in line diagnostic diagrams, and occupies a region usually populated by active galactic nuclei (AGN). Analysis of the observed line ratios, equivalent widths, and velocity dispersions leads us to conclude that in most cases the gas is ionized by AGN activity, despite the lack of X-ray detections. A subset of the sample also hosts ionized and/or neutral outflows. Our results show, for the first time using a representative sample, that low luminosity AGN are extremely common among quiescent galaxies at high redshift. These low luminosity AGN may play a key role in quenching star formation and in maintaining massive galaxies quiescent from Cosmic Noon to $z\sim0$. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 20 pages, 18 figures. Submitted to ApJ

arXiv:2406.08500 [pdf, ps, other]

Newman's theorem via Carathéodory

Authors: Ali Mohammad Lavasani, Yaqiao Li, Mehran Shakerinava

Abstract: We give a streamlined short proof of Newman's theorem in communication complexity by applying the classical and the approximate Carathéodory's theorems. We give a streamlined short proof of Newman's theorem in communication complexity by applying the classical and the approximate Carathéodory's theorems. △ Less

Submitted 8 May, 2024; originally announced June 2024.

arXiv:2406.08481 [pdf, other]

Enhancing End-to-End Autonomous Driving with Latent World Model

Authors: Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, Tieniu Tan

Abstract: End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on the supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering the data scalability. To address this challenge, we propose a novel self-supervised method to e… ▽ More End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on the supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering the data scalability. To address this challenge, we propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels. Specifically, our framework \textbf{LAW} uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame. The predicted latent features are supervised by the actually observed features in the future. This supervision jointly optimizes the latent feature learning and action prediction, which greatly enhances the driving performance. As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08393 [pdf, other]

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Authors: Yue Li, Xinsheng Wang, Li Zhang, Lei Xie

Abstract: Speaker Change Detection (SCD) is to identify boundaries among speakers in a conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task, a further investigation of self-supervised learning (SSL) features for SCD is conducted in this work. Specifically, an SCD model, named SCDNet, is proposed. With this model, various state-of-the-art SSL models, including Hubert, wav… ▽ More Speaker Change Detection (SCD) is to identify boundaries among speakers in a conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task, a further investigation of self-supervised learning (SSL) features for SCD is conducted in this work. Specifically, an SCD model, named SCDNet, is proposed. With this model, various state-of-the-art SSL models, including Hubert, wav2vec 2.0, and WavLm are investigated. To discern the most potent layer of SSL models for SCD, a learnable weighting method is employed to analyze the effectiveness of intermediate representations. Additionally, a fine-tuning-based approach is also implemented to further compare the characteristics of SSL models in the SCD task. Furthermore, a contrastive learning method is proposed to mitigate the overfitting tendencies in the training of both the fine-tuning-based method and SCDNet. Experiments showcase the superiority of WavLm in the SCD task and also demonstrate the good design of SCDNet. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08353 [pdf, other]

Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques

Authors: Yuanchao Li, Peter Bell, Catherine Lai

Abstract: Text data is commonly utilized as a primary input to enhance Speech Emotion Recognition (SER) performance and reliability. However, the reliance on human-transcribed text in most studies impedes the development of practical SER systems, creating a gap between in-lab research and real-world scenarios where Automatic Speech Recognition (ASR) serves as the text source. Hence, this study benchmarks SE… ▽ More Text data is commonly utilized as a primary input to enhance Speech Emotion Recognition (SER) performance and reliability. However, the reliance on human-transcribed text in most studies impedes the development of practical SER systems, creating a gap between in-lab research and real-world scenarios where Automatic Speech Recognition (ASR) serves as the text source. Hence, this study benchmarks SER performance using ASR transcripts with varying Word Error Rates (WERs) on well-known corpora: IEMOCAP, CMU-MOSI, and MSP-Podcast. Our evaluation includes text-only and bimodal SER with diverse fusion techniques, aiming for a comprehensive analysis that uncovers novel findings and challenges faced by current SER research. Additionally, we propose a unified ASR error-robust framework integrating ASR error correction and modality-gated fusion, achieving lower WER and higher SER results compared to the best-performing ASR transcript. This research is expected to provide insights into SER with ASR assistance, especially for real-world applications. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08343 [pdf, other]

Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver

Authors: Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

Abstract: Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for develo** digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underl… ▽ More Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for develo** digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underlying continuous dynamics and struggles with modelling complex system behaviour. Additionally, the architecture of digital computers, with separate storage and processing units, necessitates frequent data transfers and Analogue-Digital (A/D) conversion, thereby significantly increasing both time and energy costs. Here, we introduce a memristive neural ordinary differential equation (ODE) solver for digital twins, which is capable of capturing continuous-time dynamics and facilitates the modelling of complex systems using an infinite-depth model. By integrating storage and computation within analogue memristor arrays, we circumvent the von Neumann bottleneck, thus enhancing both speed and energy efficiency. We experimentally validate our approach by develo** a digital twin of the HP memristor, which accurately extrapolates its nonlinear dynamics, achieving a 4.2-fold projected speedup and a 41.4-fold projected decrease in energy consumption compared to state-of-the-art digital hardware, while maintaining an acceptable error margin. Additionally, we demonstrate scalability through experimentally grounded simulations of Lorenz96 dynamics, exhibiting projected performance improvements of 12.6-fold in speed and 189.7-fold in energy efficiency relative to traditional digital approaches. By harnessing the capabilities of fully analogue computing, our breakthrough accelerates the development of digital twins, offering an efficient and rapid solution to meet the demands of Industry 4.0. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures

arXiv:2406.08324 [pdf, other]

LaMOT: Language-Guided Multi-Object Tracking

Authors: Yunhao Li, Xiaoqiong Liu, Luke Liu, Heng Fan, Libo Zhang

Abstract: Vision-Language MOT is a crucial tracking problem and has drawn increasing attention recently. It aims to track objects based on human language commands, replacing the traditional use of templates or pre-set information from training sets in conventional tracking tasks. Despite various efforts, a key challenge lies in the lack of a clear understanding of why language is used for tracking, which hi… ▽ More Vision-Language MOT is a crucial tracking problem and has drawn increasing attention recently. It aims to track objects based on human language commands, replacing the traditional use of templates or pre-set information from training sets in conventional tracking tasks. Despite various efforts, a key challenge lies in the lack of a clear understanding of why language is used for tracking, which hinders further development in this field. In this paper, we address this challenge by introducing Language-Guided MOT, a unified task framework, along with a corresponding large-scale benchmark, termed LaMOT, which encompasses diverse scenarios and language descriptions. Specially, LaMOT comprises 1,660 sequences from 4 different datasets and aims to unify various Vision-Language MOT tasks while providing a standardized evaluation platform. To ensure high-quality annotations, we manually assign appropriate descriptive texts to each target in every video and conduct careful inspection and correction. To the best of our knowledge, LaMOT is the first benchmark dedicated to Language-Guided MOT. Additionally, we propose a simple yet effective tracker, termed LaMOTer. By establishing a unified task framework, providing challenging benchmarks, and offering insights for future algorithm design and evaluation, we expect to contribute to the advancement of research in Vision-Language MOT. We will release the data at https://github.com/Nathan-Li123/LaMOT. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08225 [pdf, ps, other]

Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (636 additional authors not shown)

Abstract: Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur… ▽ More Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08122 [pdf]

Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor

Authors: Yongjie Si, Yanxiong Li, Jialong Li, Jiaxin Tan, Qianhua He

Abstract: It's assumed that training data is sufficient in base session of few-shot class-incremental audio classification. However, it's difficult to collect abundant samples for model training in base session in some practical scenarios due to the data scarcity of some classes. This paper explores a new problem of fully few-shot class-incremental audio classification with few training samples in all sessi… ▽ More It's assumed that training data is sufficient in base session of few-shot class-incremental audio classification. However, it's difficult to collect abundant samples for model training in base session in some practical scenarios due to the data scarcity of some classes. This paper explores a new problem of fully few-shot class-incremental audio classification with few training samples in all sessions. Moreover, we propose a method using expandable dual-embedding extractor to solve it. The proposed model consists of an embedding extractor and an expandable classifier. The embedding extractor consists of a pretrained Audio Spectrogram Transformer (AST) and a finetuned AST. The expandable classifier consists of prototypes and each prototype represents a class. Experiments are conducted on three datasets (LS-100, NSynth-100 and FSC-89). Results show that our method exceeds seven baseline ones in average accuracy with statistical significance. Code is at: https://github.com/YongjieSi/EDE. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted for publication on Interspeech 2024. 5 pages, 3 figures, 5 tables

arXiv:2406.08119 [pdf]

Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

Authors: Yanxiong Li, Jiaxin Tan, Guoqing Chen, Jialong Li, Yongjie Si, Qianhua He

Abstract: This work is an improved system that we submitted to task 1 of DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextu… ▽ More This work is an improved system that we submitted to task 1 of DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextual information from each audio clip. In addition, we integrate other techniques into our method, such as knowledge distillation, data augmentation, and adaptive residual normalization. When evaluated on the official dataset of DCASE2023 challenge, our method obtains the highest accuracy of 56.10% with parameter number of 5.21 kilo and multiply-accumulate operations of 1.44 million. It exceeds the top two systems of DCASE2023 challenge in accuracy and complexity, and obtains state-of-the-art result. Code is at: https://github.com/Jessytan/Low-complexity-ASC. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted for publication on Interspeech 2024. 5 pages, 4 figures, 3 tables

arXiv:2406.08112 [pdf, other]

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi

Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to… ▽ More With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to-end generation process, skip** the final step of vocoder processing. This poses a significant challenge for current audio deepfake detection (ADD) models based on vocoder artifacts. To effectively detect LLM-based deepfake audio, we focus on the core of the generation process, the conversion from neural codec to waveform. We propose Codecfake dataset, which is generated by seven representative neural codec methods. Experiment results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models on the Codecfake test set. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880

arXiv:2406.07976 [pdf, other]

doi 10.1145/3637528.3671725

Multivariate Log-based Anomaly Detection for Distributed Database

Authors: Lingzhe Zhang, Tong Jia, Mengxi Jia, Ying Li, Yong Yang, Zhonghai Wu

Abstract: Distributed databases are fundamental infrastructures of today's large-scale software systems such as cloud systems. Detecting anomalies in distributed databases is essential for maintaining software availability. Existing approaches, predominantly developed using Loghub-a comprehensive collection of log datasets from various systems-lack datasets specifically tailored to distributed databases, wh… ▽ More Distributed databases are fundamental infrastructures of today's large-scale software systems such as cloud systems. Detecting anomalies in distributed databases is essential for maintaining software availability. Existing approaches, predominantly developed using Loghub-a comprehensive collection of log datasets from various systems-lack datasets specifically tailored to distributed databases, which exhibit unique anomalies. Additionally, there's a notable absence of datasets encompassing multi-anomaly, multi-node logs. Consequently, models built upon these datasets, primarily designed for standalone systems, are inadequate for distributed databases, and the prevalent method of deeming an entire cluster anomalous based on irregularities in a single node leads to a high false-positive rate. This paper addresses the unique anomalies and multivariate nature of logs in distributed databases. We expose the first open-sourced, comprehensive dataset with multivariate logs from distributed databases. Utilizing this dataset, we conduct an extensive study to identify multiple database anomalies and to assess the effectiveness of state-of-the-art anomaly detection using multivariate log data. Our findings reveal that relying solely on logs from a single node is insufficient for accurate anomaly detection on distributed database. Leveraging these insights, we propose MultiLog, an innovative multivariate log-based anomaly detection approach tailored for distributed databases. Our experiments, based on this novel dataset, demonstrate MultiLog's superiority, outperforming existing state-of-the-art methods by approximately 12%. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by KDD'24

arXiv:2406.07929 [pdf, other]

A Generic Layer Pruning Method for Signal Modulation Recognition Deep Learning Models

Authors: Yao Lu, Yutao Zhu, Yuqi Li, Dongwei Xu, Yun Lin, Qi Xuan, Xiaoniu Yang

Abstract: With the successful application of deep learning in communications systems, deep neural networks are becoming the preferred method for signal classification. Although these models yield impressive results, they often come with high computational complexity and large model sizes, which hinders their practical deployment in communication systems. To address this challenge, we propose a novel layer p… ▽ More With the successful application of deep learning in communications systems, deep neural networks are becoming the preferred method for signal classification. Although these models yield impressive results, they often come with high computational complexity and large model sizes, which hinders their practical deployment in communication systems. To address this challenge, we propose a novel layer pruning method. Specifically, we decompose the model into several consecutive blocks, each containing consecutive layers with similar semantics. Then, we identify layers that need to be preserved within each block based on their contribution. Finally, we reassemble the pruned blocks and fine-tune the compact model. Extensive experiments on five datasets demonstrate the efficiency and effectiveness of our method over a variety of state-of-the-art baselines, including layer pruning and channel pruning methods. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07850 [pdf, other]

Dynamic Stochastic Decoding Strategy for Open-Domain Dialogue Generation

Authors: Yiwei Li, Fei Mi, Yitong Li, Yasheng Wang, Bin Sun, Shaoxiong Feng, Kan Li

Abstract: Stochastic sampling strategies such as top-k and top-p have been widely used in dialogue generation task. However, as an open-domain chatting system, there will be two different conversation scenarios, i.e. chit-chat and knowledge-based question answering. In the former situation, responses diversity is essential due to the one-to-many nature in dialogue. The latter, on the other hand, requires le… ▽ More Stochastic sampling strategies such as top-k and top-p have been widely used in dialogue generation task. However, as an open-domain chatting system, there will be two different conversation scenarios, i.e. chit-chat and knowledge-based question answering. In the former situation, responses diversity is essential due to the one-to-many nature in dialogue. The latter, on the other hand, requires less randomness given that stochastic decoding strategy entails the risk of generating incorrect information. As a result, an adaptive and flexible decoding strategy is needed to cope with these two scenarios simultaneously. To this end, we propose the dynamic decoding strategy (DDS), which can adjust the decoding space w.r.t. different contexts. In DDS, both sequence-level and token-level adaptive search can be achieved to adjust the decoding process in a unified framework. Besides, our adaptive algorithm can not only be used during model inference, but it can also be applied during the model training stage to further enhance the performance. Comprehensive experiments indicate that the proposed decoding strategy can consistently improve the performance of pre-trained dialogue models when coupled with four well-used stochastic decoding algorithms. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: ACL 2024 Findings

arXiv:2406.07625 [pdf, other]

Emergent Universal Quench Dynamics in Randomly Interacting Spin Models

Authors: Yuchen Li, Tian-Gang Zhou, Ze Wu, Pai Peng, Shengyu Zhang, Riqiang Fu, Ren Zhang, Wei Zheng, Pengfei Zhang, Hui Zhai, Xinhua Peng, Jiangfeng Du

Abstract: Universality often emerges in low-energy equilibrium physics of quantum many-body systems, despite their microscopic complexity and variety. Recently, there has been a growing interest in studying far-from-equilibrium dynamics of quantum many-body systems. Such dynamics usually involves highly excited states beyond the traditional low-energy theory description. Whether universal behaviors can also… ▽ More Universality often emerges in low-energy equilibrium physics of quantum many-body systems, despite their microscopic complexity and variety. Recently, there has been a growing interest in studying far-from-equilibrium dynamics of quantum many-body systems. Such dynamics usually involves highly excited states beyond the traditional low-energy theory description. Whether universal behaviors can also emerge in such non-equilibrium dynamics is a central issue at the frontier of quantum dynamics. Here we report the experimental observation of universal dynamics by monitoring the spin depolarization process in a solid-state NMR system described by an ensemble of randomly interacting spins. The spin depolarization can be related to temporal spin-spin correlation functions at high temperatures. We discover a remarkable phenomenon that these correlation functions obey a universal functional form. This experimental fact helps us identify the dominant interacting processes in the spin depolarization dynamics that lead to this universality. Our observation demonstrates the existence of universality even in non-equilibrium dynamics at high temperatures, thereby complementing the well-established universality in low-energy physics. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 10 pages, 4 figures; Supplementary Information 26 pages, 11 figures, 2 tables

arXiv:2406.07573 [pdf, other]

Investigating the Potential of Using Large Language Models for Scheduling

Authors: Deddy Jobson, Yilin Li

Abstract: The inaugural ACM International Conference on AI-powered Software introduced the AIware Challenge, prompting researchers to explore AI-driven tools for optimizing conference programs through constrained optimization. We investigate the use of Large Language Models (LLMs) for program scheduling, focusing on zero-shot learning and integer programming to measure paper similarity. Our study reveals th… ▽ More The inaugural ACM International Conference on AI-powered Software introduced the AIware Challenge, prompting researchers to explore AI-driven tools for optimizing conference programs through constrained optimization. We investigate the use of Large Language Models (LLMs) for program scheduling, focusing on zero-shot learning and integer programming to measure paper similarity. Our study reveals that LLMs, even under zero-shot settings, create reasonably good first drafts of conference schedules. When clustering papers, using only titles as LLM inputs produces results closer to human categorization than using titles and abstracts with TFIDF. The code has been made publicly available. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.07520 [pdf, other]

Neural Gaffer: Relighting Any Object via Diffusion

Authors: Haian **, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, ** Sun, Noah Snavely

Abstract: Single-image relighting is a challenging task that involves reasoning about the complex interplay between geometry, materials, and lighting. Many prior methods either support only specific categories of images, such as portraits, or require special capture conditions, like using a flashlight. Alternatively, some methods explicitly decompose a scene into intrinsic components, such as normals and BR… ▽ More Single-image relighting is a challenging task that involves reasoning about the complex interplay between geometry, materials, and lighting. Many prior methods either support only specific categories of images, such as portraits, or require special capture conditions, like using a flashlight. Alternatively, some methods explicitly decompose a scene into intrinsic components, such as normals and BRDFs, which can be inaccurate or under-expressive. In this work, we propose a novel end-to-end 2D relighting diffusion model, called Neural Gaffer, that takes a single image of any object and can synthesize an accurate, high-quality relit image under any novel environmental lighting condition, simply by conditioning an image generator on a target environment map, without an explicit scene decomposition. Our method builds on a pre-trained diffusion model, and fine-tunes it on a synthetic relighting dataset, revealing and harnessing the inherent understanding of lighting present in the diffusion model. We evaluate our model on both synthetic and in-the-wild Internet imagery and demonstrate its advantages in terms of generalization and accuracy. Moreover, by combining with other generative methods, our model enables many downstream 2D tasks, such as text-based relighting and object insertion. Our model can also operate as a strong relighting prior for 3D tasks, such as relighting a radiance field. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: Project Website: https://neural-gaffer.github.io

arXiv:2406.07477 [pdf, other]

Neutrino magnetic dipole portal with low energy neutrino nucleus scattering data

Authors: Ying-Ying Li, Yu-Feng Li, Shuo-Yu Xia

Abstract: Sterile neutrinos that couple to the Standard Model via the neutrino magnetic dipole portals have been extensively studied at various experiments. In this work, we scrutinize these interactions for sterile neutrinos in the mass range of $\unit[0.1]{}-\unit[50]{MeV}$ through the nuclear and electron recoils at various neutrino scattering experiments. For the $e$-flavor specific dipole portal, we de… ▽ More Sterile neutrinos that couple to the Standard Model via the neutrino magnetic dipole portals have been extensively studied at various experiments. In this work, we scrutinize these interactions for sterile neutrinos in the mass range of $\unit[0.1]{}-\unit[50]{MeV}$ through the nuclear and electron recoils at various neutrino scattering experiments. For the $e$-flavor specific dipole portal, we demonstrate that Dresden-II can provide leading constraints for $m_N \lesssim \unit[0.5]{MeV}$, setting aside currently unresolved theoretical uncertainties. For the $μ$-flavor case, we show that the COHERENT experiment can probe a unique parameter region for $m_N$ in the range of $\unit[10]{}-\unit[40]{MeV}$ with the full dataset collected by the CsI[Na] scintillation detector, including both the energy and timing structure of the neutrino beam. We also present limits on the parameter regions of the $τ$-flavor dipole portal using measurements of the solar neutrino flux from dark matter direct detection experiments. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 8 pages, 2 figures

arXiv:2406.07411 [pdf, other]

VersiCode: Towards Version-controllable Code Generation

Authors: Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu, Suyu Ma, Bo Jiang, ** Yang, Zhenchang Xing, Yuan-Fang Li, Gholamreza Haffari

Abstract: Significant research has focused on improving the performance of large language model on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of \emph{version}, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comp… ▽ More Significant research has focused on improving the performance of large language model on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of \emph{version}, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comprehensive dataset designed to assess the ability of large language models to generate verifiable code for specific library versions. VersiCode encompasses 300 libraries across more than 2,000 versions spanning 9 years. We design two dedicated evaluation tasks: version-specific code completion (VSCC) and version-aware code editing (VACE). Comprehensive experiments are conducted to benchmark the performance of LLMs, revealing the challenging nature of these tasks and VersiCode, that even state-of-the-art LLMs struggle to generate version-correct code. This dataset, together with the proposed tasks, sheds light on LLMs' capabilities and limitations in handling version-specific code generation, and opens up an important new area of research for further investigation. The resources can be found at https://github.com/wutong8023/VersiCode. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07394 [pdf, other]

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Authors: Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang

Abstract: This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic s… ▽ More This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic self-refine mechanisms to improve decision-making frameworks within LLMs. The algorithm constructs a Monte Carlo search tree through iterative processes of Selection, self-refine, self-evaluation, and Backpropagation, utilizing an improved Upper Confidence Bound (UCB) formula to optimize the exploration-exploitation balance. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets, including GSM8K, GSM Hard, MATH, and Olympiad-level benchmarks, including Math Odyssey, AIME, and OlympiadBench. The study advances the application of LLMs in complex reasoning tasks and sets a foundation for future AI integration, enhancing decision-making accuracy and reliability in LLM-driven applications. △ Less

Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07272 [pdf, other]

Integrated Near Field Sensing and Communications Using Unitary Approximate Message Passing Based Matrix Factorization

Authors: Zhengdao Yuan, Qinghua Guo, Yonina C. Eldar, Yonghui Li

Abstract: Due to the utilization of large antenna arrays at base stations (BSs) and the operations of wireless communications in high frequency bands, mobile terminals often find themselves in the near-field of the array aperture. In this work, we address the signal processing challenges of integrated near-field localization and communication in uplink transmission of an integrated sensing and communication… ▽ More Due to the utilization of large antenna arrays at base stations (BSs) and the operations of wireless communications in high frequency bands, mobile terminals often find themselves in the near-field of the array aperture. In this work, we address the signal processing challenges of integrated near-field localization and communication in uplink transmission of an integrated sensing and communication (ISAC) system, where the BS performs joint near-field localization and signal detection (JNFLSD). We show that JNFLSD can be formulated as a matrix factorization (MF) problem with proper structures imposed on the factor matrices. Then, leveraging the variational inference (VI) and unitary approximate message passing (UAMP), we develop a low complexity Bayesian approach to MF, called UAMP-MF, to handle a generic MF problem. We then apply the UAMP-MF algorithm to solve the JNFLSD problem, where the factor matrix structures are fully exploited. Extensive simulation results are provided to demonstrate the superior performance of the proposed method. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 13 pages, 10 figures. arXiv admin note: text overlap with arXiv:2208.00422

Showing 201–250 of 16,652 results for author: Li, Y