Search | arXiv e-print repository

arXiv:2406.17181 [pdf, other]

FacePsy: An Open-Source Affective Mobile Sensing System -- Analyzing Facial Behavior and Head Gesture for Depression Detection in Naturalistic Settings

Authors: Rahul Islam, Sang Won Bae

Abstract: Depression, a prevalent and complex mental health issue affecting millions worldwide, presents significant challenges for detection and monitoring. While facial expressions have shown promise in laboratory settings for identifying depression, their potential in real-world applications remains largely unexplored due to the difficulties in developing efficient mobile systems. In this study, we aim to introduce FacePsy, an open-source mobile sensing system designed to capture affective inferences by analyzing sophisticated features and generating real-time data on facial behavior landmarks, eye movements, and head gestures -- all within the naturalistic context of smartphone usage with 25 participants. Through rigorous development, testing, and optimization, we identified eye-open states, head gestures, smile expressions, and specific Action Units (2, 6, 7, 12, 15, and 17) as significant indicators of depressive episodes (AUROC=81%). Our regression model predicting PHQ-9 scores achieved moderate accuracy, with a Mean Absolute Error of 3.08. Our findings offer valuable insights and implications for enhancing deployable and usable mobile affective sensing systems, ultimately improving mental health monitoring, prediction, and just-in-time adaptive interventions for researchers and developers in healthcare.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted to ACM International Conference on Mobile Human-Computer Interaction (MobileHCI 2024)

arXiv:2406.15942 [pdf, other]

Revolutionizing Mental Health Support: An Innovative Affective Mobile Framework for Dynamic, Proactive, and Context-Adaptive Conversational Agents

Authors: Rahul Islam, Sang Won Bae

Abstract: We are moving toward developing interactive systems that can recognize human emotional states and respond to individual needs more intuitively and empathetically, in a time of more personalized and context-aware computing. This is especially important regarding mental health support, with a rising need for immediate, non-intrusive help tailored to each individual. Individual mental health and the complex nature of human emotions call for novel approaches beyond conventional proactive and reactive-based chatbot approaches. In this position paper, we will explore how to create Chatbots that can sense, interpret, and intervene in emotional signals by combining real-time facial expression analysis, physiological signal interpretation, and language models. This is achieved by incorporating facial affect detection into existing practical and ubiquitous passive sensing contexts, thus empowering them to use ubiquitously sensed behavioral primitives to recognize, interpret, and respond to human emotions. In parallel, the system employs cognitive-behavioral therapy tools such as cognitive reframing and mood journals, leveraging the therapeutic intervention potential of Chatbots in mental health contexts. Finally, we propose a project to build a system that enhances the emotional understanding of Chatbots to engage users in chat-based intervention, thereby helping them manage their mood.

Submitted 22 June, 2024; originally announced June 2024.

Comments: Accepted to Ubicomp '23, GenAI4PC Symposium

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, since they have low fluxes of astrophysical $\gamma$-ray background while containing a large amount of dark matter. By analyzing more than 700 days of observational data from LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.08010 [pdf, other]

A Self-boosted Framework for Calibrated Ranking

Authors: Shunyu Zhang, Hu Liu, Wentian Bao, Enyun Yu, Yang Song

Abstract: Scale-calibrated ranking systems are ubiquitous in real-world applications nowadays, which pursue accurate ranking quality and calibrated probabilistic predictions simultaneously. For instance, in the advertising ranking system, the predicted click-through rate (CTR) is utilized for ranking and required to be calibrated for the downstream cost-per-click ads bidding. Recently, multi-objective based methods have been widely adopted as a standard approach for Calibrated Ranking (CR), which incorporates the combination of two loss functions: a pointwise loss that focuses on calibrated absolute values and a ranking loss that emphasizes relative orderings. However, when applied to industrial online applications, existing multi-objective CR approaches still suffer from two crucial limitations. First, previous methods need to aggregate the full candidate list within a single mini-batch to compute the ranking loss. Such an aggregation strategy violates extensive data shuffling, which has long been proven beneficial for preventing overfitting, and thus degrades the training effectiveness. Second, existing multi-objective methods apply the two inherently conflicting loss functions on a single probabilistic prediction, which results in a sub-optimal trade-off between calibration and ranking. To tackle the two limitations, we propose a Self-Boosted framework for Calibrated Ranking (SBCR).

Submitted 12 June, 2024; originally announced June 2024.

Comments: KDD 2024
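
The multi-objective setup described in the abstract above (a pointwise loss plus a ranking loss applied to one prediction) can be illustrated with a minimal Python sketch. This is not the SBCR method or the authors' code: the function names, the choice of a pairwise logistic ranking loss, and the mixing weight `alpha` are hypothetical choices made only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pointwise_log_loss(logits, labels):
    """Binary cross-entropy on sigmoid(logit); rewards calibrated absolute values."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)


def pairwise_ranking_loss(logits, labels):
    """Pairwise logistic loss over every (positive, negative) pair in one candidate list."""
    pos = [z for z, y in zip(logits, labels) if y == 1]
    neg = [z for z, y in zip(logits, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # no pairs to compare
    losses = [math.log1p(math.exp(-(zp - zn))) for zp in pos for zn in neg]
    return sum(losses) / len(losses)


def multi_objective_loss(logits, labels, alpha=0.5):
    """Weighted sum of both losses; alpha is an assumed, hypothetical mixing weight."""
    return (alpha * pointwise_log_loss(logits, labels)
            + (1.0 - alpha) * pairwise_ranking_loss(logits, labels))


labels = [1, 0, 1, 0]
well_ordered = [2.0, -1.0, 0.5, -2.0]
shifted = [z + 3.0 for z in well_ordered]  # same ordering, different scale
print(pairwise_ranking_loss(well_ordered, labels), pairwise_ranking_loss(shifted, labels))
print(pointwise_log_loss(well_ordered, labels), pointwise_log_loss(shifted, labels))
```

Adding a constant to every logit leaves the pairwise loss unchanged but changes the pointwise loss, which is one simple way to see why the two objectives can pull a single prediction in different directions. Note also that the pairwise loss needs the whole candidate list at once, matching the first limitation the abstract names.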

arXiv:2406.06858 [pdf, other]

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

Authors: Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu

Abstract: Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor, and/or to accelerate computation to meet a certain latency requirement. However, this kind of parallelism introduces additional communication that might contribute a significant portion of overall runtime. This limits the scalability of the technique within a group of devices with high-speed interconnects, such as GPUs with NVLinks in a node. This paper proposes a novel method, Flux, to significantly hide communication latencies with dependent computations for GPUs. Flux over-decomposes communication and computation operations into much finer-grained operations and further fuses them into a larger kernel to effectively hide communication without compromising kernel efficiency. Flux can potentially overlap up to 96% of communication given a fused kernel. Overall, it can achieve up to 1.24x speedups for training over Megatron-LM on a cluster of 128 GPUs with various GPU generations and interconnects, and up to 1.66x and 1.30x speedups for prefill and decoding inference over vLLM on a cluster with 8 GPUs with various GPU generations and interconnects.

Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of the variability at a few months level in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $\gamma$-ray emissions from this low-luminosity AGN NGC 4278. The observation by LHAASO-WCDA during the active period has a significance level of 8.8\,$\sigma$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2404.18184 [pdf]

doi 10.23977/infse.2024.050217

Application and practice of AI technology in quantitative investment

Authors: Shuochen Bi, Wenqing Bao, Jue Xiao, Jiangshan Wang, Tingting Deng

Abstract: With the continuous development of artificial intelligence technology, using machine learning technology to predict market trends may no longer be out of reach. In recent years, artificial intelligence has become a research hotspot in the academic circle, and it has been widely used in image recognition, natural language processing and other fields, and also has a huge impact on the field of quantitative investment. As an investment method to obtain stable returns through data analysis, model construction and program trading, quantitative investment is deeply loved by financial institutions and investors. At the same time, as an important application field of quantitative investment, the quantitative investment strategy based on artificial intelligence technology arises at the historic moment. How to apply artificial intelligence to quantitative investment, so as to better achieve profit and risk control, has also become the focus and difficulty of the research. From a global perspective, inflation in the US and the Federal Reserve are the concerns of investors, which to some extent affects the direction of global assets, including the Chinese stock market. This paper studies the application of AI technology, quantitative investment, and AI technology in quantitative investment, aiming to provide investors with auxiliary decision-making, reduce the difficulty of investment analysis, and help them to obtain higher returns.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 9 pages, 2 figures

Journal ref: Information Systems and Economics (2024) Clausius Scientific Press, Canada, ISSN 2523-6407 Vol. 5 Num. 2

arXiv:2404.18183 [pdf]

doi 10.62051/IJGEM.v2n3.08

Innovative Application of Artificial Intelligence Technology in Bank Credit Risk Management

Authors: Shuochen Bi, Wenqing Bao

Abstract: With the rapid growth of technology, especially the widespread application of artificial intelligence (AI) technology, the risk management level of commercial banks is constantly reaching new heights. In the current wave of digitalization, AI has become a key driving force for the strategic transformation of financial institutions, especially the banking industry. For commercial banks, the stability and safety of asset quality are crucial, which directly relates to the long-term stable growth of the bank. Among them, credit risk management is particularly core because it involves the flow of a large amount of funds and the accuracy of credit decisions. Therefore, establishing a scientific and effective credit risk decision-making mechanism is of great strategic significance for commercial banks. In this context, the innovative application of AI technology has brought revolutionary changes to bank credit risk management. Through deep learning and big data analysis, AI can accurately evaluate the credit status of borrowers, timely identify potential risks, and provide banks with more accurate and comprehensive credit decision support. At the same time, AI can also achieve real-time monitoring and early warning, helping banks intervene before risks occur and reduce losses.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 6 pages, 1 figure, 2 tables

Journal ref: International Journal of Global Economics and Management ISSN: 3005-9690 (Print), ISSN: 3005-8090 (Online) | Volume 2, Number 3, Year 2024

arXiv:2404.18058 [pdf, other]

Joint Reference Frame Synthesis and Post Filter Enhancement for Versatile Video Coding

Authors: Weijie Bao, Yuantong Zhang, Jianghao Jia, Zhenzhong Chen, Shan Liu

Abstract: This paper presents the joint reference frame synthesis (RFS) and post-processing filter enhancement (PFE) for Versatile Video Coding (VVC), aiming to explore the combination of different neural network-based video coding (NNVC) tools to better utilize the hierarchical bi-directional coding structure of VVC. Both RFS and PFE utilize the Space-Time Enhancement Network (STENet), which receives two input frames with artifacts and produces two enhanced frames with suppressed artifacts, along with an intermediate synthesized frame. STENet comprises two pipelines, the synthesis pipeline and the enhancement pipeline, tailored for different purposes. During RFS, two reconstructed frames are sent into STENet's synthesis pipeline to synthesize a virtual reference frame, similar to the current to-be-coded frame. The synthesized frame serves as an additional reference frame inserted into the reference picture list (RPL). During PFE, two reconstructed frames are fed into STENet's enhancement pipeline to alleviate their artifacts and distortions, resulting in enhanced frames with reduced artifacts and distortions. To reduce inference complexity, we propose joint inference of RFS and PFE (JISE), achieved through a single execution of STENet. Integrated into the VVC reference software VTM-15.0, RFS, PFE, and JISE are coordinated within a novel Space-Time Enhancement Window (STEW) under Random Access (RA) configuration. The proposed method could achieve -7.34%/-17.21%/-16.65% PSNR-based BD-rate on average for three components under RA configuration.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.14590 [pdf, other]

PupilSense: Detection of Depressive Episodes Through Pupillary Response in the Wild

Authors: Rahul Islam, Sang Won Bae

Abstract: Early detection of depressive episodes is crucial in managing mental health disorders such as Major Depressive Disorder (MDD) and Bipolar Disorder. However, existing methods often necessitate active participation or are confined to clinical settings. Addressing this gap, we introduce PupilSense, a novel, deep learning-driven mobile system designed to discreetly track pupillary responses as users interact with their smartphones in their daily lives. This study presents a proof-of-concept exploration of PupilSense's capabilities, where we captured real-time pupillary data from users in naturalistic settings. Our findings indicate that PupilSense can effectively and passively monitor indicators of depressive episodes, offering a promising tool for continuous mental health assessment outside laboratory environments. This advancement heralds a significant step in leveraging ubiquitous mobile technology for proactive mental health care, potentially transforming how depressive episodes are detected and managed in everyday contexts.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 2024 International Conference on Activity and Behavior Computing

arXiv:2404.14563 [pdf, other]

Exploring Algorithmic Explainability: Generating Explainable AI Insights for Personalized Clinical Decision Support Focused on Cannabis Intoxication in Young Adults

Authors: Tongze Zhang, Tammy Chung, Anind Dey, Sang Won Bae

Abstract: This study explores the possibility of facilitating algorithmic decision-making by combining interpretable artificial intelligence (XAI) techniques with sensor data, with the aim of providing researchers and clinicians with personalized analyses of cannabis intoxication behavior. SHAP analyzes the importance and quantifies the impact of specific factors such as environmental noise or heart rate, enabling clinicians to pinpoint influential behaviors and environmental conditions. SkopeRules simplify the understanding of cannabis use for a specific activity or environmental use. Decision trees provide a clear visualization of how factors interact to influence cannabis consumption. Counterfactual models help identify key changes in behaviors or conditions that may alter cannabis use outcomes, to guide effective individualized intervention strategies. This multidimensional analytical approach not only unveils changes in behavioral and physiological states after cannabis use, such as frequent fluctuations in activity states, nontraditional sleep patterns, and specific use habits at different times and places, but also highlights the significance of individual differences in responses to cannabis use. These insights carry profound implications for clinicians seeking to gain a deeper understanding of the diverse needs of their patients and for tailoring precisely targeted intervention strategies. Furthermore, our findings highlight the pivotal role that XAI technologies could play in enhancing the transparency and interpretability of Clinical Decision Support Systems (CDSS), with a particular focus on substance misuse treatment. This research significantly contributes to ongoing initiatives aimed at advancing clinical practices that aim to prevent and reduce cannabis-related harms to health, positioning XAI as a supportive tool for clinicians and researchers alike.

Submitted 29 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 2024 International Conference on Activity and Behavior Computing

arXiv:2404.13748 [pdf, other]

Application of Kalman Filter in Stochastic Differential Equations

Authors: Wencheng Bao, Shi Feng, Kaiwen Zhang

Abstract: In areas such as finance, engineering, and science, we often face situations that change quickly and unpredictably. These situations are tough to handle and require special tools and methods capable of understanding and predicting what might happen next. Stochastic Differential Equations (SDEs) are renowned for modeling and analyzing real-world dynamical systems. However, obtaining the parameters, boundary conditions, and closed-form solutions of SDEs can often be challenging. In this paper, we will discuss the application of Kalman filtering theory to SDEs, including Extended Kalman filtering and Particle Extended Kalman filtering. We will explore how to fit existing SDE systems through filtering and track the original SDEs by fitting the obtained closed-form solutions. This approach aims to gather more information about these SDEs, which could be used in various ways, such as incorporating them into parameters of data-based SDE models.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 18 pages, 14 figures
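
As a rough illustration of applying Kalman filtering to an SDE, the Python sketch below tracks a simulated Ornstein-Uhlenbeck process $dX = -\theta X\,dt + \sigma\,dW$ from noisy observations. It is a deliberately simpler stand-in, not the paper's method: it uses a plain linear Kalman filter rather than the Extended or Particle Extended variants named in the abstract, and the choice of SDE, the parameter values, and the function names are all assumptions made for illustration.

```python
import math
import random


def ou_discretization(theta, sigma, dt):
    """Exact one-step discretization of dX = -theta*X dt + sigma dW."""
    a = math.exp(-theta * dt)                          # state transition factor
    q = sigma ** 2 * (1.0 - a * a) / (2.0 * theta)     # process noise variance
    return a, q


def simulate_ou(theta, sigma, dt, n, obs_std, rng):
    """Simulate the hidden OU path and noisy observations y = x + noise."""
    a, q = ou_discretization(theta, sigma, dt)
    x, states, obs = 0.0, [], []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        states.append(x)
        obs.append(x + rng.gauss(0.0, obs_std))
    return states, obs


def kalman_filter_ou(obs, theta, sigma, dt, obs_std, x0=0.0, p0=1.0):
    """Scalar linear Kalman filter for the discretized OU model."""
    a, q = ou_discretization(theta, sigma, dt)
    r = obs_std ** 2
    x_hat, p, estimates = x0, p0, []
    for y in obs:
        # Predict step: propagate mean and variance through the dynamics.
        x_pred = a * x_hat
        p_pred = a * a * p + q
        # Update step: blend prediction and observation via the Kalman gain.
        k = p_pred / (p_pred + r)
        x_hat = x_pred + k * (y - x_pred)
        p = (1.0 - k) * p_pred
        estimates.append(x_hat)
    return estimates


def rmse(xs, ys):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(xs, ys)) / len(xs))


rng = random.Random(0)  # fixed seed so the run is repeatable
states, obs = simulate_ou(theta=1.0, sigma=0.5, dt=0.01, n=2000, obs_std=0.5, rng=rng)
est = kalman_filter_ou(obs, theta=1.0, sigma=0.5, dt=0.01, obs_std=0.5)
print("raw observation RMSE:", rmse(obs, states))
print("filtered RMSE:", rmse(est, states))
```

Because the OU process is linear, the filter here needs no linearization; for a nonlinear drift, an Extended Kalman filter would replace the factor `a` with a local derivative of the dynamics at each step.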

arXiv:2404.05052 [pdf, other]

Facial Affective Behavior Analysis with Instruction Tuning

Authors: Yifan Li, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong

Abstract: Facial affective behavior analysis (FABA) is crucial for understanding human mental states from images. However, traditional approaches primarily deploy models to discriminate among discrete emotion categories, and lack the fine granularity and reasoning capability for complex facial behaviors. The advent of Multi-modal Large Language Models (MLLMs) has been proven successful in general visual understanding tasks. However, directly harnessing MLLMs for FABA is challenging due to the scarcity of datasets and benchmarks, neglecting facial prior knowledge, and low training efficiency. To address these challenges, we introduce (i) an instruction-following dataset for two FABA tasks, e.g., emotion and action unit recognition, (ii) a benchmark FABA-Bench with a new metric considering both recognition and generation ability, and (iii) a new MLLM "EmoLA" as a strong baseline to the community. Our initiative on the dataset and benchmarks reveals the nature and rationale of facial affective behaviors, i.e., fine-grained facial movement, interpretability, and reasoning. Moreover, to build an effective and efficient FABA MLLM, we introduce a facial prior expert module with face structure knowledge and a low-rank adaptation module into pre-trained MLLM. We conduct extensive experiments on FABA-Bench and four commonly-used FABA datasets. The results demonstrate that the proposed facial prior expert can boost the performance and EmoLA achieves the best results on our FABA-Bench. On commonly-used FABA datasets, EmoLA is competitive, rivaling task-specific state-of-the-art models.

Submitted 7 April, 2024; originally announced April 2024.

Comments: V1.0

arXiv:2404.04801 [pdf, ps, other]

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.02478 [pdf, other]

FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

Authors: Rishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, Aviv Shamsian

Abstract: Standard federated learning approaches suffer when client data distributions have sufficient heterogeneity. Recent methods addressed the client data heterogeneity issue via personalized federated learning (PFL) - a class of FL algorithms aiming to personalize learned global knowledge to better suit the clients' local data distributions. Existing PFL methods usually decouple global updates in deep neural networks by performing personalization on particular layers (i.e. classifier heads) and global aggregation for the rest of the network. However, preselecting network layers for personalization may result in suboptimal storage of global knowledge. In this work, we propose FedSelect, a novel PFL algorithm inspired by the iterative subnetwork discovery procedure used for the Lottery Ticket Hypothesis. FedSelect incrementally expands subnetworks to personalize client parameters, concurrently conducting global aggregations on the remaining parameters. This approach enables the personalization of both client parameters and subnetwork structure during the training process. Finally, we show that FedSelect outperforms recent state-of-the-art PFL algorithms under challenging client data heterogeneity settings and demonstrates robustness to various real-world distributional shifts. Our code is available at https://github.com/lapisrocks/fedselect.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Published in CVPR 2024

arXiv:2403.10010 [pdf, other]

doi 10.1103/PhysRevLett.132.131002

Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen , et al. (256 additional authors not shown)

Abstract: We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is reversal to the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.

Submitted 26 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures

Journal ref: Physical Review Letters 132, 131002 (2024)

arXiv:2403.06204 [pdf, other]

Identifying and interpreting non-aligned human conceptual representations using language modeling

Authors: Wanqian Bao, Uri Hasson

Abstract: The question of whether people's experience in the world shapes conceptual representation and lexical semantics is longstanding. Word-association, feature-listing and similarity rating tasks aim to address this question but require a subjective interpretation of the latent dimensions identified. In this study, we introduce a supervised representational-alignment method that (i) determines whether two groups of individuals share the same basis of a certain category, and (ii) explains in what respects they differ. In applying this method, we show that congenital blindness induces conceptual reorganization in both a-modal and sensory-related verbal domains, and we identify the associated semantic shifts. We first apply supervised feature-pruning to a language model (GloVe) to optimize prediction accuracy of human similarity judgments from word embeddings. Pruning identifies one subset of retained GloVe features that optimizes prediction of judgments made by sighted individuals and another subset that optimizes judgments made by blind. A linear probing analysis then interprets the latent semantics of these feature-subsets by learning a mapping from the retained GloVe features to 65 interpretable semantic dimensions. We applied this approach to seven semantic domains, including verbs related to motion, sight, touch, and amodal verbs related to knowledge acquisition. We find that blind individuals more strongly associate social and cognitive meanings to verbs related to motion or those communicating non-speech vocal utterances (e.g., whimper, moan). Conversely, for amodal verbs, they demonstrate much sparser information. Finally, for some verbs, representations of blind and sighted are highly similar. The study presents a formal approach for studying interindividual differences in word meaning, and the first demonstration of how blindness impacts conceptual representation of everyday verbs.

Submitted 10 March, 2024; originally announced March 2024.

Comments: To appear at the ICLR 2024 Workshop on Representational Alignment (Re-Align)

arXiv:2403.02659 [pdf, other]

Quantum Advantage: A Single Qubit's Experimental Edge in Classical Data Storage

Authors: Chen Ding, Edwin Peter Lobo, Mir Alimuddin, Xiao-Yue Xu, Shuo Zhang, Manik Banik, Wan-Su Bao, He-Liang Huang

Abstract: We implement an experiment on a photonic quantum processor establishing efficacy of an elementary quantum system in classical information storage. The advantage is established by considering a class of simple bipartite games played with the communication resource qubit and classical bit (c-bit), respectively. Conventional wisdom, as articulated by the no-go theorems of Holevo and Frenkel-Weiner, suggests that such a quantum advantage is unattainable in scenarios wherein sender and receiver possess shared randomness or classical correlation between them. Notably, the advantage we report is demonstrated in a scenario where participating players lack any form of shared randomness. Our experiment involves the development of a variational triangular polarimeter, enabling the realization of positive operator value measurements crucial for establishing the targeted quantum advantage. In addition to demonstrating a robust communication advantage of a single qubit our experiment also opens avenues for immediate applications in near-term quantum technologies. Furthermore, it constitutes a semi-device-independent non-classicality certification scheme for the quantum encoding-decoding apparatus, underscoring the broader implications of our work beyond its immediate technological applications.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02609 [pdf, other]

Search Intenion Network for Personalized Query Auto-Completion in E-Commerce

Authors: Wei Bao, Mi Zhang, Tao Zhang, Chengfu Huo

Abstract: Query Auto-Completion (QAC), as an important part of the modern search engine, plays a key role in complementing user queries and helping them refine their search intentions. Today's QAC systems in real-world scenarios face two major challenges: 1) intention equivocality (IE): during the user's typing process, the prefix often contains a combination of characters and subwords, which makes the current intention ambiguous and difficult to model. 2) intention transfer (IT): previous works make personalized recommendations based on users' historical sequences, but ignore the search intention transfer. However, the current intention extracted from prefix may be contrary to the historical preferences.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.14241 [pdf, ps, other]

A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets

Authors: Chengzhang Yu, Xianjun Yang, Wenxia Bao, Shaonan Wang, Zhiming Yao

Abstract: In environments where RGB images are inadequate, pressure maps are a viable alternative, garnering scholarly attention. This study introduces a novel self-supervised pressure map keypoint detection (SPMKD) method, addressing the current gap in specialized designs for human keypoint extraction from pressure maps. Central to our contribution is the Encoder-Fuser-Decoder (EFD) model, which is a robust framework that integrates a lightweight encoder for precise human keypoint detection, a fuser for efficient gradient propagation, and a decoder that transforms human keypoints into reconstructed pressure maps. This structure is further enhanced by the Classification-to-Regression Weight Transfer (CRWT) method, which fine-tunes accuracy through initial classification task training. This innovation not only enhances human keypoint generalization without manual annotations but also showcases remarkable efficiency and generalization, evidenced by a reduction to only $5.96\%$ in FLOPs and $1.11\%$ in parameter count compared to the baseline methods.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 5 pages, 6 figures

arXiv:2402.07818 [pdf, other]

Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning

Authors: Z Liu, J Lou, W Bao, Y Hu, B Li, Z Qin, K Ren

Abstract: Fine-tuning on task-specific datasets is a widely-embraced paradigm of harnessing the powerful capability of pretrained LLMs for various downstream tasks. Due to the popularity of LLM fine-tuning and its accompanying privacy concerns, differentially private (DP) fine-tuning of pretrained LLMs has been widely used to safeguard the privacy of task-specific datasets. Lying at the design core of DP LLM fine-tuning methods is the satisfactory tradeoff among privacy, utility, and scalability. Most existing methods build upon the seminal work of DP-SGD. Despite pushing the scalability of DP-SGD to its limit, DP-SGD-based fine-tuning methods are unfortunately limited by the inherent inefficiency of SGD. In this paper, we investigate the potential of DP zeroth-order methods for LLM pretraining, which avoids the scalability bottleneck of SGD by approximating the gradient with the more efficient zeroth-order gradient. Rather than treating the zeroth-order method as a drop-in replacement for SGD, this paper presents a comprehensive study both theoretically and empirically. First, we propose the stagewise DP zeroth-order method (DP-ZOSO) that dynamically schedules key hyperparameters. This design is grounded on the synergy between DP random perturbation and the gradient approximation error of the zeroth-order method, and its effect on fine-tuning trajectory. We provide theoretical analysis for both proposed methods. We conduct extensive empirical analysis on both encoder-only masked language model and decoder-only autoregressive language model, achieving impressive results in terms of scalability and utility (compared with DPZero, DP-ZOPO improves 4.5% on SST-5, 5.5% on MNLI with RoBERTa-Large and 9.2% on CB, 3.9% on BoolQ with OPT-2.7B when $ε=4$).

Submitted 9 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.
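
Illustrative note: the abstract above refers to approximating the gradient with a zeroth-order estimate. A minimal sketch of a generic two-point zeroth-order estimator is given below. It is not the paper's DP-ZOSO method and omits the differential-privacy steps entirely; the function name `zeroth_order_grad`, the smoothing radius `mu`, and the caller-supplied `direction` are assumptions made for illustration.

```python
import numpy as np


def zeroth_order_grad(loss_fn, params, direction, mu=1e-3):
    """Two-point zeroth-order gradient estimate along one direction.

    Hypothetical helper: only two loss evaluations are needed and no
    backpropagation. `direction` is typically a random vector, and `mu`
    is an assumed smoothing radius.
    """
    params = np.asarray(params, dtype=float)
    direction = np.asarray(direction, dtype=float)
    loss_plus = loss_fn(params + mu * direction)
    loss_minus = loss_fn(params - mu * direction)
    # Central finite difference approximates the directional derivative.
    directional_derivative = (loss_plus - loss_minus) / (2.0 * mu)
    # Project the scalar back onto the perturbation direction.
    return directional_derivative * direction
```

For a linear loss the central difference is exact up to rounding, so the estimate equals the true directional derivative times `direction`; for a general loss it is only an approximation whose error depends on `mu`.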

arXiv:2401.13274 [pdf, other]

An energy-stable parametric finite element method for the planar Willmore flow

Authors: Weizhu Bao, Yifei Li

Abstract: We propose an energy-stable parametric finite element method (PFEM) for the planar Willmore flow and establish its unconditional energy stability of the full discretization scheme. The key lies in the introduction of two novel geometric identities to describe the planar Willmore flow: the first one involves the coupling of the outward unit normal vector $\boldsymbol{n}$ and the normal velocity $V$, and the second one concerns the time derivative of the mean curvature $κ$. Based on them, we derive a set of new geometric partial differential equations for the planar Willmore flow, leading to our new fully-discretized and unconditionally energy-stable PFEM. Our stability analysis is also based on the two new geometric identities. Extensive numerical experiments are provided to illustrate its efficiency and validate its unconditional energy stability.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.00207 [pdf, other]

A unified structure-preserving parametric finite element method for anisotropic surface diffusion

Authors: Weizhu Bao, Yifei Li

Abstract: We propose and analyze a unified structure-preserving parametric finite element method (SP-PFEM) for the anisotropic surface diffusion of curves in two dimensions $(d=2)$ and surfaces in three dimensions $(d=3)$ with an arbitrary anisotropic surface energy density $γ(\boldsymbol{n})$, where $\boldsymbol{n}\in \mathbb{S}^{d-1}$ represents the outward unit vector. By introducing a novel unified surface energy matrix $\boldsymbol{G}_k(\boldsymbol{n})$ depending on $γ(\boldsymbol{n})$, the Cahn--Hoffman $\boldsymbol{ξ}$-vector and a stabilizing function $k(\boldsymbol{n}):\ \mathbb{S}^{d-1}\to {\mathbb R}$, we obtain a unified and conservative variational formulation for the anisotropic surface diffusion via different surface differential operators including the surface gradient operator, the surface divergence operator and the surface Laplace--Beltrami operator. A SP-PFEM discretization is presented for the variational problem. In order to establish the unconditional energy stability of the proposed SP-PFEM under a very mild condition on $γ(\boldsymbol{n})$, we propose a new framework via local energy estimate for proving energy stability/structure-preserving properties of the parametric finite element method for the anisotropic surface diffusion. This framework sheds light on how to prove unconditional energy stability of other numerical methods for geometric partial differential equations. Extensive numerical results are reported to demonstrate the efficiency and accuracy as well as structure-preserving properties of the proposed SP-PFEM for the anisotropic surface diffusion with arbitrary anisotropic surface energy density $γ(\boldsymbol{n})$ arising from different applications.

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2312.13036 [pdf, other]

Quantum State Compression Shadow

Authors: Chen Ding, Xiao-Yue Xu, Shuo Zhang, Wan-Su Bao, He-Liang Huang

Abstract: Quantum state readout serves as the cornerstone of quantum information processing, exerting profound influence on quantum communication, computation, and metrology. In this study, we introduce an innovative readout architecture called Compression Shadow (CompShadow), which transforms the conventional readout paradigm by compressing multi-qubit states into single-qubit shadows before measurement. Compared to direct measurements of the initial quantum states, CompShadow achieves comparable accuracy in amplitude and observable expectation estimation while consuming similar measurement resources. Furthermore, its implementation on near-term quantum hardware with nearest-neighbor coupling architectures is straightforward. Significantly, CompShadow brings forth novel features, including the complete suppression of correlated readout noise, fundamentally reducing the quantum hardware demands for readout. It also facilitates the exploration of multi-body system properties through single-qubit probes and opens the door to designing quantum communication protocols with exponential loss suppression. Our findings mark the emergence of a new era in quantum state readout, setting the stage for a revolutionary leap in quantum information processing capabilities.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2311.08183 [pdf, other]

Circuit-Noise-Resilient Virtual Distillation

Authors: Xiao-Yue Xu, Chen Ding, Shuo Zhang, Wan-Su Bao, He-Liang Huang

Abstract: Quantum error mitigation (QEM) is crucial for near-term quantum devices, as noise inherently exists in physical quantum systems and undermines the accuracy of quantum algorithms. A typical purification-based QEM method, called Virtual Distillation (VD), aims to mitigate state preparation errors and achieve effective exponential suppression using multiple copies of the noisy state. However, imperfect VD circuit implementation may yield negative mitigation outcomes, potentially more severe than those achieved without QEM. To address this, we introduce Circuit-Noise-Resilient Virtual Distillation (CNR-VD). This method, featuring a calibration procedure that utilizes easily-prepared input states, refines the outcomes of VD when its circuit is contaminated by noise, seeking to recover the results of an ideally conducted VD circuit. Simulation results demonstrate that the CNR-VD estimator effectively reduces deviations induced by noise in the VD circuit, showcasing improvements in accuracy by an order of magnitude at most compared to the original VD. Meanwhile, CNR-VD elevates the gate noise threshold for VD, enabling positive effects even in the presence of higher noise levels. Furthermore, the strength of our work lies in its applicability beyond specific QEM algorithms, as the estimator can also be applied to generic Hadamard-Test circuits. The proposed CNR-VD significantly enhances the noise-resilience of VD, and thus is anticipated to elevate the performance of quantum algorithm implementations on near-term quantum devices.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.05826 [pdf, other]

Honest Score Client Selection Scheme: Preventing Federated Learning Label Flipping Attacks in Non-IID Scenarios

Authors: Yanli Li, Huaming Chen, Wei Bao, Zhengmeng Xu, Dong Yuan

Abstract: Federated Learning (FL) is a promising technology that enables multiple actors to build a joint model without sharing their raw data. The distributed nature makes FL vulnerable to various poisoning attacks, including model poisoning attacks and data poisoning attacks. Today, many byzantine-resilient FL methods have been introduced to mitigate the model poisoning attack, while their effectiveness when defending against data poisoning attacks still remains unclear. In this paper, we focus on the most representative data poisoning attack - the "label flipping attack" - and monitor its effectiveness when attacking the existing FL methods. The results show that the existing FL methods perform similarly in independent and identically distributed (IID) settings but fail to maintain the model robustness in Non-IID settings. To mitigate the weaknesses of existing FL methods in Non-IID scenarios, we introduce the Honest Score Client Selection (HSCS) scheme and the corresponding HSCSFL framework. In the HSCSFL, the server collects a clean dataset for evaluation. Under each iteration, the server collects the gradients from clients and then performs HSCS to select aggregation candidates. The server first evaluates the performance of each class of the global model and generates the corresponding risk vector to indicate which class could be potentially attacked. Similarly, the server evaluates the client's model and records the performance of each class as the accuracy vector. The dot product of each client's accuracy vector and global risk vector is generated as the client's host score; only the top p\% host score clients are included in the following aggregation. Finally, the server aggregates the gradients and uses the outcome to update the global model. The comprehensive experimental results show our HSCSFL effectively enhances the FL robustness and defends against the "label flipping attack."

Submitted 9 November, 2023; originally announced November 2023.
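
Illustrative note: the selection step described in the abstract above (dot product of each client's accuracy vector with the global risk vector, then keeping the top p% of clients) can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the abstract does not say how the risk vector is computed, so the definition `risk = 1 - global per-class accuracy` is an assumption, as are the function name `select_clients` and its parameters.

```python
import numpy as np


def select_clients(global_class_acc, client_class_acc, top_p):
    """Pick the top fraction of clients by score (hypothetical sketch).

    global_class_acc: per-class accuracy of the global model, shape (C,)
    client_class_acc: per-class accuracy of each client model, shape (N, C)
    top_p: fraction of clients to keep, e.g. 0.5 for the top 50%
    """
    global_class_acc = np.asarray(global_class_acc, dtype=float)
    client_class_acc = np.asarray(client_class_acc, dtype=float)
    # ASSUMPTION: classes the global model handles poorly are treated as
    # the ones most likely to be under attack.
    risk = 1.0 - global_class_acc
    # Score per client: dot product of its accuracy vector with the risk vector.
    scores = client_class_acc @ risk
    keep = max(1, int(np.ceil(top_p * len(scores))))
    ranked = np.argsort(-scores, kind="stable")
    return np.sort(ranked[:keep]), scores


selected, scores = select_clients(
    global_class_acc=[0.9, 0.2, 0.8],
    client_class_acc=[
        [0.9, 0.9, 0.9],
        [0.9, 0.1, 0.9],
        [0.8, 0.7, 0.8],
        [0.5, 0.5, 0.5],
    ],
    top_p=0.5,
)
```

Under this assumed risk definition, clients that still perform well on the classes the global model struggles with receive the highest scores, so a client whose accuracy collapses on the suspected class (the second client here) is excluded from aggregation.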

arXiv:2311.02891 [pdf, other]

AdaFlood: Adaptive Flood Regularization

Authors: Wonho Bae, Yi Ren, Mohamad Osama Ahmed, Frederick Tung, Danica J. Sutherland, Gabriel L. Oliveira

Abstract: Although neural networks are conventionally optimized towards zero training loss, it has been recently learned that targeting a non-zero training loss threshold, referred to as a flood level, often enables better test time generalization. Current approaches, however, apply the same constant flood level to all training samples, which inherently assumes all the samples have the same difficulty. We present AdaFlood, a novel flood regularization method that adapts the flood level of each training sample according to the difficulty of the sample. Intuitively, since training samples are not equal in difficulty, the target training loss should be conditioned on the instance. Experiments on datasets covering four diverse input modalities - text, images, asynchronous event sequences, and tabular - demonstrate the versatility of AdaFlood across data domains and noise levels.

Submitted 6 November, 2023; originally announced November 2023.
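
Illustrative note: a flood level b is commonly applied by replacing a loss value l with |l - b| + b, so that optimization stops pushing the loss below b. The sketch below applies that form with one flood level per sample, which is the idea the abstract above describes. How AdaFlood actually estimates each sample's flood level is not stated in the abstract, so `flood_levels` is simply taken as an input here; the function name `flooded_loss` is hypothetical.

```python
import numpy as np


def flooded_loss(per_sample_loss, flood_levels):
    """Mean flooded loss with a per-sample flood level (hypothetical sketch).

    per_sample_loss: unreduced training losses, shape (N,)
    flood_levels: target loss per sample, a scalar or shape (N,);
        ASSUMPTION: supplied by the caller, e.g. from some difficulty estimate.
    """
    loss = np.asarray(per_sample_loss, dtype=float)
    levels = np.broadcast_to(np.asarray(flood_levels, dtype=float), loss.shape)
    # Below the flood level the sign of the gradient flips, so the
    # sample's loss is pushed back up toward its level.
    return float(np.mean(np.abs(loss - levels) + levels))


constant = flooded_loss([0.1, 0.5], 0.2)          # same level for every sample
adaptive = flooded_loss([0.1, 0.5], [0.0, 0.5])   # one level per sample
```

Passing a scalar reproduces the constant-flood-level setting the abstract contrasts against, while passing an array gives each sample its own target.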

arXiv:2311.02879 [pdf, other]

Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling

Authors: Wonho Bae, Jing Wang, Danica J. Sutherland

Abstract: Most meta-learning methods assume that the (very small) context set used to establish a new task at test time is passively provided. In some settings, however, it is feasible to actively select which points to label; the potential gain from a careful choice is substantial, but the setting requires major differences from typical active learning setups. We clarify the ways in which active meta-learning can be used to label a context set, depending on which parts of the meta-learning process use active learning. Within this framework, we propose a natural algorithm based on fitting Gaussian mixtures for selecting which points to label; though simple, the algorithm also has theoretical motivation. The proposed algorithm outperforms state-of-the-art active learning methods when used with various meta-learning algorithms across several benchmark datasets.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.01295 [pdf, ps, other]

DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning

Authors: Wenxuan Bao, Francesco Pittaluga, Vijay Kumar B G, Vincent Bindschaedler

Abstract: Data augmentation techniques, such as simple image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are fundamentally incompatible with differentially private learning approaches, due to the latter's built-in assumption that each training image's contribution to the learned model is bounded. In this paper, we investigate why naive applications of multi-sample data augmentation techniques, such as mixup, fail to achieve good performance and propose two novel data augmentation techniques specifically designed for the constraints of differentially private learning. Our first technique, DP-Mix_Self, achieves SoTA classification performance across a range of datasets and settings by performing mixup on self-augmented data. Our second technique, DP-Mix_Diff, further improves performance by incorporating synthetic data from a pre-trained diffusion model into the mixup process. We open-source the code at https://github.com/wenxuan-Bao/DP-Mix.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 17 pages, 2 figures, to be published in Neural Information Processing Systems 2023

arXiv:2310.20181 [pdf, other]

An explicit and symmetric exponential wave integrator for the nonlinear Schrödinger equation with low regularity potential and nonlinearity

Authors: Weizhu Bao, Chushan Wang

Abstract: We propose and analyze a novel symmetric Gautschi-type exponential wave integrator (sEWI) for the nonlinear Schrödinger equation (NLSE) with low regularity potential and typical power-type nonlinearity of the form $|ψ|^{2σ}ψ$ with $ψ$ being the wave function and $σ > 0$ being the exponent of the nonlinearity. The sEWI is explicit and stable under a time step size restriction independent of the mesh size. We rigorously establish error estimates of the sEWI under various regularity assumptions on potential and nonlinearity. For "good" potential and nonlinearity ($H^2$-potential and $σ\geq 1$), we establish an optimal second-order error bound in the $L^2$-norm. For low regularity potential and nonlinearity ($L^\infty$-potential and $σ > 0$), we obtain a first-order $L^2$-norm error bound accompanied with a uniform $H^2$-norm bound of the numerical solution. Moreover, adopting a new technique of regularity compensation oscillation (RCO) to analyze error cancellation, for some non-resonant time steps, the optimal second-order $L^2$-norm error bound is proved under a weaker assumption on the nonlinearity: $σ\geq 1/2$. For all the cases, we also present corresponding fractional order error bounds in the $H^1$-norm, which is the natural norm in terms of energy. Extensive numerical results are reported to confirm our error estimates and to demonstrate the superiority of the sEWI, including much weaker regularity requirements on potential and nonlinearity, and excellent long-time behavior with near-conservation of mass and energy.

Submitted 26 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: 28 pages, 6 figures

MSC Class: 35Q55; 65M15; 65M70; 81Q05

arXiv:2310.20177 [pdf, other]

An extended Fourier pseudospectral method for the Gross-Pitaevskii equation with low regularity potential

Authors: Weizhu Bao, Bo Lin, Ying Ma, Chushan Wang

Abstract: We propose and analyze an extended Fourier pseudospectral (eFP) method for the spatial discretization of the Gross-Pitaevskii equation (GPE) with low regularity potential by treating the potential in an extended window for its discrete Fourier transform. The proposed eFP method maintains optimal convergence rates with respect to the regularity of the exact solution even if the potential is of low regularity and enjoys similar computational cost as the standard Fourier pseudospectral method, and thus it is both efficient and accurate. Furthermore, similar to the Fourier spectral/pseudospectral methods, the eFP method can be easily coupled with different popular temporal integrators including finite difference methods, time-splitting methods and exponential-type integrators. Numerical results are presented to validate our optimal error estimates and to demonstrate that they are sharp as well as to show its efficiency in practical computations.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 20 pages, 7 figures

MSC Class: 35Q55; 65M15; 65M70; 81Q05

arXiv:2310.18816 [pdf, other]

Adaptive Test-Time Personalization for Federated Learning

Authors: Wenxuan Bao, Tianxin Wei, Haohan Wang, Jingrui He

Abstract: Personalized federated learning algorithms have shown promising results in adapting models to various distribution shifts. However, most of these methods require labeled data on testing clients for personalization, which is usually unavailable in real-world scenarios. In this paper, we introduce a novel setting called test-time personalized federated learning (TTPFL), where clients locally adapt a global model in an unsupervised way without relying on any labeled data during test-time. While traditional test-time adaptation (TTA) can be used in this scenario, most TTA methods inherently assume the training data come from a single domain, whereas in FL they come from multiple clients (source domains) with different distributions. Overlooking these domain interrelationships can result in suboptimal generalization. Moreover, most TTA algorithms are designed for a specific kind of distribution shift and lack the flexibility to handle multiple kinds of distribution shifts in FL. In this paper, we find that this lack of flexibility partially results from their pre-defining which modules to adapt in the model. To tackle this challenge, we propose a novel algorithm called ATP to adaptively learn the adaptation rates for each module in the model from distribution shifts among source domains. Theoretical analysis proves the strong generalization of ATP. Extensive experiments demonstrate its superiority in handling various distribution shifts including label shift, image corruptions, and domain shift, outperforming existing TTA methods across multiple datasets and model architectures. Our code is available at https://github.com/baowenxuan/ATP .

Submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2310.17082 [pdf, ps, other]

Does or did the supernova remnant Cassiopeia A operate as a PeVatron?

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: For decades, supernova remnants (SNRs) have been considered the prime sources of Galactic Cosmic rays (CRs). But whether SNRs can accelerate CR protons to PeV energies and thus dominate CR flux up to the knee is currently under intensive theoretical and phenomenological debate. The direct test of the ability of SNRs to operate as CR PeVatrons can be provided by ultrahigh-energy (UHE; $E_γ\geq 100$~TeV) $γ$-rays. In this context, the historical SNR Cassiopeia A (Cas A) is considered one of the most promising targets for UHE observations. This paper presents the observation of Cas A and its vicinity by the LHAASO KM2A detector. The exceptional sensitivity of LHAASO KM2A in the UHE band, combined with the young age of Cas A, enabled us to derive stringent model-independent limits on the energy budget of UHE protons and nuclei accelerated by Cas A at any epoch after the explosion. The results challenge the prevailing paradigm that Cas A-type SNRs are major suppliers of PeV CRs in the Milky Way.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 11 pages, 3 figures, Accepted by the APJL

arXiv:2310.15396 [pdf]

Non-destructive characterization techniques for battery performance and lifecycle assessment

Authors: Charlotte Gervillie-Mouravieff, Wurigumula Bao, Daniel A Steingart, Ying Shirley-Meng

Abstract: As global energy demands escalate, and the use of non-renewable resources becomes untenable, renewable resources and electric vehicles require far better batteries to stabilize the new energy landscape. To maximize battery performance and lifetime, understanding and monitoring the fundamental mechanisms that govern their operation throughout their life cycle is crucial. Unfortunately, from the moment batteries are sealed until their end-of-life, they remain a black box, and our current knowledge of a commercial battery's health status is limited to current (I), voltage (V), temperature (T), and impedance (R) measurements, at the cell or even module level during use. Electrochemical models work best when the battery is new; as the battery ages, state reckoning drifts, leading to an over-reliance on insufficient data to establish conservative safety margins and resulting in the systematic under-utilization of cells and batteries. While the field of operando characterization is not new, the emergence of techniques capable of tracking commercial battery properties under realistic conditions has unlocked a trove of chemical, thermal, and mechanical data that has the potential to revolutionize the development and utilization strategies of both new and used lithium-ion devices. In this review, we examine the latest advances in non-destructive operando characterization techniques, including electrical sensors, optical fibers, acoustic transducers, X-ray-based imaging and thermal imaging (IR camera or calorimetry), and their potential to improve our comprehension of degradation mechanisms, reduce time and cost, and enhance battery performance throughout its life cycle.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.10262 [pdf, other]

Enhancing Interpretability using Human Similarity Judgements to Prune Word Embeddings

Authors: Natalia Flechas Manrique, Wanqian Bao, Aurelie Herbelot, Uri Hasson

Abstract: Interpretability methods in NLP aim to provide insights into the semantics underlying specific system architectures. Focusing on word embeddings, we present a supervised-learning method that, for a given domain (e.g., sports, professions), identifies a subset of model features that strongly improve prediction of human similarity judgments. We show this method keeps only 20-40% of the original embeddings, for 8 independent semantic domains, and that it retains different feature sets across domains. We then present two approaches for interpreting the semantics of the retained features. The first obtains the scores of the domain words (co-hyponyms) on the first principal component of the retained embeddings, and extracts terms whose co-occurrence with the co-hyponyms tracks these scores' profile. This analysis reveals that humans differentiate e.g. sports based on how gender-inclusive and international they are. The second approach uses the retained sets as variables in a probing task that predicts values along 65 semantically annotated dimensions for a dataset of 535 words. The features retained for professions are best at predicting cognitive, emotional and social dimensions, whereas features retained for fruits or vegetables best predict the gustation (taste) dimension. We discuss implications for alignment between AI systems and human knowledge.

Submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted for presentation at the BlackboxNLP workshop at EMNLP 2023

arXiv:2310.08845 [pdf, other]

doi 10.1126/sciadv.adj2778

Very high energy gamma-ray emission beyond 10 TeV from GRB 221009A

Authors: Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: The highest energy gamma-rays from gamma-ray bursts (GRBs) have important implications for their radiation mechanism. Here we report for the first time the detection of gamma-rays up to 13 TeV from the brightest GRB 221009A by the Large High Altitude Air-shower Observatory (LHAASO). The LHAASO-KM2A detector registered more than 140 gamma-rays with energies above 3 TeV during 230$-$900s after the trigger. The intrinsic energy spectrum of gamma-rays can be described by a power-law after correcting for extragalactic background light (EBL) absorption. Such a hard spectrum challenges the synchrotron self-Compton (SSC) scenario of relativistic electrons for the afterglow emission above several TeV. Observations of gamma-rays up to 13 TeV from a source with a measured redshift of z=0.151 hint at more transparency in intergalactic space than previously expected. Alternatively, one may invoke new physics such as Lorentz Invariance Violation (LIV) or an axion origin of very high energy (VHE) signals.

Submitted 22 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 49 pages, 11 figures

Journal ref: Science Advances, 9, eadj2778 (2023) 15 November 2023

arXiv:2310.07236 [pdf, other]

AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation

Authors: Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, Haozhi Huang, Helen Meng

Abstract: Speech-driven 3D facial animation aims at generating facial movements that are synchronized with the driving speech, which has been widely explored recently. Existing works mostly neglect the person-specific talking style in generation, including facial expression and head pose styles. Several works intend to capture the personalities by fine-tuning modules. However, limited training data leads to the lack of vividness. In this work, we propose AdaMesh, a novel adaptive speech-driven facial animation approach, which learns the personalized talking style from a reference video of about 10 seconds and generates vivid facial expressions and head poses. Specifically, we propose mixture-of-low-rank adaptation (MoLoRA) to fine-tune the expression adapter, which efficiently captures the facial expression style. For the personalized pose style, we propose a pose adapter by building a discrete pose prior and retrieving the appropriate style embedding with a semantic-aware pose style matrix without fine-tuning. Extensive experimental results show that our approach outperforms state-of-the-art methods, preserves the talking style in the reference video, and generates vivid facial animation. The supplementary video and code will be available at https://adamesh.github.io.

Submitted 19 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: Project Page: https://adamesh.github.io

arXiv:2310.05394 [pdf, other]

doi 10.1002/aisy.202300885

CAMEL2: Enhancing weakly supervised learning for histopathology images by incorporating the significance ratio

Authors: Gang Xu, Shuhao Wang, Lingyu Zhao, Xiao Chen, Tongwei Wang, Lang Wang, Zhenwei Luo, Dahan Wang, Zewen Zhang, Aijun Liu, Wei Ba, Zhigang Song, Huaiyin Shi, Dingrong Zhong, Jianpeng Ma

Abstract: Histopathology image analysis plays a crucial role in cancer diagnosis. However, training a clinically applicable segmentation algorithm requires pathologists to engage in labour-intensive labelling. In contrast, weakly supervised learning methods, which only require coarse-grained labels at the image level, can significantly reduce the labeling efforts. Unfortunately, while these methods perform reasonably well in slide-level prediction, their ability to locate cancerous regions, which is essential for many clinical applications, remains unsatisfactory. Previously, we proposed CAMEL, which achieves comparable results to those of fully supervised baselines in pixel-level segmentation. However, CAMEL requires 1,280x1,280 image-level binary annotations for positive WSIs. Here, we present CAMEL2. By introducing a threshold of the cancerous ratio for positive bags, it allows us to better utilize the information, consequently enabling us to scale up the image-level setting from 1,280x1,280 to 5,120x5,120 while maintaining the accuracy. Our results with various datasets demonstrate that CAMEL2, with the help of 5,120x5,120 image-level binary annotations, which are easy to annotate, achieves comparable performance to that of a fully supervised baseline in both instance- and slide-level classifications.

Submitted 25 May, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: 41 pages, 13 figures, published in Advanced Intelligent Systems

arXiv:2310.02447 [pdf, other]

Machine learning assist nyc subway navigation safer and faster

Authors: Wencheng Bao, Shi Feng

Abstract: Mainstream navigation software, like Google and Apple Maps, often lacks the ability to provide routes prioritizing safety. However, safety remains a paramount concern for many. Our aim is to strike a balance between safety and efficiency. To achieve this, we're devising an Integer Programming model that takes into account both the shortest path and the safest route. We will harness machine learning to derive safety coefficients, employing methodologies such as generalized linear models, linear regression, and recurrent neural networks. Our evaluation will be based on the Root Mean Square Error (RMSE) across various subway stations, helping us identify the most accurate model for safety coefficient estimation. Furthermore, we'll conduct a comprehensive review of different shortest-path algorithms, assessing them based on time complexity and real-world data to determine their appropriateness in merging both safety and time efficiency.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 7 pages, 3 figures

arXiv:2309.10711 [pdf, other]

Latent Space Energy-based Model for Fine-grained Open Set Recognition

Authors: Wentao Bao, Qi Yu, Yu Kong

Abstract: Fine-grained open-set recognition (FineOSR) aims to recognize images belonging to classes with subtle appearance differences while rejecting images of unknown classes. A recent trend in OSR shows the benefit of generative models to discriminative unknown detection. As a type of generative model, energy-based models (EBM) have the potential for hybrid modeling of generative and discriminative tasks. However, most existing EBMs suffer from density estimation in high-dimensional space, which is critical to recognizing images from fine-grained classes. In this paper, we explore the low-dimensional latent space with energy-based prior distribution for OSR in a fine-grained visual world. Specifically, based on the latent space EBM, we propose an attribute-aware information bottleneck (AIB), a residual attribute feature aggregation (RAFA) module, and an uncertainty-based virtual outlier synthesis (UVOS) module to improve the expressivity, granularity, and density of the samples in fine-grained classes, respectively. Our method is flexible to take advantage of recent vision transformers for powerful visual classification and generation. The method is validated on both fine-grained and general visual classification datasets while preserving the capability of generating photo-realistic fake images with high resolution.

Submitted 29 October, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: Add ack

arXiv:2309.09887 [pdf, other]

On Model Explanations with Transferable Neural Pathways

Authors: Xinmiao Lin, Wentao Bao, Qi Yu, Yu Kong

Abstract: Neural pathways as model explanations consist of a sparse set of neurons that provide the same level of prediction performance as the whole model. Existing methods primarily focus on accuracy and sparsity, but the generated pathways may offer limited interpretability and thus fall short in explaining the model behavior. In this paper, we suggest two interpretability criteria of neural pathways: (i) same-class neural pathways should primarily consist of class-relevant neurons; (ii) each instance's neural pathway sparsity should be optimally determined. To this end, we propose a Generative Class-relevant Neural Pathway (GEN-CNP) model that learns to predict the neural pathways from the target model's feature maps. We propose to learn class-relevant information from features of deep and shallow layers such that same-class neural pathways exhibit high similarity. We further impose a faithfulness criterion for GEN-CNP to generate pathways with instance-specific sparsity. We propose to transfer the class-relevant neural pathways to explain samples of the same class and show experimentally and qualitatively their faithfulness and interpretability.

Submitted 18 September, 2023; originally announced September 2023.

Comments: Arxiv preprint

arXiv:2308.15208 [pdf, other]

doi 10.1088/1748-0221/18/09/P09002

Optimization of WLS fiber readout for the HERD calorimeter

Authors: X. Liu, Z. Quan, Y. W. Dong, M. Xu, J. J. Wang, R. J. Wang, Z. G. Wang, X. Z. Cui, T. W. Bao, C. L. Liao, J. F. Han, Y. Chen

Abstract: A novel 3-D calorimeter, composed of about 7500 LYSO cubes, is the key and crucial detector of the High Energy cosmic-Radiation Detection (HERD) facility to be installed onboard the China Space Station. Energy deposition from cosmic rays in each LYSO cube is translated by multiple wavelength shifting (WLS) fibers for multi-range data acquisition and real-time triggering. In this study, various methods of surface finish and encapsulation of the LYSO cube were investigated to optimize the amplitude from the WLS fiber end with the aim of improving the signal-to-noise ratio of Intensified scientific CMOS (IsCMOS) collection. The LYSO cube with five rough surfaces and a specular reflector achieves the maximum amplitude at the low-range fiber end, which is increased by roughly 44% compared to the polished cube with PTFE wrapping. The non-uniformity of amplitude at different positions on the LYSO cube surface was measured by X-ray and the positional correlation factor was derived for the entire cube. A simulation based on HERD CALO was conducted, which revealed that both the LYSO cube with five rough surfaces and the cube with rough bottom face exhibit superior energy resolution for electrons compared to the other two configurations.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.15089 [pdf, other]

doi 10.1142/S0218202524500155

Optimal error bounds on time-splitting methods for the nonlinear Schrödinger equation with low regularity potential and nonlinearity

Authors: Weizhu Bao, Ying Ma, Chushan Wang

Abstract: We establish optimal error bounds on time-splitting methods for the nonlinear Schrödinger equation with low regularity potential and typical power-type nonlinearity $ f(ρ) = ρ^σ$, where $ ρ:=|ψ|^2 $ is the density with $ ψ$ the wave function and $ σ> 0 $ the exponent of the nonlinearity. For the first-order Lie-Trotter time-splitting method, optimal $ L^2 $-norm error bound is proved for $L^\infty$-potential and $ σ> 0 $, and optimal $H^1$-norm error bound is obtained for $ W^{1, 4} $-potential and $ σ\geq 1/2 $. For the second-order Strang time-splitting method, optimal $ L^2 $-norm error bound is established for $H^2$-potential and $ σ\geq 1 $, and optimal $H^1$-norm error bound is proved for $H^3$-potential and $ σ\geq 3/2 $ (or $σ= 1$). Compared to those error estimates of time-splitting methods in the literature, our optimal error bounds either improve the convergence rates under the same regularity assumptions or significantly relax the regularity requirements on potential and nonlinearity for optimal convergence orders. A key ingredient in our proof is to adopt a new technique called \textit{regularity compensation oscillation} (RCO), where low frequency modes are analyzed by phase cancellation, and high frequency modes are estimated by regularity of the solution. Extensive numerical results are reported to confirm our error estimates and to demonstrate that they are sharp.

Submitted 7 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: 34 pages, 8 figures

MSC Class: 35Q55; 65M15; 65M70; 81Q05

Journal ref: Math. Models Methods Appl. Sci., Vol. 34 (2024), pp. 803-844

arXiv:2308.13781 [pdf, other]

doi 10.3847/1538-4357/acebce

Observation of gamma rays up to 320 TeV from the middle-aged TeV pulsar wind nebula HESS J1849$-$000

Authors: M. Amenomori, S. Asano, Y. W. Bao, X. J. Bi, D. Chen, T. L. Chen, W. Y. Chen, Xu Chen, Y. Chen, Cirennima, S. W. Cui, Danzengluobu, L. K. Ding, J. H. Fang, K. Fang, C. F. Feng, Zhaoyang Feng, Z. Y. Feng, Qi Gao, A. Gomi, Q. B. Gou, Y. Q. Guo, Y. Y. Guo, Y. Hayashi, H. H. He , et al. (93 additional authors not shown)

Abstract: Gamma rays from HESS J1849$-$000, a middle-aged TeV pulsar wind nebula (PWN), are observed by the Tibet air shower array and the muon detector array. The detection significance of gamma rays reaches $4.0\, σ$ and $4.4\, σ$ levels above 25 TeV and 100 TeV, respectively, in units of Gaussian standard deviation $σ$. The energy spectrum measured between $40\, {\rm TeV} < E < 320\, {\rm TeV}$ for the first time is described with a simple power-law function of ${\rm d}N/{\rm d}E = (2.86 \pm 1.44) \times 10^{-16}(E/40\, {\rm TeV})^{-2.24 \pm 0.41}\, {\rm TeV}^{-1}\, {\rm cm}^{-2}\, {\rm s}^{-1}$. The gamma-ray energy spectrum from the sub-TeV ($E < 1\, {\rm TeV}$) to sub-PeV ($100\, {\rm TeV} < E < 1\, {\rm PeV}$) ranges including the results of previous studies can be modeled with the leptonic scenario, inverse Compton scattering by high-energy electrons accelerated by the PWN of PSR J1849$-$0001. On the other hand, the gamma-ray energy spectrum can also be modeled with the hadronic scenario in which gamma rays are generated from the decay of neutral pions produced by collisions between accelerated cosmic-ray protons and the ambient molecular cloud found in the gamma-ray emitting region. The cutoff energy of cosmic-ray protons $E_{\rm p,\, cut}$ is estimated at ${\rm log}_{10}(E_{\rm p,\, cut}/{\rm TeV}) = 3.73^{+2.98}_{-0.66}$, suggesting that protons are accelerated up to the PeV energy range. Our study thus proposes that HESS J1849$-$000 should be further investigated as a new candidate for a Galactic PeV cosmic-ray accelerator, PeVatron.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 10 pages, 2 figures, Accepted for publication from the Astrophysical Journal

arXiv:2308.13780 [pdf, other]

doi 10.3847/1538-4357/ac6ef4

Measurement of the Gamma-Ray Energy Spectrum beyond 100 TeV from the HESS J1843$-$033 Region

Authors: M. Amenomori, S. Asano, Y. W. Bao, X. J. Bi, D. Chen, T. L. Chen, W. Y. Chen, Xu Chen, Y. Chen, Cirennima, S. W. Cui, Danzengluobu, L. K. Ding, J. H. Fang, K. Fang, C. F. Feng, Zhaoyang Feng, Z. Y. Feng, Qi Gao, A. Gomi, Q. B. Gou, Y. Q. Guo, Y. Y. Guo, H. H. He, Z. T. He , et al. (91 additional authors not shown)

Abstract: HESS J1843$-$033 is a very-high-energy gamma-ray source whose origin remains unidentified. This work presents, for the first time, the energy spectrum of gamma rays beyond $100\, {\rm TeV}$ from the HESS J1843$-$033 region using the data recorded by the Tibet air shower array and its underground muon detector array. A gamma-ray source with an extension of $0.34^{\circ} \pm 0.12^{\circ}$ is successfully detected above $25\, {\rm TeV}$ at $(α,\, δ) = (281.09^{\circ}\pm 0.10^{\circ},\, -3.76^{\circ}\pm 0.09^{\circ})$ near HESS J1843$-$033 with a statistical significance of $6.2\, σ$, and the source is named TASG J1844$-$038. The position of TASG J1844$-$038 is consistent with those of HESS J1843$-$033, eHWC J1842$-$035, and LHAASO J1843$-$0338. The measured gamma-ray energy spectrum in $25\, {\rm TeV} < E < 130\, {\rm TeV}$ is described with ${\rm d}N/{\rm d}E = (9.70\pm 1.89)\times 10^{-16} (E/40\, {\rm TeV})^{-3.26\pm 0.30}\, {\rm TeV}^{-1} {\rm cm}^{-2} {\rm s}^{-1}$, and the spectral fit to the combined spectra of HESS J1843$-$033, LHAASO J1843$-$0338, and TASG J1844$-$038 implies the existence of a cutoff at $49.5\pm 9.0\, {\rm TeV}$. Associations of TASG J1844-038 with SNR G28.6$-$0.1 and PSR J1844-0346 are also discussed in detail for the first time.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 11 pages, 4 figures, 1 table

arXiv:2308.13273 [pdf, other]

Bridging the Gap: Fine-to-Coarse Sketch Interpolation Network for High-Quality Animation Sketch Inbetweening

Authors: Jiaming Shen, Kun Hu, Wei Bao, Chang Wen Chen, Zhiyong Wang

Abstract: The 2D animation workflow is typically initiated with the creation of keyframes using sketch-based drawing. Subsequent inbetweens (i.e., intermediate sketch frames) are crafted through manual interpolation for smooth animations, which is a labor-intensive process. Thus, the prospect of automatic animation sketch interpolation has become highly appealing. However, existing video interpolation methods are generally hindered by two key issues for sketch inbetweening: 1) limited texture and colour details in sketches, and 2) exaggerated alterations between two sketch keyframes. To overcome these issues, we propose a novel deep learning method, namely Fine-to-Coarse Sketch Interpolation Network (FC-SIN). This approach incorporates multi-level guidance that formulates region-level correspondence, sketch-level correspondence and pixel-level dynamics. A multi-stream U-Transformer is then devised to characterize sketch inbetweening patterns using these multi-level guides through the integration of both self-attention and cross-attention mechanisms. Additionally, to facilitate future research on animation sketch inbetweening, we constructed a large-scale dataset - STD-12K, comprising 30 sketch animation series in diverse artistic styles. Comprehensive experiments on this dataset convincingly show that our proposed FC-SIN surpasses the state-of-the-art interpolation methods. Our code and dataset will be publicly available.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 7 pages, 6 figures

arXiv:2308.11373 [pdf, other]

Fast and Adaptive Multi-agent Planning under Collaborative Temporal Logic Tasks via Poset Products

Authors: Zesen Liu, Meng Guo, Weimin Bao, Zhongkui Li

Abstract: Efficient coordination and planning is essential for large-scale multi-agent systems that collaborate in a shared dynamic environment. Heuristic search methods or learning-based approaches often lack guarantees on correctness and performance. Moreover, when the collaborative tasks contain both spatial and temporal requirements, e.g., as Linear Temporal Logic (LTL) formulas, formal methods provide a verifiable framework for task planning. However, since the planning complexity grows exponentially with the number of agents and the length of the task formula, existing studies are mostly limited to small artificial cases. To address this issue, a new planning paradigm is proposed in this work for system-wide temporal task formulas that are released online and continually. It avoids two common bottlenecks in the traditional methods, i.e., (i) the direct translation of the complete task formula to the associated Büchi automaton; and (ii) the synchronized product between the Büchi automaton and the transition models of all agents. Instead, an adaptive planning algorithm is proposed that computes the product of relaxed partially-ordered sets (R-posets) on-the-fly, and assigns these subtasks to the agents subject to the ordering constraints. It is shown that the first valid plan can be derived with polynomial time and memory complexity w.r.t. the system size and the formula length. Our method can take into account task formulas with a length of more than 400 and a fleet with more than 400 agents, while most existing methods fail at the formula length of 25 within a reasonable duration. The proposed method is validated on large fleets of service robots in both simulation and hardware experiments.

Submitted 9 April, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 16 pages, 9 figures

arXiv:2308.04830 [pdf, other]

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

Authors: Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao

Abstract: Current talking face generation methods mainly focus on speech-lip synchronization. However, insufficient investigation on the facial talking style leads to a lifeless and monotonous avatar. Most previous works fail to imitate expressive styles from arbitrary video prompts and ensure the authenticity of the generated video. This paper proposes an unsupervised variational style transfer model (VAST) to vivify the neutral photo-realistic avatars. Our model consists of three key components: a style encoder that extracts facial style representations from the given video prompts; a hybrid facial expression decoder to model accurate speech-related movements; a variational style enhancer that enhances the style space to be highly expressive and meaningful. With our essential designs on facial style learning, our model is able to flexibly capture the expressive facial style from arbitrary video prompts and transfer it onto a personalized image renderer in a zero-shot manner. Experimental results demonstrate the proposed approach contributes to a more vivid talking avatar with higher authenticity and richer expressiveness.

Submitted 11 August, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023 Workshop

arXiv:2307.08243 [pdf, other]

Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting

Authors: Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, Yu Kong

Abstract: Hand trajectory forecasting from egocentric views is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems. However, existing methods handle this problem in a 2D image space which is inadequate for 3D real-world applications. In this paper, we set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view. To fulfill this goal, we propose an uncertainty-aware state space Transformer (USST) that takes the merits of the attention mechanism and aleatoric uncertainty within the framework of the classical state-space model. The model can be further enhanced by the velocity constraint and visual prompt tuning (VPT) on large vision transformers. Moreover, we develop an annotation workflow to collect 3D hand trajectories with high quality. Experimental results on H2O and EgoPAT3D datasets demonstrate the superiority of USST for both 2D and 3D trajectory forecasting. The code and datasets are publicly released: https://actionlab-cv.github.io/EgoHandTrajPred.

Submitted 16 September, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: ICCV 2023 Accepted (Camera Ready)

Showing 1–50 of 466 results for author: Ba, W