-
Super-Eddington Magnetized Neutron Star Accretion Flows: a Self-similar Analysis
Authors:
Ken Chen,
Zi-Gao Dai
Abstract:
The properties of super-Eddington accretion disks exhibit substantial distinctions from the sub-Eddington ones. In this paper, we investigate the accretion process of a magnetized neutron star (NS) surrounded by a super-Eddington disk. By constructing self-similar solutions for the disk structure, we study in detail the interaction between the NS magnetosphere and the inner region of the disk, revealing that this interaction takes place within a thin boundary layer. The magnetosphere truncation radius is found to be approximately proportional to the Alfvén radius, with a coefficient ranging from 0.34 to 0.71, influenced by the advection and twisting of the magnetic field, NS rotation, and radiation emitted from the NS accretion column. Under super-Eddington accretion, the NS can readily spin up to become a rapid rotator. The proposed model can be employed to explore the accretion and evolution of NSs in diverse astrophysical contexts, such as ultraluminous X-ray binaries or active galactic nucleus disks.
Submitted 28 June, 2024;
originally announced July 2024.
-
SAGIPS: A Scalable Asynchronous Generative Inverse Problem Solver
Authors:
Daniel Lersch,
Malachi Schram,
Zhenyu Dai,
Kishansingh Rajput,
Xingfu Wu,
N. Sato,
J. Taylor Childers
Abstract:
Large-scale deep learning algorithms for solving inverse problems have become an essential part of modern research and industrial applications. The complexity of the underlying inverse problem often poses challenges to the algorithm and requires the proper utilization of high-performance computing systems. Most deep learning algorithms require, due to their design, custom parallelization techniques in order to be resource efficient while showing reasonable convergence. In this paper we introduce a Scalable Asynchronous Generative Inverse Problem Solver (SAGIPS) for high-performance computing systems. We present a workflow that utilizes a parallelization approach where the gradients of the generator network are updated in an asynchronous ring-all-reduce fashion. Experiments with a scientific proxy application demonstrate that SAGIPS shows near-linear weak scaling, together with a convergence quality that is comparable to traditional methods. The approach presented here allows leveraging GANs across multiple GPUs, promising advancements in solving complex inverse problems at scale.
Submitted 11 June, 2024;
originally announced July 2024.
-
Forecast of cosmological constraints with superluminous supernovae from the Chinese Space Station Telescope
Authors:
Xuan-Dong Jia,
Jian-Ping Hu,
Fa-Yin Wang,
Zi-Gao Dai
Abstract:
Superluminous supernovae (SLSNe) are a class of intense celestial events that can be standardized for measuring cosmological parameters, bridging the gap between type Ia supernovae and the cosmic microwave background. In this work, we discuss the cosmological applications of SLSNe from the Chinese Space Station Telescope (CSST). Our estimation suggests that the SLSNe rate is a biased tracer of the cosmic star formation rate, exhibiting a factor of $(1+z)^{1.2}$. We further predict that CSST is poised to observe $\sim 360$ SLSNe in the 10 square degree ultra-deep field survey within a span of 2.5 years. A stringent constraint on cosmological parameters can be derived from their peak-color relationship. CSST is anticipated to uncover a substantial number of SLSNe, contributing to a deeper understanding of their central engines and shedding light on the nature of dark energy at high redshifts.
Submitted 28 June, 2024;
originally announced June 2024.
-
GlobalTomo: A global dataset for physics-ML seismic wavefield modeling and FWI
Authors:
Shiqian Li,
Zhi Li,
Zhancun Mu,
Shiji Xin,
Zhixiang Dai,
Kuangdai Leng,
Ruihua Zhang,
Xiaodong Song,
Yixin Zhu
Abstract:
Global seismic tomography, taking advantage of seismic waves from natural earthquakes, provides essential insights into the earth's internal dynamics. Advanced Full-waveform Inversion (FWI) techniques, whose aim is to meticulously interpret every detail in seismograms, confront formidable computational demands in forward modeling and adjoint simulations on a global scale. Recent advancements in Machine Learning (ML) offer a transformative potential for accelerating the computational efficiency of FWI and extending its applicability to larger scales. This work presents the first 3D global synthetic dataset tailored for seismic wavefield modeling and full-waveform tomography, referred to as the GlobalTomo dataset. This dataset is uniquely comprehensive, incorporating explicit wave physics and robust geophysical parameterization at realistic global scales, generated through state-of-the-art forward simulations optimized for 3D global wavefield calculations. Through extensive analysis and the establishment of ML baselines, we illustrate that ML approaches are particularly suitable for global FWI, overcoming its limitations with rapid forward modeling and flexible inversion strategies. This work represents a cross-disciplinary effort to enhance our understanding of the earth's interior through physics-ML modeling.
Submitted 26 June, 2024;
originally announced June 2024.
-
A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al
Authors:
Ji Yan,
Jiwei Li,
X. T. He,
Lifeng Wang,
Yaohua Chen,
Feng Wang,
Xiaoying Han,
Kaiqiang Pan,
Juxi Liang,
Yulong Li,
Zanyang Guan,
Xiangming Liu,
Xingsen Che,
Zhongjing Chen,
Xing Zhang,
Yan Xu,
Bin Li,
Minging He,
Hongbo Cai,
Liang Hao,
Zhanjun Liu,
Chunyang Zheng,
Zhensheng Dai,
Zhengfeng Fan,
Bin Qiao
, et al. (4 additional authors not shown)
Abstract:
A response to commenter Ke Lan's comment on our paper published in Nature Communications (2023)14:5782 by J. Yan et al
Submitted 25 June, 2024;
originally announced June 2024.
-
CuDA2: An approach for Incorporating Traitor Agents into Cooperative Multi-Agent Systems
Authors:
Zhen Chen,
Yong Liao,
Youpeng Zhao,
Zipeng Dai,
Jian Zhao
Abstract:
Cooperative Multi-Agent Reinforcement Learning (CMARL) strategies are well known to be vulnerable to adversarial perturbations. Previous works on adversarial attacks have primarily focused on white-box attacks that directly perturb the states or actions of victim agents, often in scenarios with a limited number of attacks. However, gaining complete access to victim agents in real-world environments is exceedingly difficult. To create more realistic adversarial attacks, we introduce a novel method that involves injecting traitor agents into the CMARL system. We model this problem as a Traitor Markov Decision Process (TMDP), where traitors cannot directly attack the victim agents but can influence their formation or positioning through collisions. In TMDP, traitors are trained using the same MARL algorithm as the victim agents, with their reward function set as the negative of the victim agents' reward. Despite this, the training efficiency for traitors remains low because it is challenging for them to directly associate their actions with the victim agents' rewards. To address this issue, we propose the Curiosity-Driven Adversarial Attack (CuDA2) framework. CuDA2 enhances the efficiency and aggressiveness of attacks on the specified victim agents' policies while maintaining the optimal policy invariance of the traitors. Specifically, we employ a pre-trained Random Network Distillation (RND) module, where the extra reward generated by the RND module encourages traitors to explore states unencountered by the victim agents. Extensive experiments on various scenarios from SMAC demonstrate that our CuDA2 framework offers comparable or superior adversarial attack capabilities compared to other baselines.
Submitted 25 June, 2024;
originally announced June 2024.
-
MPCODER: Multi-user Personalized Code Generator with Explicit and Implicit Style Representation Learning
Authors:
Zhenlong Dai,
Chang Yao,
WenKang Han,
Ying Yuan,
Zhipeng Gao,
Jingyuan Chen
Abstract:
Large Language Models (LLMs) have demonstrated great potential for assisting developers in their daily development. However, most research focuses on generating correct code, and how to use LLMs to generate personalized code has seldom been investigated. To bridge this gap, we propose MPCoder (Multi-user Personalized Code Generator) to generate personalized code for multiple users. To better learn coding style features, we utilize explicit coding style residual learning to capture the syntax code style standards and implicit style learning to capture the semantic code style conventions. We train a multi-user style adapter to better differentiate the implicit feature representations of different users through contrastive learning, ultimately enabling personalized code generation for multiple users. We further propose a novel evaluation metric for estimating similarities between codes of different coding styles. The experimental results show the effectiveness of our approach for this novel task.
Submitted 24 June, 2024;
originally announced June 2024.
-
Data-Centric AI in the Age of Large Language Models
Authors:
Xinyi Xu,
Zhaoxuan Wu,
Rui Qiao,
Arun Verma,
Yao Shu,
Jingtan Wang,
Xinyuan Niu,
Zhenfeng He,
Jiangwei Chen,
Zijian Zhou,
Gregory Kang Ruey Lau,
Hieu Dao,
Lucas Agussurja,
Rachael Hwee Ling Sim,
Xiaoqiang Lin,
Wenyang Hu,
Zhongxiang Dai,
Pang Wei Koh,
Bryan Kian Hsiang Low
Abstract:
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, the society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research.
Submitted 20 June, 2024;
originally announced June 2024.
-
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Authors:
Jinhyuk Lee,
Anthony Chen,
Zhuyun Dai,
Dheeru Dua,
Devendra Singh Sachan,
Michael Boratko,
Yi Luan,
Sébastien M. R. Arnold,
Vincent Perot,
Siddharth Dalmia,
Hexiang Hu,
Xudong Lin,
Panupong Pasupat,
Aida Amini,
Jeremy R. Cole,
Sebastian Riedel,
Iftekhar Naim,
Ming-Wei Chang,
Kelvin Guu
Abstract:
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.
Submitted 18 June, 2024;
originally announced June 2024.
-
MusicScore: A Dataset for Music Score Modeling and Generation
Authors:
Yuheng Lin,
Zheqi Dai,
Qiuqiang Kong
Abstract:
Music scores are written representations of music and contain rich information about musical components. The visual information on music scores includes notes, rests, staff lines, clefs, dynamics, and articulations. This visual information in music scores contains more semantic information than audio and symbolic representations of music. Previous music score datasets have limited sizes and are mainly designed for optical music recognition (OMR). There is a lack of research on creating a large-scale benchmark dataset for music modeling and generation. In this work, we propose MusicScore, a large-scale music score dataset collected and processed from the International Music Score Library Project (IMSLP). MusicScore consists of image-text pairs, where the image is a page of a music score and the text is the metadata of the music. The metadata of MusicScore is extracted from the general information section of the IMSLP pages. The metadata includes rich information about the composer, instrument, piece style, and genre of the music pieces. MusicScore is curated into small, medium, and large scales of 400, 14k, and 200k image-text pairs with varying diversity, respectively. We build a score generation system based on a UNet diffusion model to generate visually readable music scores conditioned on text descriptions to benchmark the MusicScore dataset for music score generation. MusicScore is released to the public at https://huggingface.co/datasets/ZheqiDAI/MusicScore.
Submitted 17 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, as they have low fluxes of astrophysical $γ$-ray background while containing a large amount of dark matter. By analyzing more than 700 days of observational data from LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.
Submitted 12 June, 2024;
originally announced June 2024.
-
Ablation Based Counterfactuals
Authors:
Zheng Dai,
David K Gifford
Abstract:
Diffusion models are a class of generative models that generate high-quality samples, but at present it is difficult to characterize how they depend upon their training data. This difficulty raises scientific and regulatory questions, and is a consequence of the complexity of diffusion models and their sampling process. To analyze this dependence, we introduce Ablation Based Counterfactuals (ABC), a method of performing counterfactual analysis that relies on model ablation rather than model retraining. In our approach, we train independent components of a model on different but overlapping splits of a training set. These components are then combined into a single model, from which the causal influence of any training sample can be removed by ablating a combination of model components. We demonstrate how we can construct a model like this using an ensemble of diffusion models. We then use this model to study the limits of training data attribution by enumerating full counterfactual landscapes, and show that single source attributability diminishes with increasing training data size. Finally, we demonstrate the existence of unattributable samples.
Submitted 12 June, 2024;
originally announced June 2024.
-
Whitney Stratification of Algebraic Boundaries of Convex Semi-algebraic Sets
Authors:
Zihao Dai,
Zijia Li,
Zhi-Hong Yang,
Lihong Zhi
Abstract:
Algebraic boundaries of convex semi-algebraic sets are closely related to polynomial optimization problems. Building upon Rainer Sinn's work, we refine the stratification of iterated singular loci to a Whitney (a) stratification, which gives a list of candidates of varieties whose dual is an irreducible component of the algebraic boundary of the dual convex body. We also present an algorithm based on Teissier's criterion to compute Whitney (a) stratifications, which employs conormal spaces and prime decomposition.
Submitted 7 June, 2024;
originally announced June 2024.
-
Performance Analysis of Hybrid Cellular and Cell-free MIMO Network
Authors:
Zhuoyin Dai,
Jingran Xu,
Xiaoli Xu,
Ruoguang Li,
Yong Zeng
Abstract:
Cell-free wireless communication is envisioned as one of the most promising network architectures, which can achieve stable and uniform communication performance while improving the system energy and spectrum efficiency. The deployment of cell-free networks is envisioned to be a long-term evolutionary process, in which cell-free access points (APs) will be gradually introduced into the communication network and collaborate with the existing cellular base stations (BSs). To further explore the performance limits of hybrid cellular and cell-free networks, this paper develops a hybrid network model based on stochastic geometric toolkits, which reveals the coupling of the signal and interference from both the cellular and cell-free networks. Specifically, conjugate beamforming is applied in hybrid cellular and cell-free networks, which enables user equipment (UE) to benefit from both cellular BSs and cell-free APs. The aggregate signal received from the hybrid network is approximated via moment matching, and coverage probability is characterized by deriving the Laplace transform of the interference. The analysis of signal strength and coverage probability is verified by extensive simulations.
Submitted 3 June, 2024;
originally announced June 2024.
-
Sensor-Based Distributionally Robust Control for Safe Robot Navigation in Dynamic Environments
Authors:
Kehan Long,
Yinzhuang Yi,
Zhirui Dai,
Sylvia Herbert,
Jorge Cortés,
Nikolay Atanasov
Abstract:
We introduce a novel method for safe mobile robot navigation in dynamic, unknown environments, utilizing onboard sensing to impose safety constraints without the need for accurate map reconstruction. Traditional methods typically rely on detailed map information to synthesize safe stabilizing controls for mobile robots, which can be computationally demanding and less effective, particularly in dynamic operational conditions. By leveraging recent advances in distributionally robust optimization, we develop a distributionally robust control barrier function (DR-CBF) constraint that directly processes range sensor data to impose safety constraints. Coupling this with a control Lyapunov function (CLF) for path tracking, we demonstrate that our CLF-DR-CBF control synthesis method achieves safe, efficient, and robust navigation in uncertain dynamic environments. We demonstrate the effectiveness of our approach in simulated and real autonomous robot navigation experiments, marking a substantial advancement in real-time safety guarantees for mobile robots.
Submitted 28 May, 2024;
originally announced May 2024.
-
Prompt Optimization with Human Feedback
Authors:
Xiaoqiang Lin,
Zhongxiang Dai,
Arun Verma,
See-Kiong Ng,
Patrick Jaillet,
Bryan Kian Hsiang Low
Abstract:
Large language models (LLMs) have demonstrated remarkable performance in various tasks. However, the performance of LLMs heavily depends on the input prompt, which has given rise to a number of recent works on prompt optimization. Previous works, though, often require the availability of a numeric score to assess the quality of every prompt. Unfortunately, when a human user interacts with a black-box LLM, attaining such a score is often infeasible and unreliable. Instead, it is usually significantly easier and more reliable to obtain preference feedback from a human user, i.e., showing the user the responses generated from a pair of prompts and asking the user which one is preferred. Therefore, in this paper, we study the problem of prompt optimization with human feedback (POHF), in which we aim to optimize the prompt for a black-box LLM using only human preference feedback. Drawing inspiration from dueling bandits, we design a theoretically principled strategy to select a pair of prompts to query for preference feedback in every iteration, and hence introduce our algorithm named automated POHF (APOHF). We apply our APOHF algorithm to various tasks, including optimizing user instructions, prompt optimization for text-to-image generative models, and response optimization with human feedback (i.e., further refining the response using a variant of our APOHF). The results demonstrate that our APOHF can efficiently find a good prompt using a small number of preference feedback instances. Our code can be found at https://github.com/xqlin98/APOHF.
Submitted 27 May, 2024;
originally announced May 2024.
-
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars
Authors:
Zhaoxuan Wu,
Xiaoqiang Lin,
Zhongxiang Dai,
Wenyang Hu,
Yao Shu,
See-Kiong Ng,
Patrick Jaillet,
Bryan Kian Hsiang Low
Abstract:
Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar selection method. Recent studies have explored retrieval-based approaches to select exemplars tailored to individual test queries, which can be undesirable due to extra test-time computation and an increased risk of data exposure. Moreover, existing methods fail to adequately account for the impact of exemplar ordering on the performance. On the other hand, the impact of the instruction, another essential component in the prompt given to the LLM, is often overlooked in existing exemplar selection methods. To address these challenges, we propose a novel method named EASE, which leverages the hidden embedding from a pre-trained language model to represent ordered sets of exemplars and uses a neural bandit algorithm to optimize the sets of exemplars while accounting for exemplar ordering. Our EASE can efficiently find an ordered set of exemplars that performs well for all test queries from a given task, thereby eliminating test-time computation. Importantly, EASE can be readily extended to jointly optimize both the exemplars and the instruction. Through extensive empirical evaluations (including novel tasks), we demonstrate the superiority of EASE over existing methods, and reveal practical insights about the impact of exemplar selection on ICL, which may be of independent interest. Our code is available at https://github.com/ZhaoxuanWu/EASE-Prompt-Optimization.
Submitted 25 May, 2024;
originally announced May 2024.
-
Decaf: Data Distribution Decompose Attack against Federated Learning
Authors:
Zhiyang Dai,
Chunyi Zhou,
Anmin Fu
Abstract:
In contrast to prevalent Federated Learning (FL) privacy inference techniques such as generative adversarial network attacks, membership inference attacks, property inference attacks, and model inversion attacks, we devise an innovative privacy threat: the Data Distribution Decompose Attack on FL, termed Decaf. This attack enables an honest-but-curious FL server to meticulously profile the proportion of each class owned by the victim FL user, divulging sensitive information like local market item distribution and business competitiveness. The crux of Decaf lies in the observation that the magnitude of local model gradient changes closely mirrors the underlying data distribution, including the proportion of each class. Decaf addresses two crucial challenges: accurately identifying the missing/null class(es) of any victim user as a premise, and then quantifying the precise relationship between gradient changes and each remaining non-null class. Notably, Decaf operates stealthily: it is entirely passive and undetectable to victim users regarding the infringement of their data distribution privacy. Experimental validation on five benchmark datasets (MNIST, FASHION-MNIST, CIFAR-10, FER-2013, and SkinCancer) employing diverse model architectures, including customized convolutional networks, standardized VGG16, and ResNet18, demonstrates Decaf's efficacy. Results indicate its ability to accurately decompose local user data distribution, regardless of whether it is IID or non-IID distributed. Specifically, the dissimilarity measured using $L_{\infty}$ distance between the distribution decomposed by Decaf and ground truth is consistently below 5% when no null classes exist. Moreover, Decaf achieves 100% accuracy in determining any victim user's null classes, validated through formal proof.
Submitted 24 May, 2024;
originally announced May 2024.
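The $L_{\infty}$ dissimilarity quoted in the abstract above can be illustrated with a minimal sketch. It assumes only that both distributions are given as per-class proportion vectors of equal length; the function name and the example proportions are hypothetical and are not taken from the paper.

```python
def linf_distance(p, q):
    """Return the largest absolute per-class gap between two proportion vectors."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same classes")
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical example values (not from the paper): four classes, one null class.
ground_truth = [0.50, 0.30, 0.20, 0.00]
decomposed = [0.47, 0.32, 0.21, 0.00]

# The worst-case gap here is on the first class (about 0.03, i.e. 3%),
# which would fall under the 5% bound reported in the abstract.
print(linf_distance(ground_truth, decomposed))
```

Because the $L_{\infty}$ distance reports only the single worst class, a bound on it applies to every class proportion at once.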
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Multiwavelength Radiation from the Interaction between Magnetar Bursts and Companion Star in a Binary System
Authors:
Yu-Jia Wei,
Yuan-Pei Yang,
Da-Ming Wei,
Zi-Gao Dai
Abstract:
Magnetars are young, highly magnetized neutron stars that are associated with magnetar short bursts (MSBs), magnetar giant flares (MGFs), and at least a part of fast radio bursts (FRBs). In this work, we consider that a magnetar and a main sequence star are in a binary system and analyze the properties of the electromagnetic signals generated by the interaction between the magnetar bursts and the companion star. During the pre-burst period, the persistent radiation could be generated by the interaction between the $e^+e^-$-pair wind from the magnetar and the companion or its stellar wind. We find that for a newborn magnetar, the pre-burst persistent radiation from the strong magnetar wind can be dominant, and it is mainly at the optical and ultraviolet (UV) bands. For relatively older magnetars, the reemission from a burst interacting with the companion is larger than the pre-burst persistent radiation and the luminosity of the companion itself. The transient reemission produced by the heating process has a duration of $0.1 - 10^5 {\rm~s}$ at the optical, UV, and X-ray bands. Additionally, we find that if these phenomena occur in nearby galaxies within a few hundred kiloparsecs, they could be detected by current or future optical telescopes.
Submitted 17 May, 2024;
originally announced May 2024.
-
A Novel Model for the MeV Emission Line in GRB 221009A
Authors:
Yu-Jia Wei,
Jia Ren,
Hao-Ning He,
Yuan-Pei Yang,
Da-Ming Wei,
Zi-Gao Dai,
B. Theodore Zhang
Abstract:
Gamma-ray bursts (GRBs) have long been considered potential sources of ultra-high-energy cosmic rays (UHECRs; with energy $\gtrsim 10^{18} {\rm~eV}$). In this work, we propose a novel model generating MeV emission lines in GRB, which can constrain the properties of heavy nuclei that potentially exist in GRB jets. Specifically, we find that relativistic hydrogen-like high-atomic-number ions originating from the $β$ decay of unstable nuclei and/or the recombination entrained in the GRB jet can generate narrow MeV emission lines through the de-excitation of excited-electrons. This model can successfully explain the MeV emission line observed in the most luminous GRB ever recorded, GRB~221009A, with suitable parameters including a Lorentz factor $γ\sim 820-1700$ and a total mass of heavy nuclei $M_{\rm tot} \sim 10^{23} - 10^{26}$~g. Especially, the emission line broadening can be reasonably attributed to both the expansion of the jet shell and the thermal motion of nuclei, naturally resulting in a narrow width ($σ_{\rm line} / E_{\rm line} \lesssim 0.2$) consistent with the observation. Furthermore, we predict that different GRBs can exhibit lines in different bands with various evolving behaviors, which might be confirmed with further observations. Finally, our model provides indirect evidence that GRBs may be one of the sources of UHECRs.
Submitted 8 June, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
On the superconducting gap structure of the miassite Rh17S15: Nodal or nodeless?
Authors:
J. Y. Nie,
C. C. Zhao,
C. Q. Xu,
B. Li,
C. P. Tu,
X. Zhang,
D. Z. Dai,
H. R. Wang,
S. Xu,
Wenhe Jiao,
B. M. Wang,
Zhu'an Xu,
Xiaofeng Xu,
S. Y. Li
Abstract:
Recent penetration depth measurement claimed the observation of unconventional superconductivity in the miassite Rh$_{17}$S$_{15}$ single crystals, evidenced by the linear-in-temperature penetration depth at low temperatures, thereby arguing for the presence of the lines of node in its superconducting gap structure. Here we measure the thermal conductivity of Rh$_{17}$S$_{15}$ single crystals down to 110 mK and up to a field of 8 T ($\simeq 0.4H{\rm_{c2}}$). In marked contrast to the penetration depth measurement, we observe a negligible residual linear term $κ_0/T$ in zero field, in line with the nodeless gap structure. The field dependence of $κ_0(H)/T$ shows a profile that is more consistent with either a highly anisotropic gap structure or multiple nodeless gaps with significantly different magnitudes. Moreover, first-principles calculations give two electronic bands with complex shape of Fermi surfaces. These results suggest multigap nodeless superconductivity in this multiband Rh$_{17}$S$_{15}$ superconductor.
Submitted 14 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with a best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
Submitted 13 May, 2024;
originally announced May 2024.
-
Wall-Climbing Performance of Gecko-inspired Robot with Soft Feet and Digits enhanced by Gravity Compensation
Authors:
Bingcheng Wang,
Zhiyuan Weng,
Haoyu Wang,
Shuangjie Wang,
Zhouyi Wang,
Zhendong Dai,
Ardian Jusufi
Abstract:
Gravitational forces can induce deviations in body posture from desired configurations in multi-legged arboreal robot locomotion with low leg stiffness, affecting the contact angle between the swing leg's end-effector and the climbing surface during the gait cycle. The relationship between desired and actual foot positions is investigated here in a leg-stiffness-enhanced model under external forces, focusing on the challenge of unreliable end-effector attachment on climbing surfaces in such robots. Inspired by the difference in ceiling attachment postures of dead and living geckos, feedforward compensation of the stance phase legs is the key to solving this problem. A feedforward gravity compensation (FGC) strategy, complemented by leg coordination, is proposed to correct gravity-influenced body posture and improve adhesion stability by reducing body inclination. The efficacy of this strategy is validated using a quadrupedal climbing robot, EF-I, as the experimental platform. Experimental validation on an inverted surface (ceiling walking) highlights the benefits of the FGC strategy, demonstrating its role in enhancing stability and ensuring reliable end-effector attachment without external assistance. In the experiment, robots without FGC succeeded in only 3 out of 10 trials, while robots with FGC achieved a 100\% success rate in the same trials. The speed was substantially greater with FGC, reaching 9.2 mm/s in the trot gait. This underscores the potential of the proposed FGC strategy in overcoming the challenges associated with inconsistent end-effector attachment in robots with low leg stiffness, thereby facilitating stable locomotion even at inverted body attitude.
Submitted 4 May, 2024;
originally announced May 2024.
-
Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a
Authors:
Y. Liu,
H. Sun,
D. Xu,
D. S. Svinkin,
J. Delaunay,
N. R. Tanvir,
H. Gao,
C. Zhang,
Y. Chen,
X. -F. Wu,
B. Zhang,
W. Yuan,
J. An,
G. Bruni,
D. D. Frederiks,
G. Ghirlanda,
J. -W. Hu,
A. Li,
C. -K. Li,
J. -D. Li,
D. B. Malesani,
L. Piro,
G. Raman,
R. Ricci,
E. Troja
, et al. (170 additional authors not shown)
Abstract:
Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.
Submitted 25 April, 2024;
originally announced April 2024.
-
Scattering Cross Sections of Magnetized Particles within Intense Electromagnetic Waves: Application to Fast Radio Bursts
Authors:
Yu-Chen Huang,
Shu-Qing Zhong,
Zi-Gao Dai
Abstract:
Recently, Beloborodov suggested that there exists a resonance phenomenon between an extremely intense electromagnetic wave and internal magnetized particles. The particles exchange energy with the wave at frequent resonance events and then reach the radiation reaction limit immediately. This process greatly enhances the scattering cross section of the particles. Note that these results only involve an extraordinary (X) mode wave. In this paper, we focus on an intense ordinary (O) mode wave propagating through magnetized particles and compare it with the case of the X-mode wave. Our result shows that the scattering cross section of the particles in the O-mode wave is significantly smaller than that in the X-mode wave. This has important implications for the transparency of a fast radio burst (FRB) inside the magnetosphere of a magnetar. We argue that there is a strong scattering region in the stellar magnetosphere, within which an O-mode wave is more transparent than an X-mode wave for an FRB.
Submitted 24 April, 2024;
originally announced April 2024.
-
Ultrafast Vibrational Control of Hybrid Perovskite Devices Reveals the Influence of the Organic Cation on Electronic Dynamics
Authors:
Nathaniel P. Gallop,
Dmitry R. Maslennikov,
Katelyn P. Goetz,
Zhenbang Dai,
Aaron M. Schankler,
Woongmo Sung,
Satoshi Nihonyanagi,
Tahei Tahara,
Maryna Bodnarchuk,
Maksym Kovalenko,
Yana Vaynzof,
Andrew M. Rappe,
Artem A. Bakulin
Abstract:
Vibrational control (VC) of photochemistry through the optical stimulation of structural dynamics is a nascent concept only recently demonstrated for model molecules in solution. Extending VC to state-of-the-art materials may lead to new applications and improved performance for optoelectronic devices. Metal halide perovskites are promising targets for VC due to their mechanical softness and the rich array of vibrational motions of both their inorganic and organic sublattices. Here, we demonstrate the ultrafast VC of FAPbBr3 perovskite solar cells via intramolecular vibrations of the formamidinium cation using spectroscopic techniques based on vibrationally promoted electronic resonance. The observed short (~300 fs) time window of VC highlights the fast dynamics of coupling between the cation and inorganic sublattice. First-principles modelling reveals that this coupling is mediated by hydrogen bonds that modulate both lead halide lattice and electronic states. Cation dynamics modulating this coupling may suppress non-radiative recombination in perovskites, leading to photovoltaics with reduced voltage losses.
Submitted 17 April, 2024;
originally announced April 2024.
-
Privacy-Preserving Intrusion Detection using Convolutional Neural Networks
Authors:
Martin Kodys,
Zhongmin Dai,
Vrizlynn L. L. Thing
Abstract:
Privacy-preserving analytics is designed to protect valuable assets. A common service provision involves the input data from the client and the model on the analyst's side. The importance of the privacy preservation is fuelled by legal obligations and intellectual property concerns. We explore the use case of a model owner providing an analytic service on customer's private data. No information about the data shall be revealed to the analyst and no information about the model shall be leaked to the customer. Current methods involve costs: accuracy deterioration and computational complexity. The complexity, in turn, results in a longer processing time, increased requirement on computing resources, and involves data communication between the client and the server. In order to deploy such service architecture, we need to evaluate the optimal setting that fits the constraints. And that is what this paper addresses. In this work, we enhance an attack detection system based on Convolutional Neural Networks with privacy-preserving technology based on PriMIA framework that is initially designed for medical data.
Submitted 15 April, 2024;
originally announced April 2024.
-
On the dynamical evolution of the asteroid belt in a massive star-neutron star binary
Authors:
Chen Deng,
Yong-Feng Huang,
Chen Du,
Pei Wang,
Zi-Gao Dai
Abstract:
Some fast radio bursts (FRBs) exhibit repetitive behaviors and their origins remain enigmatic. It has been argued that repeating FRBs could be produced by the interaction between a neutron star and an asteroid belt. Here we consider the systems in which an asteroid belt dwells around a massive star, while a neutron star, as a companion of the massive star, interacts with the belt through gravitational force. Various orbital configurations are assumed for the system. Direct N-body simulations are performed to investigate the dynamical evolution of the asteroid belt. It is found that a larger orbital eccentricity of the neutron star will destroy the belt more quickly, with a large number of asteroids being scattered out of the system. A non-zero mutual inclination angle suppresses the ejection rate of asteroids, but it also leads to a reduction in the collision rate of asteroids with the neutron star since many asteroids are essentially scattered into the 3D space. Among the various configurations, a clear periodicity is observed in the collision events for the case with an orbital eccentricity of 0.7 and mutual inclination of $0^{\circ}$. It is found that such a periodicity can be sustained for at least 8 neutron star orbital periods, supporting this mechanism as a possible explanation for periodically repeating FRBs. Our studies also suggest that the active stage of these kinds of FRB sources should be limited, since the asteroid belt would finally be destroyed by the neutron star after multiple passages.
Submitted 14 April, 2024;
originally announced April 2024.
-
LHAASO-KM2A detector simulation using Geant4
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (254 additional authors not shown)
Abstract:
KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.
Submitted 7 April, 2024;
originally announced April 2024.
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Authors:
Jinhyuk Lee,
Zhuyun Dai,
Xiaoqi Ren,
Blair Chen,
Daniel Cer,
Jeremy R. Cole,
Kai Hui,
Michael Boratko,
Rajvi Kapadia,
Wen Ding,
Yi Luan,
Sai Meher Karthik Duddu,
Gustavo Hernandez Abrego,
Weiqiang Shi,
Nithi Gupta,
Aditya Kusupati,
Prateek Jain,
Siddhartha Reddy Jonnalagadda,
Ming-Wei Chang,
Iftekhar Naim
Abstract:
We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each query, and relabeling the positive and hard negative passages using the same LLM. The effectiveness of our approach is demonstrated by the compactness of Gecko. On the Massive Text Embedding Benchmark (MTEB), Gecko with 256 embedding dimensions outperforms all existing entries with 768 embedding size. Gecko with 768 embedding dimensions achieves an average score of 66.31, competing with 7x larger models and 5x higher dimensional embeddings.
Submitted 29 March, 2024;
originally announced March 2024.
-
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
Authors:
Shenhao Zhu,
Junming Leo Chen,
Zuozhuo Dai,
Qingkun Su,
Yinghui Xu,
Xun Cao,
Yao Yao,
Hao Zhu,
Siyu Zhu
Abstract:
In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques. The methodology utilizes the SMPL (Skinned Multi-Person Linear) model as the 3D human parametric model to establish a unified representation of body shape and pose. This facilitates the accurate capture of intricate human geometry and motion characteristics from source videos. Specifically, we incorporate rendered depth images, normal maps, and semantic maps obtained from SMPL sequences, alongside skeleton-based motion guidance, to enrich the conditions to the latent diffusion model with comprehensive 3D shape and detailed pose attributes. A multi-layer motion fusion module, integrating self-attention mechanisms, is employed to fuse the shape and motion latent representations in the spatial domain. By representing the 3D human parametric model as the motion guidance, we can perform parametric shape alignment of the human body between the reference image and the source video motion. Experimental evaluations conducted on benchmark datasets demonstrate the methodology's superior ability to generate high-quality human animations that accurately capture both pose and shape variations. Furthermore, our approach also exhibits superior generalization capabilities on the proposed in-the-wild dataset. Project page: https://fudan-generative-vision.github.io/champ.
Submitted 1 June, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
EffiVED: Efficient Video Editing via Text-instruction Diffusion Models
Authors:
Zhenghao Zhang,
Zuozhuo Dai,
Long Qin,
Weizhi Wang
Abstract:
Large-scale text-to-video models have shown remarkable abilities, but their direct application in video editing remains challenging due to limited available datasets. Current video editing methods commonly require per-video fine-tuning of diffusion models or specific inversion optimization to ensure high-fidelity edits. In this paper, we introduce EffiVED, an efficient diffusion-based model that directly supports instruction-guided video editing. To achieve this, we present two efficient workflows to gather video editing pairs, utilizing augmentation and fundamental vision-language techniques. These workflows transform vast image editing datasets and open-world videos into a high-quality dataset for training EffiVED. Experimental results reveal that EffiVED not only generates high-quality editing videos but also executes rapidly. Finally, we demonstrate that our data collection method significantly improves editing performance and can potentially tackle the scarcity of video editing data. Code can be found at https://github.com/alibaba/EffiVED.
Submitted 5 June, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
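The decline and power-law index quoted in the abstract above can be cross-checked with simple arithmetic. The sketch below is only a rough consistency check under stated assumptions: it uses the rounded endpoint values from the abstract (1.7 at 0.3 PeV, 1.3 at 3 PeV), and the helper names are hypothetical. It is not the collaboration's fitting procedure.

```python
import math


def fractional_decline(start, end):
    """Relative drop from start to end, as a fraction of start."""
    return (start - end) / start


def power_law_index(x0, y0, x1, y1):
    """Index k of y = A * x**k passing through the points (x0, y0) and (x1, y1)."""
    return math.log(y1 / y0) / math.log(x1 / x0)


# Rounded endpoints quoted in the abstract: <ln A> = 1.7 at 0.3 PeV, 1.3 at 3 PeV.
decline = fractional_decline(1.7, 1.3)
index = power_law_index(0.3, 1.7, 3.0, 1.3)

print(f"decline = {decline:.3f}, two-point index = {index:.3f}")
```

With these rounded inputs the decline comes out close to the quoted 24%, and the two-point index lands near the quoted -0.12. Exact agreement is not expected, since the published index comes from a fit over the full energy range.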
-
Prototyping and Experimental Results for Environment-Aware Millimeter Wave Beam Alignment via Channel Knowledge Map
Authors:
Zhuoyin Dai,
Di Wu,
Zhenjun Dong,
Kun Li,
Dingyang Ding,
Sihan Wang,
Yong Zeng
Abstract:
Channel knowledge map (CKM), which aims to directly reflect the intrinsic channel properties of the local wireless environment, is a novel technique for achieving environment-aware communication. In this paper, to alleviate the large training overhead in millimeter wave (mmWave) beam alignment, an environment-aware and training-free beam alignment prototype is established based on a typical CKM, termed beam index map (BIM). To this end, a general CKM construction method is first presented, and an indoor BIM is constructed offline to learn the candidate transmit and receive beam index pairs for each grid in the experimental area. Furthermore, based on the location information of the receiver (or the dynamic obstacles) from the ultra-wide band (UWB) positioning system, the established BIM is used to achieve training-free beam alignment by directly providing the beam indexes for the transmitter and receiver. Three typical scenarios are considered in the experiment, including quasi-static environment with line-of-sight (LoS) link, quasi-static environment without LoS link and dynamic environment. Besides, the receiver orientation measured from the gyroscope is also used to help CKM predict more accurate beam indexes. The experiment results show that compared with the benchmark location-based beam alignment strategy, the CKM-based beam alignment strategy can achieve much higher received power, which is close to that achieved by exhaustive beam search, but with significantly reduced training overhead.
Submitted 12 March, 2024;
originally announced March 2024.
-
Robustifying and Boosting Training-Free Neural Architecture Search
Authors:
Zhenfeng He,
Yao Shu,
Zhongxiang Dai,
Bryan Kian Hsiang Low
Abstract:
Neural architecture search (NAS) has become a key component of AutoML and a standard tool to automate the design of deep neural networks. Recently, training-free NAS as an emerging paradigm has successfully reduced the search costs of standard training-based NAS by estimating the true architecture performance with only training-free metrics. Nevertheless, the estimation ability of these metrics typically varies across different tasks, making it challenging to achieve robust and consistently good search performance on diverse tasks with only a single training-free metric. Meanwhile, the estimation gap between training-free metrics and the true architecture performances limits the ability of training-free NAS to achieve superior performance. To address these challenges, we propose the robustifying and boosting training-free NAS (RoBoT) algorithm which (a) employs the optimized combination of existing training-free metrics explored from Bayesian optimization to develop a robust and consistently better-performing metric on diverse tasks, and (b) applies greedy search, i.e., the exploitation, on the newly developed metric to bridge the aforementioned gap and consequently to boost the search performance of standard training-free NAS further. Remarkably, the expected performance of our RoBoT can be theoretically guaranteed, which improves over the existing training-free NAS under mild conditions with additional interesting insights. Our extensive experiments on various NAS benchmark tasks yield substantial empirical evidence to support our theoretical results.
Submitted 12 March, 2024;
originally announced March 2024.
-
In-context Exploration-Exploitation for Reinforcement Learning
Authors:
Zhenwen Dai,
Federico Tomasi,
Sina Ghiassian
Abstract:
In-context learning is a promising approach for online policy learning of offline reinforcement learning (RL) methods, which can be achieved at inference time without gradient optimization. However, this method is hindered by significant computational costs resulting from the gathering of large training trajectory sets and the need to train large Transformer models. We address this challenge by introducing an In-context Exploration-Exploitation (ICEE) algorithm, designed to optimize the efficiency of in-context policy learning. Unlike existing models, ICEE performs an exploration-exploitation trade-off at inference time within a Transformer model, without the need for explicit Bayesian inference. Consequently, ICEE can solve Bayesian optimization problems as efficiently as Gaussian process based methods do, but in significantly less time. Through experiments in grid world environments, we demonstrate that ICEE can learn to solve new RL tasks using only tens of episodes, marking a substantial improvement over the hundreds of episodes needed by the previous in-context learning method.
Submitted 11 March, 2024;
originally announced March 2024.
-
Yi: Open Foundation Models by 01.AI
Authors:
01.AI,
Alex Young,
Bei Chen,
Chao Li,
Chengen Huang,
Ge Zhang,
Guanwei Zhang,
Heng Li,
Jiangcheng Zhu,
Jianqun Chen,
Jing Chang,
Kaidong Yu,
Peng Liu,
Qiang Liu,
Shawn Yue,
Senbin Yang,
Shiming Yang,
Tao Yu,
Wen Xie,
Wenhao Huang,
Xiaohui Hu,
Xiaoyi Ren,
Xinyao Niu,
Pengcheng Nie
, et al. (7 additional authors not shown)
Abstract:
We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU, and our finetuned chat models deliver strong human preference rate on major evaluation platforms like AlpacaEval and Chatbot Arena. Building upon our scalable super-computing infrastructure and the classical transformer architecture, we attribute the performance of Yi models primarily to its data quality resulting from our data-engineering efforts. For pretraining, we construct 3.1 trillion tokens of English and Chinese corpora using a cascaded data deduplication and quality filtering pipeline. For finetuning, we polish a small scale (less than 10K) instruction dataset over multiple iterations such that every single instance has been verified directly by our machine learning engineers. For vision-language, we combine the chat language model with a vision transformer encoder and train the model to align visual representations to the semantic space of the language model. We further extend the context length to 200K through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. We show that extending the depth of the pretrained checkpoint through continual pretraining further improves performance. We believe that given our current results, continuing to scale up model parameters using thoroughly optimized data will lead to even stronger frontier models.
Submitted 7 March, 2024;
originally announced March 2024.
-
Localized Zeroth-Order Prompt Optimization
Authors:
Wenyang Hu,
Yao Shu,
Zongmin Yu,
Zhaoxuan Wu,
Xiangqiang Lin,
Zhongxiang Dai,
See-Kiong Ng,
Bryan Kian Hsiang Low
Abstract:
The efficacy of large language models (LLMs) in understanding and generating natural language has aroused a wide interest in developing prompt-based methods to harness the power of black-box LLMs. Existing methodologies usually prioritize a global optimization for finding the global optimum, which however will perform poorly in certain tasks. This thus motivates us to re-think the necessity of finding a global optimum in prompt optimization. To answer this, we conduct a thorough empirical study on prompt optimization and draw two major insights. Contrasting with the rarity of global optimum, local optima are usually prevalent and well-performed, which can be more worthwhile for efficient prompt optimization (Insight I). The choice of the input domain, covering both the generation and the representation of prompts, affects the identification of well-performing local optima (Insight II). Inspired by these insights, we propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO), which incorporates a Neural Tangent Kernel-based derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization. Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency, which we demonstrate through extensive experiments.
Submitted 5 March, 2024;
originally announced March 2024.
-
Fast Radio Bursts in the Disks of Active Galactic Nuclei
Authors:
Z. Y. Zhao,
K. Chen,
F. Y. Wang,
Z. G. Dai
Abstract:
Fast radio bursts (FRBs) are luminous millisecond-duration radio pulses with extragalactic origin, which were discovered more than a decade ago. Despite the numerous samples, the physical origin of FRBs remains poorly understood. FRBs have been thought to originate from young magnetars or accreting compact objects (COs). Massive stars or COs are predicted to be embedded in the accretion disks of active galactic nuclei (AGNs). The dense disk absorbs FRBs severely, making them difficult to observe. However, progenitors' ejecta or outflow feedback from the accreting COs interact with the disk material to form a cavity. The existence of the cavity can reduce the absorption by the dense disk materials, allowing FRBs to escape. Here we investigate the production and propagation of FRBs in AGN disks and find that the AGN environments lead to the following unique observational properties, which can be verified in future observations. First, the dense material in the disk can cause large dispersion measure (DM) and rotation measure (RM). Second, the toroidal magnetic field in the AGN disk can cause Faraday conversion. Third, during the shock breakout, DM and RM show non-power-law evolution patterns over time. Fourth, for accretion-powered models, higher accretion rates lead to more bright bursts in AGN disks, accounting for up to 1% of total bright repeating FRBs.
Submitted 4 April, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
A Universal Roadmap For Searching Repulsive Casimir Forces Between Magneto-Electric Materials
Authors:
Zixuan Dai,
Qing-Dong Jiang
Abstract:
The Casimir effect, arising from vacuum quantum fluctuations, plays a fundamental role in the development of modern quantum electrodynamics. In parallel, the field of condensed matter has flourished through the discovery of various materials exhibiting broken symmetries, often connected to topology and characterized by magneto-electric coupling. Here, we calculate the Casimir forces between materials with time-reversal symmetry and/or parity symmetry breaking. Remarkably, we obtain a universal phase diagram governing the sign of symmetry-breaking-induced Casimir forces, contributing to a comprehensive understanding of the sign of the Casimir force for linear optical materials. The discovered phase diagram serves as a roadmap for searching for repulsive Casimir forces, a subject bearing both theoretical interest and practical significance.
Submitted 1 March, 2024;
originally announced March 2024.
-
Acoustic tactile sensing for mobile robot wheels
Authors:
Wilfred Mason,
David Brenken,
Falcon Z. Dai,
Ricardo Gonzalo Cruz Castillo,
Olivier St-Martin Cormier,
Audrey Sedal
Abstract:
Tactile sensing in mobile robots remains under-explored, mainly due to challenges related to sensor integration and the complexities of distributed sensing. In this work, we present a tactile sensing architecture for mobile robots based on wheel-mounted acoustic waveguides. Our sensor architecture enables tactile sensing along the entire circumference of a wheel with a single active component: an off-the-shelf acoustic rangefinder. We present findings showing that our sensor, mounted on the wheel of a mobile robot, is capable of discriminating between different terrains, detecting and classifying obstacles with different geometries, and performing collision detection via contact localization. We also present a comparison between our sensor and sensors traditionally used in mobile robots, and point to the potential for sensor fusion approaches that leverage the unique capabilities of our tactile sensing architecture. Our findings demonstrate that autonomous mobile robots can further leverage our sensor architecture for diverse mapping tasks requiring knowledge of terrain material, surface topology, and underlying structure.
Submitted 28 February, 2024;
originally announced February 2024.
-
Residual entropy from temperature incremental Monte Carlo method
Authors:
Zenan Dai,
Xiao Yan Xu
Abstract:
Residual entropy, indicative of the degrees of freedom in a system at absolute zero, is a cornerstone for understanding quantum and classical ground states. Despite its critical role in elucidating low-temperature phenomena and ground state degeneracy, accurately quantifying residual entropy remains a formidable challenge due to significant computational hurdles. In this Letter, we introduce the Temperature Incremental Monte Carlo (TIMC) method, our novel solution crafted to surmount these challenges. The TIMC method incrementally calculates the partition function ratio of neighboring temperatures within Monte Carlo simulations, enabling precise entropy calculations and providing insights into a spectrum of other temperature-dependent observables in a single computational sweep of temperatures. We have rigorously applied TIMC to a variety of complex systems, such as the frustrated antiferromagnetic Ising model on both C60 and 2D triangular lattices, the Newman-Moore spin glass model, and a 2D quantum transverse field Ising model. Notably, our method surmounts the traditional obstacles encountered in partition function measurements when mapping $d$-dimensional quantum models to $d+1$-dimensional classical counterparts. The TIMC method's finesse in detailing entropy across the entire temperature range enriches our comprehension of critical phenomena in condensed matter physics. This includes insights into spin glasses, phases exhibiting spontaneous symmetry breaking, topological states of matter and fracton phases. Our approach not only advances the methodology for probing the entropic landscape of such systems but also paves the way for exploring their broader thermodynamic and quantum mechanical properties.
Submitted 27 February, 2024;
originally announced February 2024.
-
Reverse Shock Emission in an Off-axis Top-hat Jet Model for Gamma-Ray Bursts
Authors:
Sen-Lin Pang,
Zi-Gao Dai
Abstract:
The afterglow of a gamma-ray burst (GRB) has been widely argued to arise from the interaction of a relativistic outflow with its ambient medium. During such an interaction, a pair of shocks are generated: a forward shock that propagates into the medium, and a reverse shock that propagates into the outflow. Extensive studies have been conducted on the emission from the forward shock viewed off-axis. Furthermore, the observation of a reverse shock in an on-axis short GRB suggests that the reverse shock can produce an electromagnetic counterpart to a gravitational wave-detected merger. In this paper, we investigate the contribution of the reverse shock to the afterglow from a top-hat jet viewed off-axis, and apply our model to some short GRBs previously modeled by an off-axis emission. We employ the Markov Chain Monte Carlo (MCMC) method to get the model parameters (i.e., the jet's half-opening angle $\theta_j$, the viewing angle $\theta_\text{obs}$, the initial Lorentz factor $\Gamma_0$, and the isotropic energy $E_\mathrm{iso}$). Our model successfully reproduces off-axis afterglow emission without a structured jet. In addition, our calculations suggest that the reverse shock may produce a prominent feature in an early afterglow, which can be potentially observed in an orphan optical afterglow.
Submitted 27 February, 2024;
originally announced February 2024.
-
Magnetar as the Central Engine of AT2018cow: Optical, Soft X-Ray, and Hard X-Ray Emission
Authors:
Long Li,
Shu-Qing Zhong,
Di Xiao,
Zi-Gao Dai,
Shi-Feng Huang,
Zhen-Feng Sheng
Abstract:
AT2018cow is the most extensively observed and widely studied fast blue optical transient to date; its unique observational properties challenge all existing standard models. In this paper, we model the luminosity evolution of the optical, soft X-ray, and hard X-ray emission, as well as the X-ray spectrum of AT2018cow with a magnetar-centered engine model. We consider a two-zone model with a striped magnetar wind in the interior and an expanding ejecta outside. The soft and hard X-ray emission of AT2018cow can be explained by the leakage of high-energy photons produced by internal gradual magnetic dissipation in the striped magnetar wind, while the luminous thermal UV/optical emission results from the thermalization of the ejecta by the captured photons. The two-component energy spectrum yielded by our model with a quasi-thermal component from the optically thick region of the wind superimposed on an optically thin synchrotron component well reproduces the X-ray spectral shape of AT2018cow. The Markov Chain Monte Carlo fitting results suggest that in order to explain the very short rise time to peak of the thermal optical emission, a low ejecta mass $M_{\rm ej}\approx0.1~M_\odot$ and high ejecta velocity $v_{\rm SN}\approx0.17c$ are required. A millisecond magnetar with $P_0\approx3.7~\rm ms$ and $B_p\approx2.4\times10^{14}~\rm G$ is needed to serve as the central engine of AT2018cow.
Submitted 22 February, 2024;
originally announced February 2024.
-
The Very Early Soft X-ray Plateau of GRB 230307A: Signature of an Evolving Radiative Efficiency in Magnetar Wind Dissipation?
Authors:
Shu-Qing Zhong,
Long Li,
Di Xiao,
Hui Sun,
Bin-Bin Zhang,
Zi-Gao Dai
Abstract:
Very recently, a particularly long gamma-ray burst (GRB) 230307A was reported and proposed to originate from a compact binary merger based on its host galaxy property, kilonova, and heavy elements. More intriguingly, a very early plateau followed by a rapid decline in the soft X-ray band was detected in its light curve by the Lobster Eye Imager for Astronomy, providing strong evidence for the existence of a magnetar as the merger product. This work shows that the Magnetar Wind Internal Gradual MAgnetic Dissipation (MIGMAD) model, in which the radiative efficiency evolves over time, successfully fits the observed data. Our results reinforce the notion that the X-ray plateau serves as a powerful indicator of a magnetar and imply that an evolving efficiency is likely to be a common feature in X-ray plateaus of GRB afterglows. In addition, we also discuss the explanations for the prompt emission, GRB afterglows, as well as kilonova, and predict possible kilonova afterglows in a magnetar central engine.
Submitted 16 February, 2024;
originally announced February 2024.
-
Theory of excitonic polarons: From models to first-principles calculations
Authors:
Zhenbang Dai,
Chao Lian,
Jon Lafuente-Bartolome,
Feliciano Giustino
Abstract:
Excitons are neutral excitations that are composed of electrons and holes bound together by their attractive Coulomb interaction. The electron and the hole forming the exciton also interact with the underlying atomic lattice, and this interaction can lead to a trapping potential that favors exciton localization. The quasi-particle thus formed by the exciton and the surrounding lattice distortion is called an excitonic polaron. Excitonic polarons have long been thought to exist in a variety of materials, and are often invoked to explain the Stokes shift between the optical absorption edge and the photo-luminescence peak. However, quantitative ab initio calculations of these effects are exceedingly rare. In this manuscript, we present a theory of excitonic polarons that is amenable to first-principles calculations. We first apply this theory to model Hamiltonians for Wannier excitons experiencing Fröhlich or Holstein electron-phonon couplings. We find that, in the case of Fröhlich interactions, excitonic polarons only form when there is a significant difference between electron and hole effective masses. Then, we apply this theory to calculating excitonic polarons in lithium fluoride ab initio. The key advantage of the present approach is that it does not require supercells, therefore it can be used to study a variety of materials hosting either small or large excitonic polarons. This work constitutes the first step toward a complete ab initio many-body theory of excitonic polarons in real materials.
Submitted 17 January, 2024;
originally announced January 2024.
-
Excitonic polarons and self-trapped excitons from first-principles exciton-phonon couplings
Authors:
Zhenbang Dai,
Chao Lian,
Jon Lafuente-Bartolome,
Feliciano Giustino
Abstract:
Excitons consist of electrons and holes held together by their attractive Coulomb interaction. Although excitons are neutral excitations, spatial fluctuations in their charge density couple with the ions of the crystal lattice. This coupling can lower the exciton energy and lead to the formation of a localized excitonic polaron, or even a self-trapped exciton in the presence of strong exciton-phonon interactions. Here, we develop a theoretical and computational approach to compute excitonic polarons and self-trapped excitons from first principles. Our methodology combines the many-body Bethe-Salpeter approach with density-functional perturbation theory, and does not require explicit supercell calculations. As a proof of concept, we demonstrate our method for a compound of the halide perovskite family.
Submitted 17 January, 2024;
originally announced January 2024.
-
HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning
Authors:
Hao Wang,
Bo Tang,
Chi Harold Liu,
Shangqin Mao,
Jiahong Zhou,
Zipeng Dai,
Yaqi Sun,
Qianlong Xie,
Xingxing Wang,
Dong Wang
Abstract:
Online display advertising platforms serve numerous advertisers by providing real-time bidding (RTB) at the scale of billions of ad requests every day. The bidding strategy handles ad requests across multiple channels to maximize the number of clicks under the set financial constraints, i.e., total budget and cost-per-click (CPC), etc. Different from existing works mainly focusing on single channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called "HiBid", consisting of a high-level planner equipped with auxiliary loss for non-competitive budget allocation, and a data augmentation enhanced low-level executor for adaptive bidding strategy in response to allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint. Through extensive experiments on both the large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfactory ratio, and return-on-investment (ROI). We have also deployed HiBid on the Meituan advertising platform, where it already serves tens of thousands of advertisers every day.
Submitted 29 December, 2023;
originally announced December 2023.
-
SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network
Authors:
Yuhang He,
Zhuangzhuang Dai,
Long Chen,
Niki Trigoni,
Andrew Markham
Abstract:
In this paper, we study an underexplored, yet important and challenging problem: counting the number of distinct sounds in raw audio characterized by a high degree of polyphonicity. We do so by systematically proposing a novel end-to-end trainable neural network (which we call DyDecNet, consisting of a dyadic decomposition front-end and backbone network), and quantifying the difficulty level of counting depending on sound polyphonicity. The dyadic decomposition front-end progressively decomposes the raw waveform dyadically along the frequency axis to obtain time-frequency representation in multi-stage, coarse-to-fine manner. Each intermediate waveform convolved by a parent filter is further processed by a pair of child filters that evenly split the parent filter's carried frequency response, with the higher-half child filter encoding the detail and lower-half child filter encoding the approximation. We further introduce an energy gain normalization to normalize sound loudness variance and spectrum overlap, and apply it to each intermediate parent waveform before feeding it to the two child filters. To better quantify sound counting difficulty level, we further design three polyphony-aware metrics: polyphony ratio, max polyphony and mean polyphony. We test DyDecNet on various datasets to show its superiority, and we further show dyadic decomposition network can be used as a general front-end to tackle other acoustic tasks.
Submitted 26 December, 2023;
originally announced December 2023.