-
VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections
Authors:
Dongqi Fu,
Zhigang Hua,
Yan Xie,
Jin Fang,
Si Zhang,
Kaan Sancak,
Hao Wu,
Andrey Malevich,
Jingrui He,
Bo Long
Abstract:
Graph transformer has been proven as an effective graph learning method for its adoption of attention mechanism that is capable of capturing expressive representations from complex topological and feature information of graphs. Graph transformer conventionally performs dense attention (or global attention) for every pair of nodes to learn node representation vectors, resulting in quadratic computational costs that are unaffordable for large-scale graph data. Therefore, mini-batch training for graph transformers is a promising direction, but limited samples in each mini-batch cannot support effective dense attention to encode informative representations. Facing this bottleneck, (1) we start by assigning each node a token list that is sampled by personalized PageRank (PPR) and then apply standard multi-head self-attention only on this list to compute its node representations. This PPR tokenization method decouples model training from complex graph topological information and makes heavy feature engineering offline and independent, such that mini-batch training of graph transformers is possible by loading each node's token list in batches. We further prove this PPR tokenization is viable as a graph convolution network with a fixed polynomial filter and jumping knowledge. However, only using personalized PageRank may limit information carried by a token list, which could not support different graph inductive biases for model training. To this end, (2) we rewire graphs by introducing multiple types of virtual connections through structure- and content-based super nodes that enable PPR tokenization to encode local and global contexts, long-range interaction, and heterophilous information into each node's token list, and then formalize our Virtual Connection Ranking based Graph Transformer (VCR-Graphormer).
Submitted 24 March, 2024;
originally announced March 2024.
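The PPR tokenization step described in the VCR-Graphormer abstract above can be illustrated with a minimal sketch. It assumes a small dense adjacency matrix, computes personalized PageRank by plain power iteration, and takes the top-k scoring nodes as the token list. The function names, the restart probability `alpha`, and the deterministic top-k selection are illustrative assumptions, not the authors' implementation (the abstract only says the list is "sampled by" PPR).

```python
import numpy as np

def ppr_scores(adj, source, alpha=0.15, iters=100):
    """Approximate personalized PageRank from `source` by power iteration.

    `alpha` is the restart (teleport) probability; it is an assumed value.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0              # avoid division by zero for isolated nodes
    P = adj / deg                    # row-stochastic transition matrix
    e = np.zeros(n)
    e[source] = 1.0                  # restart distribution concentrated on source
    r = e.copy()
    for _ in range(iters):
        r = alpha * e + (1.0 - alpha) * (P.T @ r)
    return r

def token_list(adj, source, k, alpha=0.15):
    """Return the k nodes with the highest PPR score w.r.t. `source`.

    Deterministic top-k is a simplification of the sampling in the abstract.
    """
    scores = ppr_scores(adj, source, alpha)
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist()

# Two disconnected edges: 0-1 and 2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]])
tokens = token_list(adj, source=0, k=2)
```

Because the token lists depend only on the graph, they can be computed once offline and then loaded per node in mini-batches, which is the decoupling the abstract points to.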
-
Coherent Phonon Control of Ultrafast Magnetization Dynamics in Fe$_\text{3}$GeTe$_\text{2}$ from Time-Dependent Ab Initio Theory
Authors:
Zhaobo Zhou,
Min Li,
Thomas Frauenheim,
Junjie He
Abstract:
Exploring ultrafast magnetization control in two-dimensional (2D) magnets through optically driven coherent phonons has been well-established. Yet, the microscopic interplay between spin dynamics and lattice degrees of freedom remains less explored. Employing real-time time-dependent density functional theory (rt-TDDFT) coupled with Ehrenfest dynamics, we systematically investigate laser-induced spin-nuclei dynamics with coherent phonon excitation in the 2D ferromagnet Fe$_\text{3}$GeTe$_\text{2}$. We found that selectively pre-exciting three typical coherent phonon modes results in up to a 53% additional spin moment loss in an out-of-plane $A_{1g}^{2}$ mode within ~50 fs. Coherent phonon control of spin dynamics is closely linked to laser pulse parameters. The underlying microscopic mechanism of this phenomenon is primarily governed by coherent phonon-induced asymmetric spin-resolved charge transfer following the disappearance of the laser pulse, thereby enabling effective control of the spin moment loss. Our findings offer a novel insight into the coupling of coherent phonons with spin systems in 2D limits on femtosecond timescales.
Submitted 22 March, 2024;
originally announced March 2024.
-
Advanced Deep Operator Networks to Predict Multiphysics Solution Fields in Materials Processing and Additive Manufacturing
Authors:
Shashank Kushwaha,
Jaewan Park,
Seid Koric,
Junyan He,
Iwona Jasiuk,
Diab Abueidda
Abstract:
Unlike classical artificial neural networks, which require retraining for each new set of parametric inputs, the Deep Operator Network (DeepONet), a lately introduced deep learning framework, approximates linear and nonlinear solution operators by taking parametric functions (infinite-dimensional objects) as inputs and mapping them to complete solution fields. In this paper, two newly devised DeepONet formulations with sequential learning and Residual U-Net (ResUNet) architectures are trained for the first time to simultaneously predict complete thermal and mechanical solution fields under variable loading, loading histories, process parameters, and even variable geometries. Two real-world applications are demonstrated: 1- coupled thermo-mechanical analysis of steel continuous casting with multiple visco-plastic constitutive laws and 2- sequentially coupled direct energy deposition for additive manufacturing. Despite highly challenging spatially variable target stress distributions, DeepONets can infer reasonably accurate full-field temperature and stress solutions several orders of magnitude faster than traditional and highly optimized finite-element analysis (FEA), even when FEA simulations are run on the latest high-performance computing platforms. The proposed DeepONet model's ability to provide field predictions almost instantly for unseen input parameters opens the door for future preliminary evaluation and design optimization of these vital industrial processes.
Submitted 21 March, 2024;
originally announced March 2024.
-
Geom-DeepONet: A Point-cloud-based Deep Operator Network for Field Predictions on 3D Parameterized Geometries
Authors:
Junyan He,
Seid Koric,
Diab Abueidda,
Ali Najafi,
Iwona Jasiuk
Abstract:
Modern digital engineering design process commonly involves expensive repeated simulations on varying three-dimensional (3D) geometries. The efficient prediction capability of neural networks (NNs) makes them a suitable surrogate to provide design insights. Nevertheless, few available NNs can handle solution prediction on varying 3D shapes. We present a novel deep operator network (DeepONet) variant called Geom-DeepONet, which encodes parameterized 3D geometries and predicts full-field solutions on an arbitrary number of nodes. To the best of the authors' knowledge, this is the first attempt in the literature and is our primary novelty. In addition to expressing shapes using mesh coordinates, the signed distance function for each node is evaluated and used to augment the inputs to the trunk network of the Geom-DeepONet, thereby capturing both explicit and implicit representations of the 3D shapes. The powerful geometric encoding capability of a sinusoidal representation network (SIREN) is also exploited by replacing the classical feedforward neural networks in the trunk with SIREN. Additional data fusion between the branch and trunk networks is introduced by an element-wise product. A numerical benchmark was conducted to compare Geom-DeepONet to PointNet and vanilla DeepONet, where results show that our architecture trains fast with a small memory footprint and yields the most accurate results among the three with less than 2 MPa stress error. Results show a much lower generalization error of our architecture on unseen dissimilar designs than vanilla DeepONet. Once trained, the model can predict vector solutions, and speed can be over $10^5$ times faster than implicit finite element simulations for large meshes.
Submitted 21 March, 2024;
originally announced March 2024.
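As a rough illustration of the ingredients named in the Geom-DeepONet abstract above (mesh coordinates augmented with a signed distance value in the trunk input, a sine-activated trunk, and an element-wise product between branch and trunk features), here is a minimal, untrained forward pass. Everything specific in it is an assumption for illustration: the sphere geometry, the single-layer networks, the frequency factor `w0`, and the names `sphere_sdf`, `init_params`, and `forward`. The abstract does not give the actual architecture, so this follows the generic DeepONet combination (element-wise product, then sum) rather than the paper's exact fusion.

```python
import numpy as np

def sphere_sdf(points, radius):
    """Signed distance to a sphere centred at the origin (negative inside)."""
    return np.linalg.norm(points, axis=1) - radius

def init_params(n_geom, width, rng):
    """Random weights for one branch layer and one trunk layer (hypothetical sizes)."""
    return {
        "Wb": rng.normal(size=(n_geom, width)) / np.sqrt(n_geom),
        "bb": np.zeros(width),
        "Wt": rng.normal(size=(4, width)) / 4.0,   # trunk input is (x, y, z, sdf)
        "bt": np.zeros(width),
    }

def forward(params, geom, points, radius, w0=30.0):
    """Predict one scalar per node for a geometry described by `geom`."""
    sdf = sphere_sdf(points, radius)[:, None]
    trunk_in = np.concatenate([points, sdf], axis=1)                 # (N, 4)
    trunk = np.sin(w0 * (trunk_in @ params["Wt"] + params["bt"]))    # sine-activated layer
    branch = np.tanh(geom @ params["Wb"] + params["bb"])             # (width,)
    return (trunk * branch).sum(axis=1)                              # product, then sum

rng = np.random.default_rng(0)
params = init_params(n_geom=1, width=16, rng=rng)
points = rng.uniform(-1.0, 1.0, size=(5, 3))
field = forward(params, np.array([0.8]), points, radius=0.8)
```

The trunk is evaluated per node, so the same parameters apply to any number of nodes, which matches the abstract's point about predicting on an arbitrary number of nodes.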
-
Exploring Fermi Surface Nesting and the Nature of Heavy Quasiparticles in the Spin-Triplet Superconductor Candidate CeRh$_2$As$_2$
Authors:
Bo Chen,
Hao Liu,
Qi-Yi Wu,
Chen Zhang,
Xue-Qing Ye,
Yin-Zou Zhao,
Jiao-Jiao Song,
Xin-Yi Tian,
Ba-Lei Tan,
Zheng-Tai Liu,
Mao Ye,
Zhen-Hua Chen,
Yao-Bo Huang,
Da-Wei Shen,
Ya-Hua Yuan,
Jun He,
Yu-Xia Duan,
Jian-Qiao Meng
Abstract:
In this study, we investigate the electronic structure of a spin-triplet superconductor candidate CeRh$_2$As$_2$ using high-resolution angle-resolved photoemission spectroscopy and density functional theory calculations. Notably, Fermi surface nesting hints at connections to magnetic excitation or quadrupole density wave phenomena, elucidating the superconducting mechanisms. Measured band structures reveal primarily localized 4f electrons, with minor itinerant contributions. Additionally, a transition from localized to itinerant behavior and significant c-f hybridization anisotropy underscore the role of f-electrons in shaping electronic properties. These findings deepen our understanding of CeRh$_2$As$_2$'s unconventional superconductivity and magnetism. Further exploration promises advances in superconductivity research.
Submitted 20 March, 2024;
originally announced March 2024.
-
Building Optimal Neural Architectures using Interpretable Knowledge
Authors:
Keith G. Mills,
Fred X. Han,
Mohammad Salameh,
Shengyao Lu,
Chunhua Zhou,
Jiao He,
Fengyu Sun,
Di Niu
Abstract:
Neural Architecture Search is a costly practice. The fact that a search space can span a vast number of design choices with each architecture evaluation taking nontrivial overhead makes it hard for an algorithm to sufficiently explore candidate networks. In this paper, we propose AutoBuild, a scheme which learns to align the latent embeddings of operations and architecture modules with the ground-truth performance of the architectures they appear in. By doing so, AutoBuild is capable of assigning interpretable importance scores to architecture modules, such as individual operation features and larger macro operation sequences such that high-performance neural networks can be constructed without any need for search. Through experiments performed on state-of-the-art image classification, segmentation, and Stable Diffusion models, we show that by mining a relatively small set of evaluated architectures, AutoBuild can learn to build high-quality architectures directly or help to reduce search space to focus on relevant areas, finding better architectures that outperform both the original labeled ones and ones found by search baselines. Code available at https://github.com/Ascend-Research/AutoBuild
Submitted 20 March, 2024;
originally announced March 2024.
-
Exotic magnetism in perovskite KOsO3
Authors:
Jie Chen,
Hongze Li,
Javier Gainza,
Angel Munoz,
Jose A. Alonso,
Jue Liu,
Yu-Sheng Chen,
Alexei A. Belik,
Kazunari Yamaura,
Jiaming He,
Xinyu Li,
John B. Goodenough,
J. -S. Zhou
Abstract:
A new perovskite KOsO3 has been stabilized under high-pressure and high-temperature conditions. It is cubic at 500 K (Pm-3m) and undergoes subsequent phase transitions to tetragonal at 320 K (P4/mmm) and rhombohedral (R-3m) at 230 K, as shown by refining synchrotron X-ray powder diffraction (SXRD) data. The larger orbital overlap integral and the extended wavefunction of 5d electrons in the perovskite KOsO3 allow exploration of physics from the regime where Mott and Hund's rule couplings dominate to the state where the multiple interactions are on equal footing. We demonstrate an exotic magnetic ordering phase found by neutron powder diffraction along with physical properties via a suite of measurements including magnetic and transport properties, differential scanning calorimetry, and specific heat, which provide comprehensive information for a system at the crossover from localized to itinerant electronic behavior.
Submitted 19 March, 2024;
originally announced March 2024.
-
$n$-cotorsion pairs and recollements of extriangulated categories
Authors:
Jian He,
Jing He
Abstract:
In this article, we prove that if $(\mathcal A ,\mathcal B,\mathcal C)$ is a recollement of extriangulated categories, then $n$-cotorsion pairs in $\mathcal A$ and $\mathcal C$ can induce $n$-cotorsion pairs in $\mathcal B$. Conversely, this holds true under natural assumptions. Besides, we give mild conditions on a pseudo cluster tilting subcategory on the middle category of a recollement of extriangulated categories, for the corresponding additive quotients to form a recollement of semi-abelian categories.
Submitted 19 March, 2024;
originally announced March 2024.
-
Automated Contrastive Learning Strategy Search for Time Series
Authors:
Baoyu Jing,
Yansen Wang,
Guoxin Sui,
Jing Hong,
Jingrui He,
Yuqing Yang,
Dongsheng Li,
Kan Ren
Abstract:
In recent years, Contrastive Learning (CL) has become a predominant representation learning paradigm for time series. Most existing methods in the literature focus on manually building specific Contrastive Learning Strategies (CLS) by human heuristics for certain datasets and tasks. However, manually developing CLS usually requires excessive prior knowledge about the datasets and tasks, e.g., professional cognition of the medical time series in healthcare, as well as huge human labor and massive experiments to determine the detailed learning configurations. In this paper, we present an Automated Machine Learning (AutoML) practice at Microsoft, which automatically learns to contrastively learn representations for various time series datasets and tasks, namely Automated Contrastive Learning (AutoCL). We first construct a principled universal search space of size over $3\times10^{12}$, covering data augmentation, embedding transformation, contrastive pair construction and contrastive losses. Further, we introduce an efficient reinforcement learning algorithm, which optimizes CLS from the performance on the validation tasks, to obtain more effective CLS within the space. Experimental results on various real-world tasks and datasets demonstrate that AutoCL could automatically find the suitable CLS for a given dataset and task. From the candidate CLS found by AutoCL on several public datasets/tasks, we compose a transferable Generally Good Strategy (GGS), which has a strong performance for other datasets. We also provide empirical analysis as a guidance for future design of CLS.
Submitted 19 March, 2024;
originally announced March 2024.
-
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Authors:
Sara Abdali,
Richard Anarfi,
CJ Barberan,
Jia He
Abstract:
Large language models (LLMs) have significantly transformed the landscape of Natural Language Processing (NLP). Their impact extends across a diverse spectrum of tasks, revolutionizing how we approach language understanding and generation. Nevertheless, alongside their remarkable utility, LLMs introduce critical security and risk considerations. These challenges warrant careful examination to ensure responsible deployment and safeguard against potential vulnerabilities. This research paper thoroughly investigates security and privacy concerns related to LLMs from five thematic perspectives: security and privacy concerns, vulnerabilities against adversarial attacks, potential harms caused by misuses of LLMs, mitigation strategies to address these challenges, and limitations of current strategies. Lastly, the paper recommends promising avenues for future research to enhance the security and risk management of LLMs.
Submitted 19 March, 2024;
originally announced March 2024.
-
Iterated Monodromy Group With Non-Martingale Fixed-Point Process
Authors:
Jianfei He,
Zheng Zhu
Abstract:
We construct families of rational functions $f: \mathbb{P}^1_k \rightarrow \mathbb{P}^1_k$ of degree $d \geq 2$ over a perfect field $k$ with non-martingale fixed-point processes. Then for any normal variety $X$ over $\mathbb{P}_{\bar{k}}^N$, we give conditions on $f: X \rightarrow X$ to guarantee that the associated fixed-point process is a martingale. This work extends the previous work of Bridy, Jones, Kelsey, and Lodge on martingale conditions and answers their question on the existence of a non-martingale fixed-point process associated with the iterated monodromy group of a rational function.
Submitted 23 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Authors:
Jiaao He,
Jidong Zhai
Abstract:
The cost of serving large language models (LLMs) is high, but the expensive and scarce GPUs are used inefficiently when generating tokens sequentially, unless the batch of sequences is enlarged. However, the batch size is limited by constantly reused intermediate results, namely the KV-Cache, which occupies too much memory to fit more sequences into a GPU simultaneously. While the KV-Cache could be offloaded to host memory, the CPU-GPU bandwidth is an inevitable bottleneck.
We find a way to decompose the transformer models into two parts of different characteristics, one of which includes the memory-bound KV-Cache accessing. Our key insight is that the aggregated memory capacity, bandwidth, and computing power of CPUs across multiple nodes is an efficient option to process this part. Performance improvement comes from reduced data transmission overhead and boosted GPU throughput to process the other model part. Moreover, we address efficiency challenges brought by heterogeneity at both temporal and inter-device scopes using scheduling and performance modeling techniques. Evaluation results show that our system achieves 1.88x - 5.04x the throughput of vLLM when serving modern LLMs with the same GPU.
Submitted 17 March, 2024;
originally announced March 2024.
-
Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
Authors:
Tianxin Wei,
Bowen Jin,
Ruirui Li,
Hansi Zeng,
Zhengyang Wang,
Jianhui Sun,
Qingyu Yin,
Hanqing Lu,
Suhang Wang,
Jingrui He,
Xianfeng Tang
Abstract:
Developing a universal model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding community aspiration. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. These modalities not only offer intuitive guidance but also cater to personalized user preferences. However, the predominant personalization approaches mainly focus on the ID or text-based recommendation problem, failing to comprehend the information spanning various tasks or modalities. In this paper, our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP), which effectively leverages multi-modal data while eliminating the complexities associated with task- and modality-specific customization. We argue that the advancements in foundational generative modeling have provided the flexibility and effectiveness necessary to achieve the objective. In light of this, we develop a generic and extensible personalization generative framework, that can handle a wide range of personalized needs including item recommendation, product search, preference prediction, explanation generation, and further user-guided image generation. Our methodology enhances the capabilities of foundational language models for personalized tasks by seamlessly ingesting interleaved cross-modal user history information, ensuring a more precise and customized experience for users. To train and evaluate the proposed multi-modal personalized tasks, we also introduce a novel and comprehensive benchmark covering a variety of user requirements. Our experiments on the real-world benchmark showcase the model's potential, outperforming competitive methods specialized for each task.
Submitted 27 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
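The figures quoted in the LHAASO-KM2A abstract above can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded endpoint values from the abstract (mean logarithmic mass 1.7 at 0.3 PeV and 1.3 at 3 PeV), so the power-law index it yields is a rough consistency check, not a reproduction of the collaboration's fit. The helper names are illustrative.

```python
import math

def relative_decline(start, end):
    """Fractional decrease from `start` to `end`."""
    return (start - end) / start

def power_law_index(x0, y0, x1, y1):
    """Index g of y = c * x**g passing through (x0, y0) and (x1, y1)."""
    return math.log(y1 / y0) / math.log(x1 / x0)

# Rounded endpoint values quoted in the abstract.
decline = relative_decline(1.7, 1.3)                # about 0.235, quoted as 24%
implied_index = power_law_index(0.3, 1.7, 3.0, 1.3) # about -0.117

# Reference point: ln(A) for helium with mass number A = 4.
ln_a_helium = math.log(4.0)                         # about 1.386

print(f"decline = {decline:.1%}, index = {implied_index:.4f}, ln A(He) = {ln_a_helium:.3f}")
```

The implied index of roughly -0.117 lies well within the quoted uncertainty of the reported -0.1200, and ln A for helium (about 1.39) sits between the two endpoint values, which is consistent with the abstract's wording that the mean mass is "almost" heavier than helium over the whole range.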
-
Tracking of charged particles with nanosecond lifetimes at LHCb
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
J. A. Adams,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1060 additional authors not shown)
Abstract:
A method is presented to reconstruct charged particles with lifetimes between 10 ps and 10 ns, which considers a combination of their decay products and the partial tracks created by the initial charged particle. Using the $Ξ^-$ baryon as a benchmark, the method is demonstrated with simulated events and proton-proton collision data at $\sqrt{s}=13$ TeV, corresponding to an integrated luminosity of 2.0 fb${}^{-1}$ collected with the LHCb detector in 2018. Significant improvements in the angular resolution and the signal purity are obtained. The method is implemented as part of the LHCb Run 3 event trigger in a set of requirements to select detached hyperons. This is the first demonstration of the applicability of this approach at the LHC, and the first to show its scaling with instantaneous luminosity.
Submitted 14 March, 2024;
originally announced March 2024.
-
New constraints on Triton's atmosphere from the 6 October 2022 stellar occultation
Authors:
Ye Yuan,
Chen Zhang,
Fan Li,
Jian Chen,
Yanning Fu,
Chunhai Bai,
Xing Gao,
Yong Wang,
Tuhong Zhong,
Yixing Gao,
Liang Wang,
Donghua Chen,
Yixing Zhang,
Yang Zhang,
Wenpeng Xie,
Shupi Zhang,
Ding Liu,
Jun Cao,
Xiangdong Yin,
Xiaojun Mo,
Jing Liu,
Xinru Han,
Tong Liu,
Yuqiang Chen,
Zhendong Gao
, et al. (25 additional authors not shown)
Abstract:
The atmosphere of Triton was probed directly by observing a ground-based stellar occultation on 6 October 2022. This rare event yielded 23 positive light curves collected from 13 separate observation stations contributing to our campaign. The significance of this event lies in its potential to directly validate the modest pressure fluctuation on Triton, a phenomenon not definitively verified by previous observations, including only five stellar occultations, and the Voyager 2 radio occultation in 1989. Using an approach consistent with a comparable study, we precisely determined a surface pressure of $14.07_{-0.13}^{+0.21}~\mathrm{μbar}$ in 2022. This new pressure rules out any significant monotonic variation in pressure between 2017 and 2022 through direct observations, as it is in alignment with the 2017 value. Additionally, both the pressures in 2017 and 2022 align with the 1989 value. This provides further support for the conclusion drawn from the previous volatile transport model simulation, which is consistent with the observed alignment between the pressures in 1989 and 2017; that is to say, the pressure fluctuation is modest. Moreover, this conclusion suggests the existence of a northern polar cap extended down to at least $45^\circ$N$-60^\circ$N and the presence of nitrogen between $30^\circ$S and $0^\circ$.
Submitted 24 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Search for cosmic-ray boosted sub-MeV dark matter-electron scatterings in PandaX-4T
Authors:
Xiaofeng Shang,
Abdusalam Abdukerim,
Zihao Bo,
Wei Chen,
Xun Chen,
Chen Cheng,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Lisheng Geng,
Karl Giboni,
Xuyuan Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Junting Huang,
Zhou Huang,
Ruquan Hou,
Yu Hou,
Xiangdong Ji,
Yonglin Ju,
Chenxiang Li
, et al. (67 additional authors not shown)
Abstract:
We report the first search for the elastic scatterings between cosmic-ray boosted sub-MeV dark matter and electrons in the PandaX-4T liquid xenon experiment. Sub-MeV dark matter particles can be accelerated by scattering with electrons in the cosmic rays and produce detectable electron recoil signals in the detector. Using the commissioning data from PandaX-4T of 0.63~tonne$\cdot$year exposure, we set new constraints on DM-electron scattering cross sections for DM masses ranging from 10~eV/$c^2$ to 3~keV/$c^2$.
Submitted 13 March, 2024;
originally announced March 2024.
-
Coverage and Rate Analysis for Integrated Sensing and Communication Networks
Authors:
Xu Gan,
Chongwen Huang,
Zhaohui Yang,
Xiaoming Chen,
Jiguang He,
Zhaoyang Zhang,
Chau Yuen,
Yong Liang Guan,
Mérouane Debbah
Abstract:
Integrated sensing and communication (ISAC) is increasingly recognized as a pivotal technology for next-generation cellular networks, offering mutual benefits in both sensing and communication capabilities. This advancement necessitates a re-examination of the fundamental limits within networks where these two functions coexist via shared spectrum and infrastructures. However, traditional stochastic geometry-based performance analyses are confined to either communication or sensing networks separately. This paper bridges this gap by introducing a generalized stochastic geometry framework in ISAC networks. Based on this framework, we define and calculate the coverage and ergodic rate of sensing and communication performance under resource constraints. Then, we shed light on the fundamental limits of ISAC networks by presenting theoretical results for the coverage rate of the unified performance, taking into account the coupling effects of dual functions in coexistence networks. Further, we obtain the analytical formulations for evaluating the ergodic sensing rate constrained by the maximum communication rate, and the ergodic communication rate constrained by the maximum sensing rate. Extensive numerical results validate the accuracy of all theoretical derivations, and also indicate that denser networks significantly enhance ISAC coverage. Specifically, increasing the base station density from $1$ $\text{km}^{-2}$ to $10$ $\text{km}^{-2}$ can boost the ISAC coverage rate from $1.4\%$ to $39.8\%$. Further, results also reveal that with the increase of the constrained sensing rate, the ergodic communication rate improves significantly, but the reverse is not obvious.
Submitted 22 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Two-dimensional phase diagram of the charge density wave in doped CsV$_3$Sb$_5$
Authors:
Linwei Huai,
Hongyu Li,
Yulei Han,
Yang Luo,
Shuting Peng,
Zhiyuan Wei,
Jianchang Shen,
Bingqian Wang,
Yu Miao,
Xiupeng Sun,
Zhipeng Ou,
Bo Liu,
Xiaoxiao Yu,
Ziji Xiang,
Min-Quan Kuang,
Zhenhua Qiao,
Xianhui Chen,
Junfeng He
Abstract:
Kagome superconductors AV$_3$Sb$_5$ (A = K, Rb and Cs) have attracted much recent attention due to the coexistence of multiple exotic orders. Among them, the charge density wave (CDW) order has been shown to host various unconventional behaviors. Here, we investigate the CDW order by a combination of both bulk and surface doping methods. While element substitutions in bulk doping change both carriers and the crystal lattice, the surface doping primarily tunes the carrier concentration. As such, our results reveal a two-dimensional phase diagram of the CDW in doped CsV$_3$Sb$_5$. In the lightly bulk doped regime, the existence of CDW order is reversible by tuning the carrier concentration. But excessive bulk doping permanently destroys the CDW, regardless of the carrier doping level. These results provide insights to the origin of the CDW from both electronic and structural degrees of freedom. They also open an avenue for manipulating the exotic CDW order in Kagome superconductors.
Submitted 12 March, 2024;
originally announced March 2024.
-
RA-ICM: A Novel Independent Cascade Model Incorporating User Relationships and Attitudes
Authors:
Xinyu Li,
Yutong Guo,
Jixuan He,
Jiacheng Zhao,
Chenwei Wang
Abstract:
The rapid development of social networks has a wide range of social effects, which facilitates the study of social issues. Accurately forecasting the information propagation process within social networks is crucial for promptly understanding the event direction and effectively addressing social problems in a scientific manner. The relationships between non-adjacent users and the attitudes of users significantly influence the information propagation process within social networks. However, existing research has ignored these two elements, which poses challenges for accurately predicting the information propagation process. This limitation significantly hinders the study of emotional contagion and influence maximization in social networks. To address these issues, by considering the relationships between non-adjacent users and the influence of user attitudes, we propose a new information propagation model based on the independent cascade model. Experimental results obtained from six real Weibo datasets validate the effectiveness of the proposed model, which is reflected in increased prediction accuracy and reduced time complexity. Furthermore, the information dissemination trend in social networks predicted by the proposed model closely resembles the actual information propagation process, which demonstrates the superiority of the proposed model.
Submitted 10 March, 2024;
originally announced March 2024.
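For orientation, the classic independent cascade model that this abstract builds on can be sketched as follows. This is a minimal illustration of the baseline model only, not the RA-ICM proposed by the authors: the abstract does not say how non-adjacent relationships or user attitudes enter the model, so neither is represented here. The function name `independent_cascade`, the adjacency-dict graph format, and the single uniform activation probability `prob` are assumptions made for this example.

```python
import random


def independent_cascade(graph, seeds, prob, rng=None):
    """Simulate one run of the classic independent cascade model.

    graph: dict mapping each node to an iterable of its out-neighbors
           (hypothetical format chosen for this sketch).
    seeds: iterable of initially active nodes.
    prob:  uniform activation probability per edge (an assumption;
           real models often use per-edge probabilities).
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for node in frontier:
            # Each newly activated node gets exactly one chance
            # to activate each of its still-inactive neighbors.
            for neighbor in graph.get(node, ()):
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


# Toy graph: "e" points at "a" but is never reached from seed "a".
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["a"]}
spread = independent_cascade(toy_graph, ["a"], prob=0.5, rng=random.Random(0))
print(sorted(spread))
```

Averaging the size of the returned set over many runs gives the usual Monte Carlo estimate of expected spread from a seed set.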
-
Probing Image Compression For Class-Incremental Learning
Authors:
Justin Yang,
Zhihao Duan,
Andrew Peng,
Yuning Huang,
Jiangpeng He,
Fengqing Zhu
Abstract:
Image compression emerges as a pivotal tool in the efficient handling and transmission of digital images. Its ability to substantially reduce file size not only facilitates enhanced data storage capacity but also potentially brings advantages to the development of continual machine learning (ML) systems, which learn new knowledge incrementally from sequential data. Continual ML systems often rely on storing representative samples, also known as exemplars, within a limited memory constraint to maintain the performance on previously learned data. These methods are known as memory replay-based algorithms and have proven effective at mitigating the detrimental effects of catastrophic forgetting. Nonetheless, the limited memory buffer size often falls short of adequately representing the entire data distribution. In this paper, we explore the use of image compression as a strategy to enhance the buffer's capacity, thereby increasing exemplar diversity. However, directly using compressed exemplars introduces domain shift during continual ML, marked by a discrepancy between compressed training data and uncompressed testing data. Additionally, it is essential to determine the appropriate compression algorithm and select the most effective rate for continual ML systems to balance the trade-off between exemplar quality and quantity. To this end, we introduce a new framework to incorporate image compression for continual ML including a pre-processing data compression step and an efficient compression rate/algorithm selection method. We conduct extensive experiments on CIFAR-100 and ImageNet datasets and show that our method significantly improves image classification accuracy in continual ML settings.
Submitted 10 March, 2024;
originally announced March 2024.
-
Detecting Neutrinos from Supernova Bursts in PandaX-4T
Authors:
Binyu Pang,
Abdusalam Abdukerim,
Zihao Bo,
Wei Chen,
Xun Chen,
Chen Cheng,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Changbo Fu,
Mengting Fu,
Lisheng Geng,
Karl Giboni,
Linhui Gu,
Xuyuan Guo,
Chencheng Han,
Ke Han,
Changda He,
**rong He,
Di Huang,
Yanlin Huang,
Junting Huang,
Zhou Huang,
Ruquan Hou
, et al. (71 additional authors not shown)
Abstract:
Neutrinos from core-collapse supernovae are essential for the understanding of neutrino physics and stellar evolution. The dual-phase xenon dark matter detectors can provide a way to track explosions of galactic supernovae by detecting neutrinos through coherent elastic neutrino-nucleus scatterings. In this study, a variation of progenitor masses as well as explosion models are assumed to predict the neutrino fluxes and spectra, which result in the number of expected neutrino events ranging from 6.6 to 13.7 at a distance of 10 kpc over a 10-second duration with negligible backgrounds at PandaX-4T. Two specialized triggering alarms for monitoring supernova burst neutrinos are built. The efficiency of detecting supernova explosions at various distances in the Milky Way is estimated. These alarms will be implemented in the real-time supernova monitoring system at PandaX-4T in the near future, providing the astronomical communities with supernova early warnings.
Submitted 10 March, 2024;
originally announced March 2024.
-
Solution-Hashing Search Based on Layout-Graph Transformation for Unequal Circle Packing
Authors:
Jianrong Zhou,
Jiyao He,
Kun He
Abstract:
The problem of packing unequal circles into a circular container stands as a classic and challenging optimization problem in computational geometry. This study introduces a suite of innovative and efficient methods to tackle this problem. Firstly, we present a novel layout-graph transformation method that represents configurations as graphs, together with an inexact hash method facilitating fast comparison of configurations for isomorphism or similarity. Leveraging these advancements, we propose an Iterative Solution-Hashing Search algorithm adept at circumventing redundant exploration through efficient configuration recording. Additionally, we introduce several enhancements to refine the optimization and search processes, including an adaptive adjacency maintenance method, an efficient vacancy detection technique, and a Voronoi-based locating method. Through comprehensive computational experiments across various benchmark instances, our algorithm demonstrates superior performance over existing state-of-the-art methods, showcasing remarkable applicability and versatility. Notably, our algorithm surpasses the best-known results for 56 out of 179 benchmark instances while achieving parity with the remaining instances.
Submitted 10 March, 2024;
originally announced March 2024.
-
Hashing Beam Training for Near-Field Communications
Authors:
Yuan Xu,
Li Wei,
Chongwen Huang,
Chen Zhu,
Zhaohui Yang,
Jun Yang,
Jiguang He,
Zhaoyang Zhang,
Mérouane Debbah
Abstract:
In this paper, we investigate the millimeter-wave (mmWave) near-field beam training problem to find the correct beam direction. In order to address the high complexity and low identification accuracy of existing beam training techniques, we propose an efficient hashing multi-arm beam (HMB) training scheme for the near-field scenario. Specifically, we first design a set of sparse bases based on the polar domain sparsity of the near-field channel. Then, the random hash functions are chosen to construct the near-field multi-arm beam training codebook. Each multi-arm beam codeword is scanned in a time slot until all the predefined codewords are traversed. Finally, the soft decision and voting methods are applied to distinguish the signal from different base stations and obtain correctly aligned beams. Simulation results show that our proposed near-field HMB training method can reduce the beam training overhead to the logarithmic level, and achieve 96.4% identification accuracy of exhaustive beam training. Moreover, we also verify applicability under the far-field scenario.
Submitted 9 April, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Stochastic Geometry Analysis for Distributed RISs-Assisted mmWave Communications
Authors:
Yuan Xu,
Li Wei,
Chongwen Huang,
Yongxu Zhu,
Zhaohui Yang,
Jun Yang,
Jiguang He,
Zhaoyang Zhang,
Mérouane Debbah
Abstract:
Millimeter wave (mmWave) has attracted considerable attention due to its wide bandwidth and high frequency. However, it is highly susceptible to blockages, resulting in significant degradation of the coverage and the sum rate. A promising approach is deploying distributed reconfigurable intelligent surfaces (RISs), which can establish extra communication links. In this paper, we investigate the impact of distributed RISs on the coverage probability and the sum rate in mmWave wireless communication systems. Specifically, we first introduce the system model, which includes the blockage, the RIS and the user distribution models, leveraging the Poisson point process. Then, we define the association criterion and derive the conditional coverage probabilities for the two cases of direct association and reflective association through RISs. Finally, we combine the two cases using Campbell's theorem and the total probability theorem to obtain the closed-form expressions for the ergodic coverage probability and the sum rate. Simulation results validate the effectiveness of the proposed analytical approach, demonstrating that the deployment of distributed RISs significantly improves the ergodic coverage probability by 45.4% and the sum rate by over 1.5 times.
Submitted 9 April, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Electromagnetic Hybrid Beamforming for Holographic Communications
Authors:
Ran Ji,
Chongwen Huang,
Xiaoming Chen,
Wei E. I. Sha,
Linglong Dai,
Jiguang He,
Zhaoyang Zhang,
Chau Yuen,
Mérouane Debbah
Abstract:
It is well known that there is inherent radiation pattern distortion for the commercial base station antenna array, which usually needs three antenna sectors to cover the whole space. To eliminate pattern distortion and further enhance beamforming performance, we propose an electromagnetic hybrid beamforming (EHB) scheme based on a three-dimensional (3D) superdirective holographic antenna array. Specifically, EHB consists of antenna excitation current vectors (analog beamforming) and digital precoding matrices, where the implementation of analog beamforming involves the real-time adjustment of the radiation pattern to adapt it to the dynamic wireless environment. Meanwhile, the digital beamforming is optimized based on the channel characteristics of analog beamforming to further improve the achievable rate of communication systems. An electromagnetic channel model incorporating array radiation patterns and the mutual coupling effect is also developed to evaluate the benefits of our proposed scheme. Simulation results demonstrate that our proposed EHB scheme with a 3D holographic array achieves a relatively flat superdirective beamforming gain and allows for programmable focusing directions throughout the entire spatial domain. Furthermore, they also verify that the proposed scheme achieves a sum rate gain of over 150% compared to traditional beamforming algorithms.
Submitted 9 March, 2024;
originally announced March 2024.
-
Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text
Authors:
Sara Abdali,
Richard Anarfi,
CJ Barberan,
Jia He
Abstract:
Large Language Models (LLMs) have revolutionized the field of Natural Language Generation (NLG) by demonstrating an impressive ability to generate human-like text. However, their widespread usage introduces challenges that necessitate thoughtful examination, ethical scrutiny, and responsible practices. In this study, we delve into these challenges and explore existing strategies for mitigating them, with a particular emphasis on identifying AI-generated text as the ultimate solution. Additionally, we assess the feasibility of detection from a theoretical perspective and propose novel research directions to address the current limitations in this domain.
Submitted 26 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception
Authors:
Xiang Huang,
Zhi-Qi Cheng,
Jun-Yan He,
Chenyang Li,
Wangmeng Xiang,
Baigui Sun,
Xiao Wu
Abstract:
The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception. To address this critical need, this paper introduces Dynamic Routering Network (DyRoNet), a low-rank enhanced dynamic routing framework designed for streaming perception in autonomous driving systems. DyRoNet integrates a suite of pre-trained branch networks, each meticulously fine-tuned to function under distinct environmental conditions. At its core, the framework offers a speed router module, developed to assess and route input data to the most suitable branch for processing. This approach not only addresses the inherent limitations of conventional models in adapting to diverse driving conditions but also ensures the balance between performance and efficiency. Extensive experimental evaluations demonstrate the adaptability of DyRoNet to diverse branch selection strategies, resulting in significant performance enhancements across different scenarios. This work not only establishes a new benchmark for streaming perception but also provides valuable engineering insights for future work.
Submitted 18 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Constraining Mass Transfer Models with Galactic Neutron Star$-$White Dwarf Binaries as Gravitational Wave Sources
Authors:
Jian-Guo He,
Yong Shao,
Xiao-Jie Xu,
Xiang-Dong Li
Abstract:
Neutron star$-$white dwarf (NSWD) binaries are one of the most abundant sources of gravitational waves (GW) in the Milky Way. These GW sources are the evolutionary products of primordial binaries that experienced many processes of binary interaction. We employ a binary population synthesis method to investigate the properties of Galactic NSWD binaries detectable by the Laser Interferometer Space Antenna (LISA). In this paper, only the NSWD systems with a COWD or ONeWD component are included. We consider various models related to mass transfer efficiencies during primordial binary evolution, supernova explosion mechanisms at NS formation, common envelope ejection efficiencies, and critical WD masses that determine the stability of mass transfer between WDs and NSs. Based on our calculations, we estimate that tens to hundreds of LISA NSWD binaries exist in the Milky Way. We find that the detection of LISA NSWD binaries is able to provide profound insights into mass transfer efficiencies during the evolution of primordial binaries and critical WD masses during mass transfer from a WD to an NS.
Submitted 7 March, 2024;
originally announced March 2024.
-
On-demand Quantization for Green Federated Generative Diffusion in Mobile Edge Networks
Authors:
Bingkun Lai,
Jiayi He,
Jiawen Kang,
Gaolei Li,
Minrui Xu,
Tao Zhang,
Shengli Xie
Abstract:
Generative Artificial Intelligence (GAI) shows remarkable productivity and creativity in Mobile Edge Networks, such as the metaverse and the Industrial Internet of Things. Federated learning is a promising technique for effectively training GAI models in mobile edge networks due to its data distribution. However, there is a notable issue with communication consumption when training large GAI models like generative diffusion models in mobile edge networks. Additionally, the substantial energy consumption associated with training diffusion-based models, along with the limited resources of edge devices and complexities of network environments, pose challenges for improving the training efficiency of GAI models. To address this challenge, we propose an on-demand quantized energy-efficient federated diffusion approach for mobile edge networks. Specifically, we first design a dynamic quantized federated diffusion training scheme considering various demands from the edge devices. Then, we study an energy efficiency problem based on specific quantization requirements. Numerical results show that our proposed method significantly reduces system energy consumption and transmitted model size compared to both baseline federated diffusion and fixed quantized federated diffusion methods while effectively maintaining reasonable quality and diversity of generated data.
Submitted 7 March, 2024;
originally announced March 2024.
-
Signal Response Model in PandaX-4T
Authors:
Yunyang Luo,
Zihao Bo,
Shibo Zhang,
Abdusalam Abdukerim,
Chen Cheng,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Changbo Fu,
Mengting Fu,
Lisheng Geng,
Karl Giboni,
Linhui Gu,
Xuyuan Guo,
Chencheng Han,
Ke Han,
Changda He,
**rong He,
Di Huang,
Yanlin Huang,
Zhou Huang
, et al. (66 additional authors not shown)
Abstract:
PandaX-4T experiment is a deep-underground dark matter direct search experiment that employs a dual-phase time projection chamber with a sensitive volume containing 3.7 tonne of liquid xenon. The detector of PandaX-4T is capable of simultaneously collecting the primary scintillation and ionization signals, utilizing their ratio to discriminate dark matter signals from background sources such as gamma rays and beta particles. The signal response model plays a crucial role in interpreting the data obtained by PandaX-4T. It describes the conversion from the deposited energy by dark matter interactions to the detectable signals within the detector. The signal response model is utilized in various PandaX-4T results. This work provides a comprehensive description of the procedures involved in constructing and parameter-fitting the signal response model for the energy range of approximately 1 keV to 25 keV for electronic recoils and 6 keV to 90 keV for nuclear recoils. It also covers the signal reconstruction, selection, and correction methods, which are crucial components integrated into the signal response model.
Submitted 14 June, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Astrochemical effect of the fundamental grain surface processes I. The diffusion of grain surface species and the pre-exponential factor
Authors:
Long-Fei Chen,
Donghui Quan,
Jiao He,
Yao Wang,
Di Li,
Thomas Henning
Abstract:
Abbreviation. Thermal diffusion is one of the basic processes for the mobility and formation of species on cosmic dust grains. Recent laboratory measurements have found that the diffusion pre-exponential factor can differ from that for desorption by several orders of magnitude. We aim to evaluate the effect of the newly experimentally measured diffusion pre-exponential factor on the chemistry under cold molecular cloud conditions. We found that statistically, more than half of the total gas-phase and grain surface species are not affected by the new pre-exponential factor after a chemical evolution of 10$^5$ yr. The most abundant gas-phase CO and grain surface water ice are not affected by the new pre-exponential factor. For the grain surface species that are affected, compared to the commonly adopted value of the pre-exponential factor for diffusion used in the chemical models, they could be either overproduced or underproduced with the lower diffusion pre-factor used in this work. The former case applies to radicals and the species that serve as reactants, while the latter case applies to complex organic molecules (COMs) on the grain and the species that rarely react with other species. Gas-phase species could also be affected due to the desorption of the grain surface species. The abundance of some gas-phase COMs could vary by over one order of magnitude depending on the adopted grain surface temperature and/or the ratio of diffusion to desorption energy in the model. Key species whose diffusion pre-exponential factor significantly affects the model predictions were also evaluated, and these species include CH3OH, H2CO, and NO. The results presented in this study show that the pre-exponential factor is one of the basic and important parameters in astrochemical models.
Submitted 6 March, 2024;
originally announced March 2024.
-
Semi-Supervised Dialogue Abstractive Summarization via High-Quality Pseudolabel Selection
Authors:
Jianfeng He,
Hang Su,
Jason Cai,
Igor Shalyminov,
Hwanjun Song,
Saab Mansour
Abstract:
Semi-supervised dialogue summarization (SSDS) leverages model-generated summaries to reduce reliance on human-labeled data and improve the performance of summarization models. While addressing label noise, previous works on semi-supervised learning primarily focus on natural language understanding tasks, assuming each sample has a unique label. However, these methods are not directly applicable to SSDS, as it is a generative task, and each dialogue can be summarized in different ways. In this work, we propose a novel scoring approach, SiCF, which encapsulates three primary dimensions of summarization model quality: Semantic invariance (indicative of model confidence), Coverage (factual recall), and Faithfulness (factual precision). Using the SiCF score, we select unlabeled dialogues with high-quality generated summaries to train summarization models. Comprehensive experiments on three public datasets demonstrate the effectiveness of SiCF scores in uncertainty estimation and semi-supervised learning for dialogue summarization tasks. Our code is available at \url{https://github.com/amazon-science/summarization-sicf-score}.
Submitted 6 March, 2024;
originally announced March 2024.
-
Amplitude analysis of the $Λ_b^0\to pK^-γ$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
A. Alfonso Albero,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1084 additional authors not shown)
Abstract:
The resonant structure of the radiative decay $Λ_b^0\to pK^-γ$ in the region of proton-kaon invariant-mass up to 2.5 GeV$/c^2$ is studied using proton-proton collision data recorded at centre-of-mass energies of 7, 8, and 13 TeV collected with the LHCb detector, corresponding to a total integrated luminosity of 9 fb$^{-1}$. Results are given in terms of fit and interference fractions between the different components contributing to this final state. Only $Λ$ resonances decaying to $pK^-$ are found to be relevant, where the largest contributions stem from the $Λ(1520)$, $Λ(1600)$, $Λ(1800)$, and $Λ(1890)$ states.
Submitted 21 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
First observation of the $Λ^0_b \to D^+ D^- Λ$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
J. A. Adams,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey
, et al. (1068 additional authors not shown)
Abstract:
The $Λ^0_b \to D^+ D^- Λ$ decay is observed for the first time using proton-proton collision data collected by the LHCb experiment at a center-of-mass energy of $13 \mathrm{TeV}$, corresponding to an integrated luminosity of $5.3 \mathrm{fb}^{-1}$. Using the $B^0 \to D^+ D^- K_{\mathrm{S}}^0$ decay as a reference channel, the product of the relative production cross-section and decay branching fractions is measured to be $$ {\cal R}=\frac{σ_{Λ^0_b}}{σ_{B^0}} \times \frac{{\cal B}(Λ^0_b \to D^+ D^- Λ)}{{\cal B}(B^0 \to D^+ D^- K_{\mathrm{S}}^0)}=0.179 \pm 0.022 \pm 0.014 $$ where the first uncertainty is statistical and the second is systematic. The known branching fraction of the reference channel, ${\cal B}(B^0 \to D^+ D^- K_{\mathrm{S}}^0)$, and the cross-section ratio, $σ_{Λ^0_b} / σ_{B^0}$, previously measured by $\mathrm{LHCb}$ are used to derive the branching fraction of the $Λ^0_b \to D^+ D^- Λ$ decay $$ {\cal B}(Λ^0_b \to D^+ D^- Λ)=(1.24 \pm 0.15 \pm 0.10 \pm 0.28 \pm 0.11) \times 10^{-4}, $$ where the third and fourth contributions are due to uncertainties of ${\cal B}(B^0 \to D^+ D^- K_{\mathrm{S}}^0)$ and $σ_{Λ^0_b} / σ_{B^0}$, respectively. Inspection of the $D^+ Λ$ and $D^+ D^-$ invariant-mass distributions suggests a rich presence of intermediate resonances in the decay. The $Λ^0_b \to D^{*+} D^- Λ$ decay is also observed for the first time as a partially reconstructed component in the $D^+ D^- Λ$ invariant mass spectrum.
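To get a rough sense of how the four quoted uncertainties on the branching fraction compare, the sketch below combines them in quadrature. Treating the contributions as independent is an assumption made here for illustration only; the abstract quotes them separately and does not state a combined value.

```python
import math

# Central value and uncertainties quoted in the abstract, in units of 1e-4.
central = 1.24
uncertainties = {
    "statistical": 0.15,
    "systematic": 0.10,
    "reference branching fraction": 0.28,
    "cross-section ratio": 0.11,
}

# Assumption (not from the abstract): the four contributions are independent,
# so they are added in quadrature.
total = math.sqrt(sum(u ** 2 for u in uncertainties.values()))
relative = total / central

# Identify the largest single contribution.
dominant = max(uncertainties, key=uncertainties.get)

print(f"combined: {total:.2f}e-4 ({relative:.0%} relative), dominant: {dominant}")
```

Under this assumption the external input from the reference-channel branching fraction is the largest single term, larger than the statistical uncertainty of the measurement itself.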
Submitted 6 March, 2024;
originally announced March 2024.
-
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
Authors:
Junwen He,
Yifan Wang,
Lijun Wang,
Huchuan Lu,
Jun-Yan He,
**-Peng Lan,
Bin Luo,
Xuansong Xie
Abstract:
Multimodal Large Language Models (MLLMs) leverage Large Language Models as a cognitive framework for diverse visual-language tasks. Recent efforts have been made to equip MLLMs with visual perceiving and grounding capabilities. However, there still remains a gap in providing fine-grained pixel-level perceptions and extending interactions beyond text-specific inputs. In this work, we propose AnyRef, a general MLLM that can generate pixel-wise object perceptions and natural language descriptions from multi-modality references, such as texts, boxes, images, or audio. This innovation empowers users with greater flexibility to engage with the model beyond textual and regional prompts, without modality-specific designs. Through our proposed refocusing mechanism, the generated grounding output is guided to better focus on the referenced object, implicitly incorporating additional pixel-level supervision. This simple modification utilizes attention scores generated during the inference of the LLM, eliminating the need for extra computations while exhibiting performance enhancements in both grounding masks and referring expressions. With only publicly available training data, our model achieves state-of-the-art results across multiple benchmarks, including diverse modality referring segmentation and region-level referring expression generation.
Submitted 25 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement
Authors:
**hong He,
Minglong Xue,
Aoxiang Ning,
Chengyun Song
Abstract:
Diffusion model-based low-light image enhancement methods rely heavily on paired training data, which limits their broad application. Meanwhile, existing unsupervised methods lack effective bridging capabilities for unknown degradation. To address these limitations, we propose a novel zero-reference lighting estimation diffusion model for low-light image enhancement called Zero-LED. It utilizes the stable convergence ability of diffusion models to bridge the gap between low-light domains and real normal-light domains and successfully alleviates the dependence on pairwise training data via zero-reference learning. Specifically, we first design the initial optimization network to preprocess the input image and implement bidirectional constraints between the diffusion model and the initial optimization network through multiple objective functions. Subsequently, the degradation factors of the real-world scene are optimized iteratively to achieve effective light enhancement. In addition, we explore a frequency-domain based and semantically guided appearance reconstruction module that encourages feature alignment of the recovered image at a fine-grained level and satisfies subjective expectations. Finally, extensive experiments demonstrate the superiority of our approach over other state-of-the-art methods and its stronger generalization capabilities. We will release the source code upon acceptance of the paper.
Submitted 9 July, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Spin Signature of Majorana Fermions in Topological Nodal-Point Superconductors
Authors:
Junjie Zeng,
James Jun He,
Zhen Ning,
Dong-Hui Xu,
Rui Wang
Abstract:
In two-dimensional topological nodal superconductors, Majorana edge states have been conventionally believed to exhibit only spin-triplet pairing correlations. However, we reveal a substantial spin-singlet pairing component in Majorana edge states of antiferromagnetic topological nodal-point superconductors. This unexpected phenomenon emerges from the interplay between antiferromagnetic order and symmetry, resulting in Majorana edge states with a nearly flat band dispersion, deviating from the strictly flat band. Crucially, this phenomenon is detectable through spin-selective Andreev reflection, where the zero-bias conductance peaks are maximized when the spin of incident electrons is nearly antiparallel to that of Majorana edge excitations. This discovery unveils a unique spin signature for Andreev reflection resonances, advancing our fundamental understanding of spin-dependent mechanisms in topological superconductivity and representing a significant step towards the experimental detection of Majorana fermions.
Submitted 5 June, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
Authors:
Shiqi Chen,
Miao Xiong,
Junteng Liu,
Zhengxuan Wu,
Teng Xiao,
Siyang Gao,
Junxian He
Abstract:
Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited. In this study, we delve into the underlying mechanisms of LLM hallucinations from the perspective of inner representations, and discover a salient pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens, compared to the incorrect ones. Leveraging this insight, we propose an entropy-based metric to quantify the ``sharpness'' among the in-context hidden states and incorporate it into the decoding process to formulate a constrained decoding approach. Experiments on various knowledge-seeking and hallucination benchmarks demonstrate our approach's consistent effectiveness, for example, achieving up to an 8.6 point improvement on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.
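The abstract does not give the form of the entropy-based metric. As a minimal sketch of the general idea only, assume hypothetical per-token activation scores over the in-context tokens, normalize them with a softmax, and take the Shannon entropy: lower entropy then corresponds to a sharper distribution. The function names and input values below are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(scores):
    """Shannon entropy (in nats) of the softmax-normalized scores."""
    probs = softmax(scores)
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical activation scores over four in-context tokens.
sharp_scores = [8.0, 0.5, 0.2, 0.1]  # mass concentrated on one token
flat_scores = [1.0, 1.0, 1.0, 1.0]   # mass spread evenly

print(f"sharp: {entropy(sharp_scores):.3f} nats")
print(f"flat:  {entropy(flat_scores):.3f} nats")
```

A uniform distribution over four tokens reaches the maximum entropy of ln 4, so any concentration of activation on fewer tokens lowers the value.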
Submitted 12 March, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
A far-ultraviolet-driven photoevaporation flow observed in a protoplanetary disk
Authors:
Olivier Berné,
Emilie Habart,
Els Peeters,
Ilane Schroetter,
Amélie Canin,
Ameek Sidhu,
Ryan Chown,
Emeric Bron,
Thomas J. Haworth,
Pamela Klaassen,
Boris Trahin,
Dries Van De Putte,
Felipe Alarcón,
Marion Zannese,
Alain Abergel,
Edwin A. Bergin,
Jeronimo Bernard-Salas,
Christiaan Boersma,
Jan Cami,
Sara Cuadrado,
Emmanuel Dartois,
Daniel Dicken,
Meriem Elyajouri,
Asunción Fuente,
Javier R. Goicoechea
, et al. (121 additional authors not shown)
Abstract:
Most low-mass stars form in stellar clusters that also contain massive stars, which are sources of far-ultraviolet (FUV) radiation. Theoretical models predict that this FUV radiation produces photo-dissociation regions (PDRs) on the surfaces of protoplanetary disks around low-mass stars, impacting planet formation within the disks. We report JWST and Atacama Large Millimeter Array observations of an FUV-irradiated protoplanetary disk in the Orion Nebula. Emission lines are detected from the PDR; modelling their kinematics and excitation allows us to constrain the physical conditions within the gas. We quantify the mass-loss rate induced by the FUV irradiation, finding it is sufficient to remove gas from the disk in less than a million years. This is rapid enough to affect giant planet formation in the disk.
Submitted 29 February, 2024;
originally announced March 2024.
-
Thermal stabilities Landscape of A$_2$BB$^{\prime}$O$_6$ compounds
Authors:
Yateng Wang,
Bianca Baldassarri,
Jiahong Shen,
Jiangang He,
Chris Wolverton
Abstract:
Perovskite oxides have been extensively studied for their wide range of compositions and structures, as well as their valuable properties for various applications. Expanding from single perovskite ABO$_3$ to double perovskite $A_2BB^{\prime}$O$_6$ significantly enhances the ability to tailor specific physical and chemical properties. However, the vast number of potential compositions of $A_2BB^{\prime}$O$_6$ makes it impractical to explore them all experimentally. In this study, we conducted high-throughput calculations to systematically investigate the structures and stabilities of 4,900 $A_2BB^{\prime}$O$_6$ compositions (with $A$ = Ca, Sr, Ba, and La; $B$ and $B^{\prime}$ representing metal elements) through over 42,000 density functional theory (DFT) calculations. Our analysis led to the discovery of more than 1,500 new synthesizable $A_2BB^{\prime}$O$_6$ compounds, with over 1,100 of them exhibiting double perovskite structures, predominantly in the $P2_1/c$ space group. By leveraging the high-throughput dataset, we developed machine learning models that achieved mean absolute errors of 0.0444 and 0.0330 eV/atom for formation energy and decomposition energy, respectively. Using these models, we identified 803 stable or metastable compositions beyond the chemical space covered in our initial calculations, with 612 of them having DFT-validated decomposition energies below 0.1 eV/atom, resulting in a success rate of 76.2\%. This study delineates the stability landscape of $A_2BB^{\prime}$O$_6$ compounds and offers new insights for the exploration of these materials.
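The quoted success rate follows directly from the two counts in the abstract. The short check below only reproduces that arithmetic; the variable names are illustrative.

```python
# Counts quoted in the abstract.
predicted_stable = 803  # compositions flagged by the machine learning models
dft_validated = 612     # of those, decomposition energy below 0.1 eV/atom in DFT

success_rate = dft_validated / predicted_stable
print(f"{success_rate:.1%}")  # prints "76.2%"
```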
Submitted 29 February, 2024;
originally announced February 2024.
-
Quantum coherence and entanglement under the influence of decoherence
Authors:
Wen-Yang Sun,
A-Min Ding,
Juan He,
Jiadong Shi,
Le Wang,
Hui-Fang Xu,
Dong Wang,
Liu Ye
Abstract:
In this work, we delve into the dynamic traits of the relative entropy of quantum coherence (REQC) as the quantum system interacts with different noisy channels, drawing comparisons with entanglement (concurrence). The research results demonstrate the broader prevalence and stronger robustness of the REQC as opposed to concurrence. Notably, the bit flip channel cannot sustain a constant, nonzero frozen REQC; moreover, the concurrence follows a pattern of temporary reduction to zero, followed by recovery after a certain time span. More importantly, the REQC persists until reaching a critical threshold, whereas concurrence is completely attenuated to zero under the influence of phase damping and amplitude damping channels.
Submitted 29 February, 2024;
originally announced February 2024.
-
Towards Backward-Compatible Continual Learning of Image Compression
Authors:
Zhihao Duan,
Ming Lu,
Justin Yang,
Jiangpeng He,
Zhan Ma,
Fengqing Zhu
Abstract:
This paper explores the possibility of extending the capability of pre-trained neural image compressors (e.g., adapting to new data or target bitrates) without breaking backward compatibility, the ability to decode bitstreams encoded by the original model. We refer to this problem as continual learning of image compression. Our initial findings show that baseline solutions, such as end-to-end fine-tuning, do not preserve the desired backward compatibility. To tackle this, we propose a knowledge replay training strategy that effectively addresses this issue. We also design a new model architecture that enables more effective continual learning than existing baselines. Experiments are conducted for two scenarios: data-incremental learning and rate-incremental learning. The main conclusion of this paper is that neural image compressors can be fine-tuned to achieve better performance (compared to their pre-trained version) on new data and rates without compromising backward compatibility. Our code is available at https://gitlab.com/viper-purdue/continual-compression
Submitted 29 February, 2024;
originally announced February 2024.
-
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Authors:
Jiangpeng He,
Fengqing Zhu
Abstract:
Class-Incremental Learning (CIL) trains a model to continually recognize new classes from non-stationary data while retaining learned knowledge. A major challenge of CIL arises when applying to real-world data characterized by non-uniform distribution, which introduces a dual imbalance problem involving (i) disparities between stored exemplars of old tasks and new class data (inter-phase imbalance), and (ii) severe class imbalances within each individual task (intra-phase imbalance). We show that this dual imbalance issue causes skewed gradient updates with biased weights in FC layers, thus inducing over/under-fitting and catastrophic forgetting in CIL. Our method addresses it by reweighting the gradients towards balanced optimization and unbiased classifier learning. Additionally, we observe imbalanced forgetting where paradoxically the instance-rich classes suffer higher performance degradation during CIL due to a larger amount of training data becoming unavailable in subsequent learning phases. To tackle this, we further introduce a distribution-aware knowledge distillation loss to mitigate forgetting by aligning output logits proportionally with the distribution of lost training data. We validate our method on CIFAR-100, ImageNetSubset, and Food101 across various evaluation protocols and demonstrate consistent improvements compared to existing works, showing great potential to apply CIL in real-world scenarios with enhanced robustness and effectiveness.
Submitted 29 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Towards Generalist Prompting for Large Language Models by Mental Models
Authors:
Haoxiang Guan,
Jiyan He,
Shuxin Zheng,
En-Hong Chen,
Weiming Zhang,
Nenghai Yu
Abstract:
Large language models (LLMs) have demonstrated impressive performance on many tasks. However, to achieve optimal performance, specially designed prompting methods are still needed. These methods either rely on task-specific few-shot examples that require a certain level of domain knowledge, or are designed to be simple but only perform well on a few types of tasks. In this work, we introduce the concept of generalist prompting, which operates on the design principle of achieving optimal or near-optimal performance on a wide range of tasks while eliminating the need for manual selection and customization of prompts tailored to specific problems. Furthermore, we propose MeMo (Mental Models), a prompting method that is simple in design yet effectively fulfills the criteria of generalist prompting. MeMo distills the cores of various prompting methods into individual mental models and allows LLMs to autonomously select the most suitable mental models for the problem, achieving or approaching state-of-the-art results on diverse tasks such as STEM, logical reasoning, and commonsense reasoning in zero-shot settings. We hope that the insights presented herein will stimulate further exploration of generalist prompting methods for LLMs.
Submitted 28 February, 2024;
originally announced February 2024.
-
OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine
Authors:
Xiaosong Wang,
Xiaofan Zhang,
Guotai Wang,
Junjun He,
Zhongyu Li,
Wentao Zhu,
Yi Guo,
Qi Dou,
Xiaoxiao Li,
Dequan Wang,
Liang Hong,
Qicheng Lao,
Tong Ruan,
Yukun Zhou,
Yixue Li,
Jie Zhao,
Kang Li,
Xin Sun,
Lifeng Zhu,
Shaoting Zhang
Abstract:
The emerging trend of advancing generalist artificial intelligence, such as GPTv4 and Gemini, has reshaped the landscape of research (academia and industry) in machine learning and many other research areas. However, domain-specific applications of such foundation models (e.g., in medicine) remain untouched or often at their very early stages. It will require an individual set of transfer learning and model adaptation techniques by further expanding and injecting these models with domain knowledge and data. The development of such technologies could be largely accelerated if the bundle of data, algorithms, and pre-trained foundation models were gathered together and open-sourced in an organized manner. In this work, we present OpenMEDLab, an open-source platform for multi-modality foundation models. It encapsulates not only solutions of pioneering attempts in prompting and fine-tuning large language and vision models for frontline clinical and bioinformatic applications but also building domain-specific foundation models with large-scale multi-modal medical data. Importantly, it opens access to a group of pre-trained foundation models for various medical image modalities, clinical text, protein engineering, etc. Inspiring and competitive results are also demonstrated for each collected approach and model in a variety of benchmarks for downstream tasks. We welcome researchers in the field of medical artificial intelligence to continuously contribute cutting-edge methods and models to OpenMEDLab, which can be accessed via https://github.com/openmedlab.
Submitted 3 March, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
On finite totally k-closed groups
Authors:
Jiawei He,
Xiaogang Li
Abstract:
Let $G$ be a finite group acting faithfully on a finite set $Ω$. For a positive integer $k$, $G$ acts naturally on the Cartesian product $Ω^k := Ω\times \cdots \times Ω$. In this paper, we prove that a finite nilpotent group $G$ with $2\nmid |G|$ is a totally $k$-closed group if and only if $G$ is abelian with $n(G)\leq k-1$ or cyclic, where $n(G)$ is the number of invariant factors in the invariant factor decomposition of $G$.
Submitted 27 February, 2024;
originally announced February 2024.
-
Acceptor-induced bulk dielectric loss in superconducting circuits on silicon
Authors:
Zi-Huai Zhang,
Kadircan Godeneli,
Justin He,
Mutasem Odeh,
Haoxin Zhou,
Srujan Meesala,
Alp Sipahigil
Abstract:
The performance of superconducting quantum circuits is primarily limited by dielectric loss due to interactions with two-level systems (TLS). State-of-the-art circuits with engineered material interfaces are approaching a limit where dielectric loss from bulk substrates plays an important role. However, a microscopic understanding of dielectric loss in crystalline substrates is still lacking. In this work, we show that boron acceptors in silicon constitute a strongly coupled TLS bath for superconducting circuits. We discuss how the electronic structure of boron acceptors leads to an effective TLS response in silicon. We sweep the boron concentration in silicon and demonstrate the bulk dielectric loss limit from boron acceptors. We show that boron-induced dielectric loss can be reduced in a magnetic field due to the spin-orbit structure of boron. This work provides the first detailed microscopic description of a TLS bath for superconducting circuits, and demonstrates the need for ultrahigh purity substrates for next-generation superconducting quantum processors.
Submitted 26 February, 2024;
originally announced February 2024.
-
The endomorphism rings of permutation modules of $\frac{3}{2}$-transitive permutation groups
Authors:
Jiawei He,
Xiaogang Li
Abstract:
Recent classification of $\frac{3}{2}$-transitive permutation groups leaves us with six families of groups which are $2$-transitive, or Frobenius, or one-dimensional affine, or the affine solvable subgroups of $ \mathrm{AGL}(2, q)$, or special projective linear group $\mathrm{PSL}(2, q)$, or $\mathrm{PΓL}(2, q)$, where $q=2^p $ with $p$ prime. According to a case by case analysis, we prove that the endomorphism ring of the natural permutation module for a $\frac{3}{2}$-transitive permutation group is a symmetric algebra.
Submitted 26 February, 2024;
originally announced February 2024.
-
Magnetic resonance delta radiomics to track radiation response in lung tumors receiving stereotactic MRI-guided radiotherapy
Authors:
Yining Zha,
Benjamin H. Kann,
Zezhong Ye,
Anna Zapaishchykova,
John He,
Shu-Hui Hsu,
Jonathan E. Leeman,
Kelly J. Fitzgerald,
David E. Kozono,
Raymond H. Mak,
Hugo J. W. L. Aerts
Abstract:
Introduction: Lung cancer is a leading cause of cancer-related mortality, and stereotactic body radiotherapy (SBRT) has become a standard treatment for early-stage lung cancer. However, the heterogeneous response to radiation at the tumor level poses challenges. Currently, standardized dosage regimens lack adaptation based on individual patient or tumor characteristics. Thus, we explore the potential of delta radiomics from on-treatment magnetic resonance (MR) imaging to track radiation dose response, inform personalized radiotherapy dosing, and predict outcomes. Methods: A retrospective study of 47 MR-guided lung SBRT treatments for 39 patients was conducted. Radiomic features were extracted using Pyradiomics, and stability was evaluated temporally and spatially. Delta radiomics were correlated with radiation dose delivery and assessed for associations with tumor control and survival with Cox regressions. Results: Among 107 features, 49 demonstrated temporal stability, and 57 showed spatial stability. Fifteen stable and non-collinear features were analyzed. Median Skewness and surface to volume ratio decreased with radiation dose fraction delivery, while coarseness and 90th percentile values increased. Skewness had the largest relative median absolute changes (22%-45%) per fraction from baseline and was associated with locoregional failure (p=0.012) by analysis of covariance. Skewness, Elongation, and Flatness were significantly associated with local recurrence-free survival, while tumor diameter and volume were not. Conclusions: Our study establishes the feasibility and stability of delta radiomics analysis for MR-guided lung SBRT. Findings suggest that MR delta radiomics can capture short-term radiographic manifestations of intra-tumoral radiation effect.
Submitted 23 February, 2024;
originally announced February 2024.