Applying Deep Learning Technique to Chiral Magnetic Wave Search

Yuan-Sheng Zhao Physics Department and Center for Particle Physics and Field Theory, Fudan University, Shanghai 200433, China    Xu-Guang Huang [email protected] Physics Department and Center for Particle Physics and Field Theory, Fudan University, Shanghai 200433, China Key Laboratory of Nuclear Physics and Ion-beam Application (MOE), Fudan University, Shanghai 200433, China Shanghai Research Center for Theoretical Nuclear Physics, National Natural Science Foundation of China and Fudan University, Shanghai 200438, China
(July 1, 2024)
Abstract

The chiral magnetic wave (CMW) is a collective mode in quark-gluon plasma originating from the chiral magnetic effect (CME) and the chiral separation effect. Its detection in heavy-ion collisions is challenging due to significant background contamination. In Ref. Zhao et al. (2022), we constructed a neural network that can accurately identify the CME-related signal from the final-state pion spectra. In this paper, we generalize such a neural network to the case of CMW search. We show that, after an updated training, the neural network can effectively recognize the CMW-related signal. Additionally, we assess the performance of the neural network compared to other known methods for CMW search.

Keywords: Deep learning, Chiral magnetic wave, relativistic heavy-ion collisions

I Introduction

The interplay between the chiral anomaly and external electromagnetic or vortical fields can lead to intriguing anomalous transport phenomena in many-body systems with chiral fermions. A notable example is the chiral magnetic effect (CME) Kharzeev et al. (2008); Fukushima et al. (2008), which induces an electric current aligned with an external magnetic field. In heavy-ion collisions, the CME may cause charge separation relative to the reaction plane, which can potentially be observed by analyzing the azimuthal-angle distribution of charged hadrons using specific observables Voloshin (2004); Abelev et al. (2009). Other notable anomalous transport phenomena include the chiral separation effect (CSE) Son and Zhitnitsky (2004); Metlitski and Zhitnitsky (2005), the chiral vortical effect Erdmenger et al. (2009); Banerjee et al. (2011); Son and Surowka (2009); Landsteiner et al. (2011), and the chiral electric separation effect (CESE) Huang and Liao (2013); Jiang et al. (2015). For reviews, see Refs. Huang (2016); Kharzeev et al. (2016); Liu and Huang (2020); Kharzeev and Liao (2021); Hattori et al. (2022).

In the presence of an external magnetic field, the coupled evolution of CME and CSE gives rise to a gapless collective mode known as the chiral magnetic wave (CMW) Kharzeev and Yee (2011). The CMW can transfer both chirality and electric charge, potentially resulting in distinct charge and chirality distributions. In heavy-ion collisions, given that the fireball contains a small amount of positive charges inherited from the colliding nuclei, theoretical studies have suggested that the CMW can induce a charge quadrupole in the fireball, with an accumulation of positive charges at the tips and negative charges around the equator. As the fireball expands, this quadrupole leads to an imbalance in the elliptic flow of charged pions, specifically, $v_{2}(\pi^{-})>v_{2}(\pi^{+})$ Burnier et al. (2011). Owing to event-by-event fluctuation of charges, some events could have net negative charges in the fireball, thus leading to $v_{2}(\pi^{-})<v_{2}(\pi^{+})$. This characteristic feature of CMW provides a method to detect it in heavy-ion collisions, and a series of experiments have found signals of charged pion elliptic flow consistent with CMW expectations Adamczyk et al. (2015); Adam et al. (2016); Sirunyan et al. (2019); Abdulhamid et al. (2023); Acharya et al. (2023). However, like CME, CMW in heavy-ion collisions faces strong background noise Deng and Huang (2012); Stephanov and Yee (2013); Dunlop et al. (2011); Bzdak and Bozek (2013); Hatta et al. (2016); Xu et al. (2012); Hattori and Huang (2017), which significantly obscures the observables designed for CMW detection.

In Ref. Zhao et al. (2022), we developed a CME-meter based on convolutional neural networks (CNNs) (for reviews of deep learning techniques in nuclear physics, see Refs. Boehnlein et al. (2022); He et al. (2023a, b); Zhou et al. (2024)). After training this CME-meter with AMPT-generated data simulating CME (by introducing initial charge separation into the AMPT model Lin et al. (2005)) for Au + Au collisions at 200 GeV, the CME-meter demonstrated exceptional robustness in distinguishing events with CME from those without. Additionally, the CME-meter maintained strong performance across different charge separation fractions, collision energies, and collision systems. This success suggests the potential for creating a similar CMW-meter. As an extension of our earlier work, we aim to increase the upper limit of salience at the cost of some generalization capability. This approach could pave the way for future studies on CMW physics and its detection.

In this paper, we report on the construction and performance of such a CMW-meter. Section II details the training process, including the generation of training samples using AMPT, the structure of the neural network, and the training procedure. Section III presents the analysis of the trained model, including its basic properties, comparisons to flows and observables, and a hypothesis test. Section IV provides a summary of our findings.

II Construction and training of the CMW-meter

In this section, we introduce the deep learning model, data set preparation, and training strategies employed in constructing the CMW-meter. The pion spectra of heavy-ion collision final states serve as the input of this deep learning model. Pions carry most of the electric charges in the final state, making them an appropriate representation of the charge distribution. A convolutional neural network (CNN) is utilized, trained within a supervised learning scheme to identify the CMW signals. The training data is generated from the string-melting AMPT model Lin et al. (2005), a transport model which is widely used to simulate the evolution of both partonic and hadronic matter in heavy-ion collisions.

In order to incorporate the CMW effect into the AMPT model, we adopt a global charge quadrupole scheme introduced in Ref. Ma (2014). For an AMPT event with $A_{\rm ch}>-0.01$, we interchange the positions of certain $u$ (or $\bar{d}$) quarks in the initial state with those of $\bar{u}$ (or $d$) quarks if the former are relatively farther from the reaction plane (RP), while the opposite is done for events with $A_{\rm ch}<-0.01$. Here, $A_{\rm ch}$ stands for the asymmetry of charged particle number, given by $A_{\rm ch}=(N^{+}-N^{-})/(N^{+}+N^{-})$, where $N^{+}$ denotes the number of positively charged particles measured in a given event, and $N^{-}$ denotes the number of negatively charged particles. The RP of all events is set in the $zOx$-plane. The fraction of particles that are interchanged is represented by a relative percentage with respect to the total number of quarks,

$$f=\frac{\text{\# Exchanged particles}}{\text{\# All particles}}. \tag{1}$$

According to a previous study Ma (2014), switching $f=2$-$3\%$ of quarks generates a CMW signal comparable to experimental observables. For training and validation purposes, we thus choose events with an $f=2\%$ switching fraction. The transition point, $A_{\rm ch}=-0.01$, in this scheme is based on STAR experimental results Adamczyk et al. (2015), where one may find more details. Events at $\sqrt{s_{NN}}=200\,$GeV and different centralities are generated for training and validation.
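The quantities entering this scheme can be written down in a few lines. The following is a minimal sketch, not the AMPT implementation: the function names are hypothetical, and the treatment of an event lying exactly at the transition point is an assumption, since the text only specifies the two sides of $A_{\rm ch}=-0.01$.

```python
def charge_asymmetry(n_plus: int, n_minus: int) -> float:
    """A_ch = (N+ - N-) / (N+ + N-) for one event."""
    total = n_plus + n_minus
    if total == 0:
        raise ValueError("event has no charged particles")
    return (n_plus - n_minus) / total


def swap_fraction(n_exchanged: int, n_all: int) -> float:
    """Eq. (1): exchanged quarks relative to all quarks."""
    return n_exchanged / n_all


def quadrupole_orientation(a_ch: float, transition: float = -0.01) -> int:
    """+1 for the swap rule used when A_ch > transition, -1 for the opposite rule.

    Assumption: an event exactly at the transition point is assigned to -1.
    """
    return 1 if a_ch > transition else -1


# Hypothetical event: 510 positive and 490 negative charged particles.
a_ch = charge_asymmetry(510, 490)
orientation = quadrupole_orientation(a_ch)
```

The orientation flag only encodes which of the two swap rules applies; selecting the quarks to exchange by their distance from the reaction plane is not modeled here.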

There are two primary reasons for training the model at a single energy of 200 GeV, even though this may introduce bias and overfitting. Firstly, the pivotal issue pertains to the occurrence of CMW in heavy-ion collisions, rather than the magnitude of the signal. Consequently, any technique that can clearly distinguish CMW signals from background noise is considered valuable, irrespective of the $\sqrt{s_{NN}}$ or event centrality. Secondly, our research on the application of neural networks for CME detection Zhao et al. (2022) confirmed the robustness of the trained network against variations in collision energy and event centrality. The training was successful on the most comprehensive dataset, demonstrating high accuracy levels. This means that collision energy and centrality cause only a small variation in the network’s detection capability. Therefore, a model trained on a single energy is capable of enhancing the signal detection in certain events, while still maintaining a considerable degree of generalization. However, further examinations involving various energies and centralities have also been conducted to provide a more nuanced analysis.

Figure 1: A VGG-like network Simonyan and Zisserman (2015) with 4 hidden layers is chosen in this study. Batch normalization (BN) is applied after each hidden layer. The convolutional layers are modified to satisfy the periodic boundary condition of the input data, and each is followed by an average pooling layer. A 10% dropout is set for the second to last dense layers.

The structure of the CNN used in this work is shown in Fig. 1. There are three 2D-convolutional layers and two dense layers that contain parameters to be fit. Some pooling layers are also applied here, providing proper reduction of the data and keeping the network from being too complicated. To encode “knowledge” about CMW in the model, samples with and without CMW, labeled ‘1’ and ‘0’ respectively, are fed to it during training, and the model is set to classify these samples. The last activation function of the network is SoftMax, which returns a pair of numbers $(P_{0},\,P_{1})$ for this binary classification problem. Samples with $P_{1}>0.5$ are assigned to class ‘1’, and the rest to class ‘0’. Therefore, $P_{1}$ can be interpreted as the probability that a specific sample is recognized by the neural network as containing the CMW signal.
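The periodic boundary condition mentioned in the caption of Fig. 1 can be illustrated with a padding step applied before a convolution. This is a sketch under stated assumptions, not the network code used in this work: it assumes the input is a 2D array with the azimuthal angle on the last axis, that only this axis is periodic, and that the other axis is zero-padded. The function name `pad_periodic_phi` is hypothetical.

```python
import numpy as np


def pad_periodic_phi(spectrum: np.ndarray, pad: int = 1) -> np.ndarray:
    """Pad a (pT, phi) array so a convolution kernel sees phi as periodic.

    The phi axis (last axis) is wrapped around; the pT axis (first axis)
    is zero-padded, which is an assumption of this sketch.
    """
    wrapped = np.pad(spectrum, ((0, 0), (pad, pad)), mode="wrap")
    return np.pad(wrapped, ((pad, pad), (0, 0)), mode="constant")


# Hypothetical 20 x 24 input with distinct entries, to make the wrap visible.
x = np.arange(20 * 24, dtype=float).reshape(20, 24)
y = pad_periodic_phi(x)
```

With this padding, the first and last azimuthal bins become neighbours for a 3 x 3 kernel, so a feature crossing $\phi=0$ is treated the same as one in the middle of the range.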

Data pre-processing involves several steps aimed at converting events into analyzable samples. First, the spectra for mid-rapidity ($|\eta|<1$) pions, denoted as $\rho^{\pm}(p_{T},\phi)$, are calculated, with the symbol $\pm$ representing either $\pi^{+}$ or $\pi^{-}$, while $p_{T}$ denotes the transverse momentum in the range of $0$-$2$ GeV and $\phi$ indicates the azimuthal angle. These spectra are then segmented into histograms consisting of 20 by 24 bins. Second, each spectrum is normalized so that the sum of all its bins is 1. Subsequently, a random selection of events is made, and for each type of pion, their spectra are averaged bin by bin. These resulting normalized and averaged pion spectra serve as the datasets for the neural network’s training, validation, and testing phases. Unless otherwise stated, the number of events averaged in the last step is 100 in the rest of this article. The model’s training encompasses 250 epochs, where each epoch contains 64 batches and each batch comprises 100 samples. In total, 1.6 million samples are generated for the training.
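The pre-processing steps above can be sketched with NumPy as follows. The assignment of 20 bins to $p_{T}$ and 24 bins to $\phi$, the azimuthal range $[0, 2\pi]$, and the function names are assumptions made for illustration; the pion kinematics in the usage lines are random placeholders rather than AMPT output.

```python
import numpy as np

PT_BINS, PHI_BINS = 20, 24                      # assumed axis assignment
PT_RANGE, PHI_RANGE = (0.0, 2.0), (0.0, 2.0 * np.pi)


def event_spectrum(pt, phi):
    """Histogram one event's pions of one charge and normalize the sum to 1."""
    phi = np.mod(phi, 2.0 * np.pi)              # fold angles into [0, 2*pi]
    hist, _, _ = np.histogram2d(
        pt, phi, bins=(PT_BINS, PHI_BINS), range=(PT_RANGE, PHI_RANGE)
    )
    total = hist.sum()
    return hist / total if total > 0 else hist


def make_sample(events):
    """Average the normalized spectra of several events bin by bin."""
    return np.mean([event_spectrum(pt, phi) for pt, phi in events], axis=0)


# Placeholder input: 100 events with 50 pions each (not physical data).
rng = np.random.default_rng(0)
events = [
    (rng.uniform(0.0, 2.0, 50), rng.uniform(-np.pi, np.pi, 50))
    for _ in range(100)
]
sample = make_sample(events)
```

Because each event is normalized before averaging, every event carries equal weight in the sample regardless of its multiplicity, and the averaged spectrum again sums to 1.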

III Performance of the CMW-meter

Accuracy, robustness and extrapolations.— As mentioned above, the model is trained (and also validated) on samples generated at $\sqrt{s_{NN}}=200\,$GeV that mimic final-state CMW behavior. It turns out that the model reaches high accuracy on most events with signal at different $\sqrt{s_{NN}}$ and centralities, as shown in Fig. 2, which indicates good generalization of the trained model. A reduction of accuracy is found at low collision energy and large centrality, which may have various causes: a different pattern of CMW, weaker signals, stronger backgrounds, or simply overfitting can all account for it.

Figure 2: Accuracy of the trained model on samples (a) with CMW, and (b) without CMW. Samples from various $\sqrt{s_{NN}}$ and centralities are considered. The accuracy is remarkably high when the signal is encoded in the samples, while samples without it can be mistaken as containing it, especially at lower energies and in more peripheral collisions.

One way to detect CMW in experiments is based on the dependence of flow on the charge distribution. Specifically, the linear dependence on $A_{\rm ch}$ of the difference in charged-particle elliptic flow,

$$\Delta v_{2}\equiv v_{2}^{-}-v_{2}^{+}\simeq rA_{\rm ch}, \tag{2}$$

gives a measure for the CMW signal. Here $v_{2}^{-}$ and $v_{2}^{+}$ are elliptic flows of negative- and positive-charge particles respectively, and the slope $r$ is related to the strength of the signal. Experimental results from the STAR experiment for Au + Au collisions Abdulhamid et al. (2023) indicate that the uncertainty of the $\pi$ slope $r$ increases at lower energies and higher centralities. Although the neural network is trained on pions from a larger kinematic window than used in experimental analyses, suggesting improved completeness and distinguishing capability, its performance aligns with the trends of traditional statistical analysis ($\Delta v_{2}(\pi)$). The decrease in accuracy is likely due to backgrounds that are strong compared with the signal strength in those scenarios. However, a new model can be trained using low-energy samples, or low-energy samples combined with high-energy samples, to create a more comprehensive training set and enhance robustness. This approach is left for a future, more detailed study. Overall, the trained model’s accuracy in decoding the CMW signal is sufficient across all tested $\sqrt{s_{NN}}$ values, especially when focusing on high-energy samples.
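The slope extraction of Eq. (2) can be sketched as a straight-line fit. The code below assumes that the elliptic flow is estimated as $\langle\cos 2\phi\rangle$ with the reaction plane at $\phi=0$, consistent with the RP being fixed in the $zOx$-plane, and it allows a free intercept in the fit, which Eq. (2) itself does not contain. The function names and the numbers in the usage lines are hypothetical.

```python
import numpy as np


def v2(phi) -> float:
    """Elliptic flow relative to a reaction plane at phi = 0 (assumed estimator)."""
    return float(np.mean(np.cos(2.0 * np.asarray(phi, dtype=float))))


def fit_slope(a_ch, delta_v2):
    """Least-squares fit of delta_v2 = r * A_ch + c; returns (r, c)."""
    r, c = np.polyfit(a_ch, delta_v2, deg=1)
    return float(r), float(c)


# Hypothetical A_ch bins with an exactly linear response, for illustration only.
a_ch_bins = np.linspace(-0.05, 0.05, 11)
delta_v2_bins = 0.03 * a_ch_bins + 0.001
r_fit, c_fit = fit_slope(a_ch_bins, delta_v2_bins)
```

In an analysis, `delta_v2_bins` would be `v2` of negative pions minus `v2` of positive pions evaluated in each $A_{\rm ch}$ bin.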

As a potential detector for CMW, a measure of performance is the prediction on non-labeled samples, where accuracy cannot be defined and the sample-by-sample output becomes important. The two components of the model output are identified as probabilities, i.e., $P_{1}$ stands for the probability that the neural network regards the input spectrum as containing CMW, and $P_{0}$ is the probability for the other class, so the two components satisfy $P_{0}+P_{1}=1$. A positive correlation of $P_{1}$ with the CMW signal is expected. Events with different initial charge quadrupole fractions $f$ are simulated and prepared into samples as described above. Tests on these samples keep the high true-positive accuracy, yet the returned $P_{1}$ for all $f$ are close to 1, so that little difference is made among them. To make a clear comparison, we enlarge the difference of their output by introducing an additional logit function,

$$\text{logit}(x)=\log\frac{x}{1-x}\,. \tag{3}$$

The logit function is the inverse of the logistic function, and for a two-class SoftMax output logit($P_{1}$) equals the difference between the two outputs of the layer before SoftMax. Applying logit to $P_{1}$ thus reveals feature space information that is encoded in the neural network one layer before output. Besides, the logit function is monotonically increasing, so logit($P_{1}$) qualitatively preserves the correlation between $P_{1}$ and $f$.
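The relation between logit($P_{1}$) and the layer before the output can be checked numerically. In this sketch the two pre-SoftMax outputs `z0` and `z1` are hypothetical values chosen for illustration; for a two-class SoftMax, logit($P_{1}$) reproduces their difference.

```python
import numpy as np


def softmax(z):
    """Numerically stable SoftMax of a 1D array of pre-activation outputs."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()


def logit(x):
    """Eq. (3): log(x / (1 - x))."""
    return np.log(x / (1.0 - x))


z0, z1 = -1.2, 2.3           # hypothetical outputs of the last dense layer
p0, p1 = softmax([z0, z1])
recovered = logit(p1)        # equals z1 - z0 up to rounding
```

Since only the difference `z1 - z0` survives, logit($P_{1}$) is a one-dimensional summary of the final feature layer and is not limited to the interval between 0 and 1.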

Figure 3: Distribution of logit($P_{1}$) on events at $\sqrt{s_{NN}}=200$ GeV, centrality 30-40%. Tests are done on events with different initial charge quadrupole fractions ($f=1,2,3,4\%$). The distributions are normalized to 1.

In Fig. 3, we present the outcomes with varying initial charge quadrupoles. As $f$ increases, the peak of the logit($P_{1}$) distribution shifts to the right. In cases where $f$ equals 4%, $P_{1}$ approaches 1 so closely that the logit function becomes numerically unstable with single precision calculations. However, the pattern of the $f=4\%$ distribution is still in line with the general trend. Additionally, the width of the peak remains essentially unaffected by $f$, indicating that the model introduces minimal error and reliably extracts the expected CMW signal. The width of the peak is due to the event-to-event initial-state fluctuations and the method of implementing the initial charge quadrupole. The reasonable extrapolation of $P_{1}$ for various $f$ values suggests that the CMW strength for $f$ has been correctly aligned to $P_{1}$ by the neural network. Consequently, $P_{1}$ is also indicative of the CMW signal intensity.
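The single-precision instability mentioned above can be reproduced in a few lines. The pre-SoftMax outputs below are hypothetical, and taking their difference directly is offered only as one possible workaround, not as the procedure used in this work.

```python
import numpy as np


def logit_from_p1(p1):
    """Eq. (3) evaluated on the SoftMax output; overflows once p1 rounds to 1."""
    p1 = np.asarray(p1)
    with np.errstate(divide="ignore"):
        return np.log(p1 / (1 - p1))


def logit_from_outputs(z0, z1):
    """Same quantity from the two pre-SoftMax outputs (two-class case)."""
    return z1 - z0


# Hypothetical strongly separated outputs, stored in single precision.
z0, z1 = np.float32(-10.0), np.float32(10.0)
p1_f32 = np.float32(1.0) / (np.float32(1.0) + np.exp(z0 - z1))

unstable = logit_from_p1(p1_f32)       # infinite: p1 is exactly 1 in float32
stable = logit_from_outputs(z0, z1)    # finite
```

In single precision, $1-P_{1}$ cannot be resolved once it drops below roughly $6\times10^{-8}$, which corresponds to logit($P_{1}$) of about 17.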

The model is also validated in some other tests. In a test on no-CMW events generated by UrQMD, it classifies most events correctly into the ‘0’ class. To see whether the CME signal affects this CMW detector, a test set consisting of AMPT events with CME is prepared, and the trained model mostly gives negative predictions.

Comparison with observables.— Above, we demonstrated that the trained model efficiently decodes CMW information from $\rho^{\pm}(p_{T},\phi)$, providing a potential measure of CMW in heavy-ion collisions. However, further comparisons with experimental observables are necessary before constructing a measurement based on the model. In Fig. 3, we show that logit($P_{1}$) is correlated with $f$, which in turn has a positive correlation with the slope $r$. In addition to the slope, the following covariance between $v_{n}$ and $q_{3}$, which is essentially a three-particle correlator, is another noteworthy observable Adam et al. (2016),

$$\lambda_{n}\equiv\langle v_{n}q_{3}\rangle-\langle q_{3}\rangle\langle v_{n}\rangle, \tag{4}$$

where $v_{n}$ is the $n$-th harmonic flow of the event, $q_{3}$ the charge of the third particle, and $\langle\cdots\rangle$ denotes the event average. The differential three-particle correlator, which measures the correlation between the flow in a particular kinematic region and the charge of the third particle at another particular coordinate, is more convenient when comparing across experiments as no correction for efficiency is needed. In the following we set $n=2$ for correlation with the elliptic flow. Using $v_{2}^{\mp}\propto\bar{v}_{2}\pm rA_{\rm ch}/2$ and $A_{\rm ch}\sim\langle q_{3}\rangle$, one notices that $\lambda_{2}\approx\pm r(\langle A_{\rm ch}^{2}\rangle-\langle A_{\rm ch}\rangle^{2})/2$ for positive-charge/negative-charge cases. In the following, $\lambda_{2}$ is obtained by taking half the difference between the positive-charge and negative-charge cases.
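The covariance of Eq. (4) and its relation to the slope $r$ can be illustrated with a toy calculation. The sketch below replaces $q_{3}$ by $A_{\rm ch}$ event by event, takes the flows to be exactly linear in $A_{\rm ch}$, and uses made-up values for $r$, $\bar{v}_{2}$ and the spread of $A_{\rm ch}$. The sign convention of the half difference is chosen here so that a positive $r$ gives a positive $\lambda_{2}$, which is an assumption.

```python
import numpy as np


def lambda_n(v_n, q3) -> float:
    """Eq. (4): <v_n q3> - <q3><v_n>, averaged over events."""
    v_n = np.asarray(v_n, dtype=float)
    q3 = np.asarray(q3, dtype=float)
    return float(np.mean(v_n * q3) - np.mean(q3) * np.mean(v_n))


# Toy events (hypothetical numbers, not simulation output).
rng = np.random.default_rng(0)
r, v2_bar = 0.03, 0.05
a_ch = rng.normal(0.0, 0.03, size=10_000)
v2_minus = v2_bar + r * a_ch / 2.0
v2_plus = v2_bar - r * a_ch / 2.0

# Half the difference between the two charge cases.
lambda_2 = 0.5 * (lambda_n(v2_minus, a_ch) - lambda_n(v2_plus, a_ch))
expected = r * np.var(a_ch) / 2.0
```

In this idealized setting `lambda_2` and `expected` agree to rounding error, which is the content of the approximate relation quoted above.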

Figure 4: Distribution of logit($P_{1}$) on events for Au + Au at $\sqrt{s_{NN}}=200$ GeV and centrality 30-40%. Events are divided into logit($P_{1}$) bins, and their $\lambda_{2}$ are averaged separately. All events are embedded with an initial charge quadrupole. A range of logit($P_{1}$) that includes most events is chosen to avoid bins with low statistics. The three-particle correlator clearly demonstrates a positive correlation with logit($P_{1}$).

Figure 4 shows the results of the comparison between logit($P_{1}$) and $\lambda_{2}$. The average $\lambda_{2}$ of events increases gently as the response of the model gets stronger. Since logit($P_{1}$) is positively correlated with the CMW signal, this indicates a reasonable trend in $\lambda_{2}$ as the signal gets stronger. This agrees with early studies on $\lambda_{2}$ Adam et al. (2016), and also shows the model prediction is qualified for measurement.

An examination of the performance under backgrounds is necessary before we go further. There are several mechanisms that may cause a dependence of the final-state $\Delta v_{2}$ on $A_{\rm ch}$, as discussed in Refs. Deng and Huang (2012); Stephanov and Yee (2013); Dunlop et al. (2011); Bzdak and Bozek (2013); Hatta et al. (2016); Xu et al. (2012); Hattori and Huang (2017). To see how the trained neural network works under such backgrounds, the prediction of the neural network in different $\Delta v_{2}$ ranges (either with or without initial charge quadrupoles) is studied, and the results are shown in Fig. 5. For input samples without CMW, the prediction increases when events with larger absolute $\Delta v_{2}$ are chosen. This shows that the neural network tends to regard events with larger $\Delta v_{2}$ as events containing CMW signals although they do not actually include CMW signals. However, it should be emphasized that even in this situation, $P_{1}$ is still less than $0.5$, meaning that the neural network still correctly classifies them as events without CMW. For samples with CMW, the model shows strong robustness against the background and classifies all the samples correctly.

Figure 5: Distribution of $P_{1}$ on events at $\sqrt{s_{NN}}=200$ GeV against $\Delta v_{2}$. Events are divided into 10 $\Delta v_{2}$ bins, and their $P_{1}$ are averaged separately. As the magnitude of $\Delta v_{2}$ increases, the tendency of the model to make a false positive classification also increases. Nevertheless, in events involving an initial quadrupole, the model consistently maintains a high level of accuracy.
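The bin-averaging used for Fig. 5 (and, with the roles of the variables exchanged, for Fig. 4) can be sketched as follows. Equal-width bins spanning the full range of the binning variable are an assumption, since the binning scheme is not specified further, and the function name `binned_mean` is hypothetical.

```python
import numpy as np


def binned_mean(x, y, n_bins: int = 10):
    """Average y in equal-width bins of x; empty bins give NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # digitize returns 1-based bin numbers; clip puts x == max in the last bin.
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, means


# Placeholder data: a response that grows with |x|, mimicking the trend in Fig. 5.
x_demo = np.linspace(-1.0, 1.0, 1000)
y_demo = np.abs(x_demo)
centers, means = binned_mean(x_demo, y_demo, n_bins=10)
```

Here `x` would hold the $\Delta v_{2}$ of each sample and `y` the corresponding $P_{1}$.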

Hypothesis test.— As previously discussed, the neural network demonstrates good accuracy in predicting the CMW signal and exhibits robustness across different collision energies, centralities, and background effects after training. This makes it feasible to create a CMW-meter based on the neural network. However, the need to average events poses a challenge for deploying this measurement experimentally, as it is not possible to know the charge quadrupole pattern in advance or align events according to their charge distribution patterns.

However, from a hypothesis test perspective, the CMW-meter also holds experimental feasibility. For a fixed finite number, $M$, of events, one can assume the presence of a sufficiently large residual quadrupole that can be detected through our meter if CMW is believed to exist in these events. Conversely, if no CMW exists in these events, the neural network’s predictions will consistently fall within the ‘0’ class. As demonstrated in Fig. 3, the intensity of the CMW signal significantly alters the distribution of logit$(P_{1})$, and hence the distribution of $P_{1}$ itself (denoted by $\text{P}(P_{1})$). This distribution responds differently depending on the presence or absence of CMW in the data set. If CMW exists in the heavy-ion collisions, the neural network model’s prediction regarding the residual quadrupole of a sample will align with $\text{P}(P_{1})$ in the $f\neq 0$ case. In contrast, without any CMW, the distribution will match the $f=0$ scenario. To establish a reasonable estimation of $\text{P}(P_{1})$ for testing $M$ events, we treat $f$ as a latent variable representing CMW in a single event, as defined by the initial charge quadrupole fraction used in this work. Given event-by-event fluctuations, we model $f$ as a random variable following a Gaussian distribution, $f\sim N(\mu,\sigma^{2})$, where $\mu$ is the mean of the latent variable $f$, which is expected to be around 0. The standard deviation $\sigma$ is estimated based on Ma (2014), where the average of $|f|$ is approximately $2\%$,

$$2\%=\langle|f|\rangle=\int|f|\,N^{>}(|f|;\sigma^{2})\;d|f|, \tag{5}$$

where $N^{>}(|f|;\sigma^{2})$ is the half-normal distribution, and $|f|$ is a non-negative variable because the model prediction is independent of the sign of $f$. Solving Eq. (5) gives $\sigma\simeq 0.025$. Because we adopt averaged events in preparing the CMW-meter, the way to compose a $\rho(f_{\text{eff}})$ from single events $\{\rho(f_i)\}$ becomes crucial, where $f_{\text{eff}}$ is the effective charge quadrupole fraction of the averaged events. One can choose the arithmetic mean,

$$\frac{1}{M}\sum_{i}^{M}\rho(f_i)=\rho\Big(\frac{1}{M}\sum_{i}^{M}f_i\Big)=\rho(f_{\text{eff}}). \tag{6}$$

Therefore, the distribution of $|f_{\text{eff}}|$ can be obtained as

$$F_{\text{eff}}\sim N(\mu/M,\,M\sigma^{2}/M^{2})=N(0,\sigma^{2}/M), \tag{7}$$

with $F_{\text{eff}}\equiv|f_{\text{eff}}|\sim N^{>}(\sigma^{2}/M)$. The conditional probability $\text{P}(P_1\,|\,F_{\text{eff}})$ can be approximated as a Beta distribution,

$$\text{Be}(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}, \tag{8}$$

with $\alpha$ and $\beta$ the parameters of the Beta distribution and $\Gamma$ the Gamma function. To describe $\text{P}(P_1\,|\,F_{\text{eff}})$ at any $F_{\text{eff}}$, we assume that $\alpha$ and $\beta$ are functions of $F_{\text{eff}}$, and fit the sets of $(\alpha,\beta)$ obtained from Beta-distribution fits at several values of $F_{\text{eff}}$ with a polynomial (for $\alpha$) and a Softplus function (for $\beta$, to reach the proper asymptotic behavior around $P_1=1$).
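As a numerical cross-check of Eqs. (5) to (7), the following minimal Python sketch inverts the half-normal mean, $\langle|f|\rangle=\sigma\sqrt{2/\pi}$, to recover $\sigma$, and then samples averaged events to confirm that $f_{\text{eff}}$ has width $\sigma/\sqrt{M}$. The function names, the random seed, and the number of samples are illustrative assumptions and are not taken from the analysis code of this work.

```python
import math
import random

MEAN_ABS_F = 0.02  # <|f|> = 2%, the value quoted in the text


def sigma_from_mean_abs(mean_abs):
    """Invert <|f|> = sigma * sqrt(2/pi), the mean of a half-normal distribution."""
    return mean_abs * math.sqrt(math.pi / 2.0)


def sample_f_eff(sigma, m, rng):
    """Arithmetic mean of m single-event values f_i ~ N(0, sigma^2), as in Eq. (6)."""
    return sum(rng.gauss(0.0, sigma) for _ in range(m)) / m


sigma = sigma_from_mean_abs(MEAN_ABS_F)

# Illustrative settings (assumptions): M = 25 as in Fig. 6, fixed seed, 20000 samples.
m = 25
rng = random.Random(0)
samples = [sample_f_eff(sigma, m, rng) for _ in range(20000)]

# The mean of f_eff is zero by construction, so the RMS estimates its standard deviation.
empirical_std = math.sqrt(sum(x * x for x in samples) / len(samples))
expected_std = sigma / math.sqrt(m)  # square root of the variance sigma^2 / M in Eq. (7)

print(f"sigma = {sigma:.4f}")
print(f"std of f_eff: sampled {empirical_std:.5f}, expected {expected_std:.5f}")
```

The sampled width should agree with $\sigma/\sqrt{M}$ up to statistical fluctuations, which illustrates why a smaller $M$ leaves a broader distribution of residual quadrupoles.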

After parameterizing $\text{P}(P_1\,|\,F_{\text{eff}})$, $\text{P}(P_1)$ is derived as

$$\begin{aligned}
\text{P}(P_1) &= \int \text{P}(P_1\,|\,F_{\text{eff}})\,\text{P}(F_{\text{eff}})\,dF_{\text{eff}} \\
&= \int_{0}^{\infty} \text{Be}\big(P_1;\alpha(F_{\text{eff}}),\beta(F_{\text{eff}})\big)\,N^{>}\Big(\frac{\sigma^{2}}{M}\Big)\,dF_{\text{eff}}.
\end{aligned} \tag{9}$$
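The marginalization in Eq. (9) can be sketched numerically as follows. This is only an illustration of the integral's structure: the functions `alpha_of_f` and `beta_of_f` below are hypothetical placeholders (a quadratic polynomial and a shifted Softplus with made-up coefficients), not the fitted parameterizations of this work, and the midpoint-rule integration and its cutoff are likewise assumptions.

```python
import math

SIGMA = 0.025  # single-event width from Eq. (5), as quoted in the text


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def alpha_of_f(f):
    """HYPOTHETICAL polynomial alpha(F_eff); coefficients are placeholders."""
    return 1.0 + 50.0 * f + 2000.0 * f * f


def beta_of_f(f):
    """HYPOTHETICAL Softplus-based beta(F_eff); coefficients are placeholders."""
    return 1.0 + softplus(3.0 - 300.0 * f)


def beta_pdf(x, a, b):
    """Beta density of Eq. (8), evaluated in log space for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))


def half_normal_pdf(f, scale):
    """Half-normal density on f >= 0 with scale parameter `scale`."""
    return math.sqrt(2.0 / math.pi) / scale * math.exp(-0.5 * (f / scale) ** 2)


def marginal_p1(p1, m, sigma=SIGMA, n_steps=400):
    """Midpoint-rule estimate of Eq. (9); the integral is cut off at 8 scale widths."""
    scale = sigma / math.sqrt(m)
    h = 8.0 * scale / n_steps
    total = 0.0
    for k in range(n_steps):
        f = (k + 0.5) * h
        total += beta_pdf(p1, alpha_of_f(f), beta_of_f(f)) * half_normal_pdf(f, scale)
    return total * h


# Evaluate P(P1) on midpoints of (0, 1) for M = 25 and check its normalization.
grid = [(k + 0.5) / 200 for k in range(200)]
density_25 = [marginal_p1(p, 25) for p in grid]
norm_25 = sum(density_25) / len(grid)
print(f"integral of P(P1) over (0, 1) for M = 25: {norm_25:.4f}")
```

With the actual fitted $\alpha(F_{\text{eff}})$ and $\beta(F_{\text{eff}})$ substituted for the placeholders, the same structure would yield the kind of curve compared with the mixed AMPT events; if the fitted parameters drop below 1, the density diverges at the endpoints and the grid would need more careful treatment.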

The numerical results are shown in Fig. 6. $\text{P}(P_1)$ in the “existing CMW” case shows an obvious rise around $P_1=1$ compared to the “no CMW” case, which suggests a non-zero probability of composing a large residual quadrupole. With a smaller $M$, the width of the $f_{\text{eff}}$ distribution becomes larger, which allows one to obtain a visible signal in $P_1$. In Fig. 6 we also present results of randomly mixing AMPT-generated events of both charge quadrupole patterns, with $M=25$. In both the large- and small-$P_1$ regions, the result is qualitatively consistent with our hypothesis test analysis, which indicates that the trained neural network is capable of recognizing the charge quadrupole with fewer averaged events.

Figure 6: $\text{P}(P_1)$ of AMPT events with $M=25$. The peak of this distribution near $P_1=0$ suggests that the charge quadrupoles in most events cancel one another, and the peak near $P_1=1$ arises from the strong response to a sufficiently large residual quadrupole.

IV Summary

In this paper, we propose a deep convolutional neural network model for CMW detection. Building upon our prior work Zhao et al. (2022) on deep-learning-based CME detection, this model extends the approach to CMW detection. We train the neural network using data generated from the AMPT model for Au + Au collisions at 200 GeV, with a CMW-like initial charge quadrupole encoded. The trained model exhibits a robust capability to discern events with CMW from those without, and it can quantitatively measure the fraction or strength of the initial charge quadrupole, effectively functioning as a CMW-meter. Furthermore, we validate the model’s performance across a broad range of collision energies and centralities, demonstrating its robustness. We have also checked that the trained model performs well for other collision systems such as Zr + Zr and Ru + Ru collisions. Comparative analysis against three-particle correlators and $\Delta v_2$ illustrates the model’s effectiveness even in the presence of strong backgrounds. By employing a hypothesis test, an experimentally viable analysis based on the model can be established, wherein the distribution of model predictions serves as an indicator of CMW occurrence in the data.

One drawback of the model is that its strong performance is achieved at the cost of generalization ability, as the training data is confined to a narrow range of collision energies. In the future, it would be interesting to enhance the model’s generalization capabilities and transform it into an end-to-end CMW-meter.

Acknowledgement — We acknowledge the useful discussions with L.-X. Wang and K. Zhou. This work is supported by the Natural Science Foundation of Shanghai (Grant No. 23JC1400200), the National Natural Science Foundation of China (Grant No. 12225502, No. 12075061, and No. 12147101), and the National Key Research and Development Program of China (Grant No. 2022YFA1604900).

References