-
Trials Factor for Semi-Supervised NN Classifiers in Searches for Narrow Resonances at the LHC
Authors:
Benjamin Lieberman,
Andreas Crivellin,
Salah-Eddine Dahbi,
Finn Stevenson,
Nidhi Tripathi,
Mukesh Kumar,
Bruce Mellado
Abstract:
To mitigate the model dependencies of searches for new narrow resonances at the Large Hadron Collider (LHC), semi-supervised Neural Networks (NNs) can be used. Unlike fully supervised classifiers, these models introduce an additional look-elsewhere effect in the process of optimising thresholds on the response distribution. We perform a frequentist study to quantify this effect, in the form of a trials factor. As an example, we consider simulated $Zγ$ data to perform narrow resonance searches using semi-supervised NN classifiers. The results of this analysis substantiate that the look-elsewhere effect induced by the semi-supervised NN is under control.
Submitted 27 June, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
COVID-19 South African Vaccine Hesitancy Models Show Boost in Performance Upon Fine-Tuning on M-pox Tweets
Authors:
Nicholas Perikli,
Srimoy Bhattacharya,
Blessing Ogbuokiri,
Zahra Movahedi Nia,
Benjamin Lieberman,
Nidhi Tripathi,
Salah-Eddine Dahbi,
Finn Stevenson,
Nicola Bragazzi,
Jude Kong,
Bruce Mellado
Abstract:
Very large numbers of M-pox cases have, since the start of May 2022, been reported in non-endemic countries, leading many to fear that the M-pox outbreak would rapidly transition into another pandemic while the COVID-19 pandemic rages on. Given the similarities of M-pox with COVID-19, we chose to test the performance of COVID-19 models trained on South African Twitter data on a hand-labelled M-pox dataset before and after fine-tuning. More than 20k M-pox-related tweets from South Africa were hand-labelled as being either positive, negative or neutral. After fine-tuning these COVID-19 models on the M-pox dataset, the F1-scores increased by more than 8%, falling just short of 70%, but still outperforming state-of-the-art models and well-known classification algorithms. An LDA-based topic modelling procedure was used to compare the misclassified M-pox tweets of the original COVID-19 RoBERTa model with those of its fine-tuned version, and from this analysis, we were able to draw conclusions on how to build more sophisticated models.
Submitted 4 October, 2023;
originally announced October 2023.
-
Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning
Authors:
Nicholas Perikli,
Srimoy Bhattacharya,
Blessing Ogbuokiri,
Zahra Movahedi Nia,
Benjamin Lieberman,
Nidhi Tripathi,
Salah-Eddine Dahbi,
Finn Stevenson,
Nicola Bragazzi,
Jude Kong,
Bruce Mellado
Abstract:
Very few social media studies have been done on South African user-generated content (UGC) during the COVID-19 pandemic, and even fewer using hand-labelling over automated methods. Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models and assessing their reliability in categorizing UGC. A dataset of 30000 tweets from South Africa was extracted and hand-labelled into one of three sentiment classes: positive, negative, neutral. The machine learning models used were LSTM, bi-LSTM, SVM, BERT-base-cased and RoBERTa-base, with hyperparameters carefully chosen and tuned using the WandB platform. For comparison, the tweets in our dataset were pre-processed using two different approaches: one semantics-based, the other corpus-based. All models were found to have low F1-scores within a range of 45$\%$-55$\%$, except for BERT and RoBERTa, which both achieved significantly better measures with overall F1-scores of 60$\%$ and 61$\%$, respectively. Topic modelling using an LDA was performed on the misclassified tweets of the RoBERTa model to gain insight on how to further improve model accuracy.
Submitted 12 July, 2023;
originally announced July 2023.
-
Growing Excesses of New Scalars at the Electroweak Scale
Authors:
Srimoy Bhattacharya,
Guglielmo Coloretti,
Andreas Crivellin,
Salah-Eddine Dahbi,
Yaquan Fang,
Mukesh Kumar,
Bruce Mellado
Abstract:
We combine searches for scalar resonances at the electroweak scale performed by the Large Hadron Collider experiments ATLAS and CMS, where persistent excesses have been observed in recent years. Using both the side-bands of Standard Model Higgs analyses and dedicated beyond the Standard Model analyses, we find significant hints for new scalars at $\approx 95\,$GeV ($S^\prime$) and $\approx152\,$GeV ($S$). The presence of a $95\,$GeV scalar is preferred over the Standard Model hypothesis by $3.8σ$, while interpreting the $152\,$GeV excesses in a simplified model with resonant pair production of $S$ via a new heavier scalar $H(270)$ yields a global significance of $\approx5σ$. While the production mechanism of the $S^\prime$ cannot yet be determined, data strongly favours the associated production of $S$, i.e. via the decay of a heavier boson $H$ ($pp\to H\to SS^*$). A possible alternative or complementary decay chain is $H\rightarrow SS^{\prime}$, where $S\to WW^*$ ($S^{\prime}$) would be the source of the leptons ($b$-quarks) necessary to explain the multi-lepton anomalies found in Large Hadron Collider data.
Submitted 29 June, 2023;
originally announced June 2023.
-
Consistency and Interpretation of the LHC (Di-)Di-Jet Excesses
Authors:
Andreas Crivellin,
Claudio Andrea Manzari,
Bruce Mellado,
Salah-Eddine Dahbi
Abstract:
ATLAS observed a limit for the cross section of di-jet resonances that is weaker than expected for a mass slightly below 1 TeV. In addition, CMS reported hints for the (non-resonant) pair production of di-jet resonances $X$ via a particle $Y$ in a very similar mass range with a local (global) significance of 3.6\,$σ$ (2.5\,$σ$) at $m_X\approx950\,$GeV. In this article we show that using the preferred range for $m_X$ from the ATLAS analysis, one can reinterpret the CMS analysis of di-di-jets in terms of a resonant search with $Y\to XX$, with a significantly reduced look-elsewhere effect, finding an excess for $m_Y\!\approx\!3.6$ TeV with a significance of $4.0\,σ$ ($3.2\,σ$) locally (globally). We present two possible UV completions capable of explaining the (di-)di-jet excesses, one containing two scalar di-quarks, the other involving heavy gluons based on an $SU(3)_1\!\times\! SU(3)_2\!\times\! SU(3)_3$ gauge symmetry, spontaneously broken to $SU(3)$ color. In the latter case, non-perturbative couplings are required, pointing towards a composite or extra-dimensional framework. In fact, using 5D-AdS space-time, one obtains the correct mass ratio $m_X/m_Y$, assuming the $X$ is the lowest-lying resonance, and predicts a third (di-)di-jet resonance with a mass of $\approx2.2$ TeV.
Submitted 24 March, 2023; v1 submitted 25 August, 2022;
originally announced August 2022.
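The local and global significances quoted in this abstract can be related through a trials factor. The sketch below is illustrative only and is not taken from the paper: it assumes the common convention that a significance of $Z\,σ$ corresponds to a one-sided Gaussian tail probability, and that the trials factor is the ratio of the global to the local p-value. The function names `p_value` and `trials_factor` are hypothetical.

```python
from statistics import NormalDist


def p_value(z: float) -> float:
    """One-sided Gaussian tail probability for a significance of z sigma (assumed convention)."""
    return 1.0 - NormalDist().cdf(z)


def trials_factor(z_local: float, z_global: float) -> float:
    """Ratio of global to local p-value (assumed definition of the trials factor)."""
    return p_value(z_global) / p_value(z_local)


# Significances quoted in the abstract above.
cms_non_resonant = trials_factor(z_local=3.6, z_global=2.5)
resonant_reinterpretation = trials_factor(z_local=4.0, z_global=3.2)

print(f"CMS non-resonant search:   trials factor ~ {cms_non_resonant:.1f}")
print(f"Resonant reinterpretation: trials factor ~ {resonant_reinterpretation:.1f}")
```

Under these assumptions the ratio for the resonant reinterpretation comes out smaller than that for the original CMS search, which is consistent with the reduced look-elsewhere effect described in the abstract.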
-
An investigation of over-training within semi-supervised machine learning models in the search for heavy resonances at the LHC
Authors:
Benjamin Lieberman,
Joshua Choma,
Salah-Eddine Dahbi,
Bruce Mellado,
Xifeng Ruan
Abstract:
In particle physics, semi-supervised machine learning is an attractive option to reduce model dependencies in searches beyond the Standard Model. When utilizing semi-supervised techniques in training machine learning models in the search for bosons at the Large Hadron Collider, the over-training of the model must be investigated. Internal fluctuations of the phase space and bias in training can cause semi-supervised models to label false signals within the phase space due to over-fitting. The issue of false signal generation in semi-supervised models has not been fully analyzed; the probability of such situations occurring must therefore be quantified using a toy Monte Carlo model. This investigation of $Zγ$ resonances is performed using a pure background Monte Carlo sample. Using unique pure background samples extracted to mimic ATLAS data in a background-plus-signal region, multiple runs enable the probability of these fake signals occurring due to over-training to be thoroughly investigated.
Submitted 15 September, 2021;
originally announced September 2021.
-
Machine learning approach for the search of resonances with topological features at the Large Hadron Collider
Authors:
Salah-eddine Dahbi,
Joshua Choma,
Bruce Mellado,
Gaogalalwe Mokgatitswane,
Xifeng Ruan,
Benjamin Lieberman,
Turgay Celik
Abstract:
The observation of resonances is unequivocal evidence of new physics beyond the Standard Model at the Large Hadron Collider (LHC). So far, inclusive and model-dependent searches have not provided evidence of new resonances, indicating that these could be driven by subtle topologies. Here, we use machine learning techniques based on weak supervision to perform searches. Weak supervision based on mixed samples can be used to search for resonances with little or no prior knowledge of the production mechanism. It also offers the advantage that sidebands or control regions can be used to effectively model backgrounds with minimal reliance on simulations. However, weak supervision alone is found to be highly inefficient in identifying corners of the multi-dimensional space of interest. Instead, we propose an approach to search for new resonances that involves a classification procedure that is signature and topology based. A combination of weak supervision with Deep Neural Network algorithms is applied following this classification. The performance of this strategy is evaluated on the production of the SM Higgs boson decaying to a pair of photons, inclusively and in exclusive regions of phase space tailored for specific production modes at the LHC. After verifying the ability of the methodology to extract different SM Higgs boson signal mechanisms, a search for new phenomena in high-mass final states is set up for the LHC.
Submitted 27 October, 2021; v1 submitted 19 November, 2020;
originally announced November 2020.
-
The anomalous production of multi-lepton and its impact on the measurement of $Wh$ production at the LHC
Authors:
Yesenia Hernandez,
Mukesh Kumar,
Alan S. Cornell,
Salah-Eddine Dahbi,
Yaquan Fang,
Benjamin Lieberman,
Bruce Mellado,
Kgomotso Monnakgotla,
Xifeng Ruan,
Shuiting Xin
Abstract:
Anomalies in multi-lepton final states at the Large Hadron Collider (LHC) have been reported in Refs.~\cite{vonBuddenbrock:2017gvy,vonBuddenbrock:2019ajh}. These can be interpreted in terms of the production of a heavy boson, $H$, decaying into a Standard Model (SM) Higgs boson, $h$, and a singlet scalar, $S$, which is treated as a SM Higgs-like boson. This process would naturally affect the measurement of the $Wh$ signal strength at the LHC, where $h$ is produced in association with leptons and di-jets. Here, $h$ would be produced with lower transverse momentum, $p_{Th}$, compared to SM processes. Corners of the phase-space are fixed according to the model parameters derived in Refs.~\cite{vonBuddenbrock:2016rmr,vonBuddenbrock:2017gvy} without additional tuning, thus nullifying potential look-elsewhere effects or selection biases. Provided that no stringent requirements are made on $p_{Th}$ or related observables, the signal strength of $Wh$ is $μ(Wh)=2.41 \pm 0.37$. This corresponds to a deviation from the SM of $3.8σ$. This result further strengthens the need to measure with precision the SM Higgs boson couplings in $e^+e^-$ and $e^-p$ collisions, in addition to $pp$ collisions.
Submitted 13 April, 2021; v1 submitted 2 December, 2019;
originally announced December 2019.