-
MotherNet: A Foundational Hypernetwork for Tabular Classification
Authors:
Andreas Müller,
Carlo Curino,
Raghu Ramakrishnan
Abstract:
The advent of Foundation Models is transforming machine learning across many modalities (e.g., language, images, videos) with prompt engineering replacing training in many settings. Recent work on tabular data (e.g., TabPFN) hints at a similar opportunity to build Foundation Models for classification for numerical data. In this paper, we go one step further and propose a hypernetwork architecture that we call MotherNet, trained on millions of classification tasks, that, once prompted with a never-seen-before training set, generates the weights of a trained "child" neural network. Like other Foundation Models, MotherNet replaces training on specific datasets with in-context learning through a single forward pass. In contrast to existing hypernetworks that were either task-specific or trained for relatively constrained multi-task settings, MotherNet is trained to generate networks to perform multiclass classification on arbitrary tabular datasets without any dataset-specific gradient descent.
The child network generated by MotherNet using in-context learning outperforms neural networks trained using gradient descent on small datasets, and is competitive with predictions by TabPFN and standard ML methods like Gradient Boosting. Unlike a direct application of transformer models like TabPFN, MotherNet-generated networks are highly efficient at inference time. This methodology opens up a new approach to building predictive models on tabular data that is both efficient and robust, without any dataset-specific training.
Submitted 13 December, 2023;
originally announced December 2023.
-
Classically-embedded split Cayley hexagons rule three-qubit contextuality with three-element contexts
Authors:
Metod Saniga,
Frédéric Holweck,
Colm Kelleher,
Axel Muller,
Alain Giorgetti,
Henri de Boutray
Abstract:
As is well known, split Cayley hexagons of order two live in the three-qubit symplectic polar space in two non-isomorphic embeddings, called classical and skew. Although neither of the two embeddings yields observable-based contextual configurations of their own, {\it classically}-embedded copies are found to fully rule contextuality properties of the most prominent three-qubit contextual configurations in the following sense: each set of unsatisfiable contexts of such a contextual configuration is isomorphic to the set of lines that a certain classically-embedded hexagon shares with this particular configuration. In particular, for a doily this shared set comprises three pairwise disjoint lines belonging to a grid of the doily; for an elliptic quadric the corresponding set features nine mutually disjoint lines forming a (Desarguesian) spread on the quadric; for a hyperbolic quadric the set entails 21 lines that are in bijection with the edges of the Heawood graph; and, finally, for the configuration that consists of all the 315 contexts of the space, its 63 unsatisfiable ones cover an entire hexagon. A particular illustration of this encoding is provided by the {\it line-complement} of a skew-embedded hexagon; its 24 unsatisfiable contexts correspond exactly to those 24 lines in which a particular classical copy of the hexagon differs from the considered skew-embedded one. In connection with the last-mentioned case, we also conducted some experimental tests on a Noisy Intermediate Scale Quantum (NISQ) computer to validate our theoretical findings.
Submitted 12 December, 2023;
originally announced December 2023.
-
An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems
Authors:
Mohammad Alsalti,
Victor G. Lopez,
Matthias A. Müller
Abstract:
In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm only relies on using persistently exciting input-output data, measured offline. No model knowledge or state measurements are needed and the obtained optimal policy only uses past input-output information. Moreover, our formulation of the proposed algorithm renders it computationally efficient. We provide conditions that guarantee the convergence of the algorithm to the optimal solution. Finally, the performance of our method is compared to existing algorithms in the literature.
Submitted 6 December, 2023;
originally announced December 2023.
-
Sea Level Rise by Climate Change. An order-of-magnitude approach to an environmental problem
Authors:
Cedric Loretan,
Andreas Müller
Abstract:
A figure, taken from a science literacy test, illustrates the distribution of water across various locations on Earth, represented as though the entire volume is contained in 100 buckets. Using this figure and other basic, readily available geographic information, one can deduce an approximate value of about 70 m for the rise in sea level due to the melting of all Earth's land ice, a value often discussed in media and public discourse.
Submitted 29 November, 2023;
originally announced November 2023.
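
The order-of-magnitude reasoning described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' calculation: the abstract does not give the figure's bucket counts, so the land-ice share (about 2 of 100 buckets), the total water volume, the ocean area, and the ice-to-water density factor below are all rounded, commonly quoted values chosen for this sketch.

```python
# Order-of-magnitude estimate of sea level rise if all land ice melted.
# Every input is a rounded, assumed value (not taken from the paper).
TOTAL_WATER_KM3 = 1.4e9    # assumed total volume of water on Earth
LAND_ICE_FRACTION = 0.02   # assumed: about 2 of the 100 "buckets" are land ice
OCEAN_AREA_KM2 = 3.6e8     # assumed ocean surface area
ICE_TO_WATER = 0.9         # ice shrinks by roughly 10% in volume when it melts


def sea_level_rise_m(
    total_water_km3: float = TOTAL_WATER_KM3,
    ice_fraction: float = LAND_ICE_FRACTION,
    ocean_area_km2: float = OCEAN_AREA_KM2,
    ice_to_water: float = ICE_TO_WATER,
) -> float:
    """Spread the meltwater volume evenly over a fixed ocean area."""
    meltwater_km3 = total_water_km3 * ice_fraction * ice_to_water
    rise_km = meltwater_km3 / ocean_area_km2
    return rise_km * 1000.0  # convert km to m


rise = sea_level_rise_m()
print(f"estimated rise: about {rise:.0f} m")
```

The sketch keeps the ocean area fixed and ignores effects such as thermal expansion and flooding of low-lying land, which is in keeping with an order-of-magnitude estimate.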
-
Notes on data-driven output-feedback control of linear MIMO systems
Authors:
Mohammad Alsalti,
Victor G. Lopez,
Matthias A. Müller
Abstract:
Recent works have approached the data-driven design of output-feedback controllers for discrete-time LTI systems by constructing non-minimal state vectors composed of past inputs and outputs. Depending on the system's complexity (order, lag and number of inputs), it was observed in several works that such an approach presents certain limitations, but no methods were proposed to overcome them. In this note, we clarify these limitations and solve them by proposing the construction of (alternative) non-minimal state vectors that facilitate output-feedback control of MIMO discrete-time LTI systems.
Submitted 29 November, 2023;
originally announced November 2023.
-
Effect of the depolarizing field on the domain structure of an improper ferroelectric
Authors:
Aaron Merlin Müller,
Amadé Bortis,
Manfred Fiebig,
Thomas Lottermoser
Abstract:
We show that, contrary to common belief, the depolarizing electric field generated by bound charges at thin-film surfaces can have a substantial impact on the domain structure of an improper ferroelectric with topological defects. In hexagonal-manganite thin films, we observe in phase-field simulations that through the action of the depolarizing field, (i) the average magnitude of the polarization density decreases, (ii) the local magnitude of the polarization density decreases with increasing distance from the domain walls, and (iii) there is a significant alteration of the domain-size distribution, which is visualized with the pair-correlation function. We conclude that, in general, it is not appropriate to ignore the effects of the depolarizing field for thin film ferroelectrics.
Submitted 9 May, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Quantum Cascade Lasers as Broadband Sources via Strong RF Modulation
Authors:
Alessio Cargioli,
Diego Piciocchi,
Mathieu Bertrand,
Richard Maulini,
Tobias Gresch,
Antonine Muller,
Giacomo Scalari,
Jerome Faist
Abstract:
In this work, we demonstrate that in a regime of strong modulation, by generating pulses of the length of the order of a few cavity lifetimes (hundreds of ps), a broadband quantum cascade laser can be driven to lase on a bandwidth (250 cm$^{-1}$) limited by the gain. In addition, the amplitude noise of the radiation was shown to be limited by the detector. A laser linewidth study has been performed under different operating conditions, finding values spanning from 20 MHz to 800 MHz, indicating a trade-off between emission bandwidth, amplitude stability and coherence.
Submitted 22 November, 2023;
originally announced November 2023.
-
DenseNet and Support Vector Machine classifications of major depressive disorder using vertex-wise cortical features
Authors:
Vladimir Belov,
Tracy Erwin-Grabner,
Ling-Li Zeng,
Christopher R. K. Ching,
Andre Aleman,
Alyssa R. Amod,
Zeynep Basgoze,
Francesco Benedetti,
Bianca Besteher,
Katharina Brosch,
Robin Bülow,
Romain Colle,
Colm G. Connolly,
Emmanuelle Corruble,
Baptiste Couvy-Duchesne,
Kathryn Cullen,
Udo Dannlowski,
Christopher G. Davey,
Annemiek Dols,
Jan Ernsting,
Jennifer W. Evans,
Lukas Fisch,
Paola Fuentes-Claramonte,
Ali Saffet Gonul,
Ian H. Gotlib
, et al. (63 additional authors not shown)
Abstract:
Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, has the potential to provide diagnostic and predictive biomarkers for MDD. However, previous attempts to demarcate MDD patients and healthy controls (HC) based on segmented cortical features via linear machine learning approaches have reported low accuracies. In this study, we used globally representative data from the ENIGMA-MDD working group containing an extensive sample of people with MDD (N=2,772) and HC (N=4,240), which allows a comprehensive analysis with generalizable results. Based on the hypothesis that integration of vertex-wise cortical features can improve classification performance, we evaluated the classification of a DenseNet and a Support Vector Machine (SVM), with the expectation that the former would outperform the latter. As we analyzed a multi-site sample, we additionally applied the ComBat harmonization tool to remove potential nuisance effects of site. We found that both classifiers exhibited close to chance performance (balanced accuracy DenseNet: 51%; SVM: 53%), when estimated on unseen sites. Slightly higher classification performance (balanced accuracy DenseNet: 58%; SVM: 55%) was found when the cross-validation folds contained subjects from all sites, indicating site effect. In conclusion, the integration of vertex-wise morphometric features and the use of the non-linear classifier did not lead to the differentiability between MDD and HC. Our results support the notion that MDD classification on this combination of features and classifiers is unfeasible.
Submitted 18 November, 2023;
originally announced November 2023.
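
The abstract above reports balanced accuracy rather than plain accuracy. A minimal sketch of why that matters for this sample follows, using the standard definition (mean of sensitivity and specificity) and the group sizes from the abstract (2,772 MDD, 4,240 HC). The helper name `balanced_accuracy` and the trivial "always predict HC" classifier are illustrative assumptions, not part of the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = MDD, 0 = HC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of MDD cases found
    specificity = tn / (tn + fp)  # fraction of HC cases found
    return 0.5 * (sensitivity + specificity)


# Group sizes taken from the abstract; the classifier is a hypothetical
# baseline that labels every subject as a healthy control.
n_mdd, n_hc = 2772, 4240
y_true = [1] * n_mdd + [0] * n_hc
y_pred = [0] * (n_mdd + n_hc)

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy:    {plain_acc:.3f}")
print(f"balanced accuracy: {bal_acc:.3f}")
```

With unequal group sizes, the trivial baseline scores above 50% on plain accuracy while its balanced accuracy is exactly chance level, which is why the reported 51% to 58% values are read as close to chance.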
-
A Fast Radio Burst in a Compact Galaxy Group at $z$~1
Authors:
Alexa C. Gordon,
Wen-fai Fong,
Sunil Simha,
Yuxin Dong,
Charles D. Kilpatrick,
Adam T. Deller,
Stuart D. Ryder,
Tarraneh Eftekhari,
Marcin Glowacki,
Lachlan Marnoch,
August R. Muller,
Anya E. Nugent,
Antonella Palmese,
J. Xavier Prochaska,
Marc Rafelski,
Ryan M. Shannon,
Nicolas Tejos
Abstract:
FRB 20220610A is a high-redshift Fast Radio Burst (FRB) that has not been observed to repeat. Here, we present rest-frame UV and optical $\textit{Hubble Space Telescope}$ observations of the field of FRB 20220610A. The imaging reveals seven extended sources, one of which we identify as the most likely host galaxy with a spectroscopic redshift of $z$=1.017. We spectroscopically confirm at least three additional sources to be at the same redshift, and identify the system as a compact galaxy group with possible signs of interaction among group members. We determine the host of FRB 20220610A to be a star-forming galaxy with stellar mass of $\approx10^{9.7}\,M_{\odot}$, mass-weighted age of $\approx2.6$~Gyr, and star formation rate (integrated over the last 100 Myr) of $\approx1.7$~M$_{\odot}$~yr$^{-1}$. These host properties are commensurate with the star-forming field galaxy population at z~1 and trace their properties analogously to the population of low-$z$ FRB hosts. Based on estimates of the total stellar mass of the galaxy group, we calculate a fiducial contribution to the observed Dispersion Measure (DM) from the intragroup medium of $\approx 110-220$ $\rm pc \, cm^{-3}$ (rest-frame). This leaves a significant excess of $500^{+272}_{-109}$ $\rm pc \, cm^{-3}$ (in the observer frame), with additional sources of DM possibly originating from the circumburst environment, host galaxy interstellar medium, and/or foreground structures along the line of sight. Given the low occurrence rates of galaxies in compact groups, the discovery of an FRB in such a group demonstrates a rare and novel environment in which FRBs can occur.
Submitted 17 November, 2023;
originally announced November 2023.
-
In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
Authors:
Aaron Mueller,
Albert Webson,
Jackson Petty,
Tal Linzen
Abstract:
In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates. Do models guided via ICL infer the underlying structure of the task defined by the context, or do they rely on superficial heuristics that only generalize to identically distributed examples? We address this question using transformations tasks and an NLI task that assess sensitivity to syntax - a requirement for robust language understanding. We further investigate whether out-of-distribution generalization can be improved via chain-of-thought prompting, where the model is provided with a sequence of intermediate computation steps that illustrate how the task ought to be performed. In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs. The variance is explained more by the composition of the pre-training corpus and supervision methods than by model size; in particular, models pre-trained on code generalize better, and benefit more from chain-of-thought prompting.
Submitted 10 April, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Atmospheric neutrino oscillation analysis with neutron tagging and an expanded fiducial volume in Super-Kamiokande I-V
Authors:
Super-Kamiokande Collaboration,
T. Wester,
K. Abe,
C. Bronner,
Y. Hayato,
K. Hiraide,
K. Hosokawa,
K. Ieki,
M. Ikeda,
J. Kameda,
Y. Kanemura,
R. Kaneshima,
Y. Kashiwagi,
Y. Kataoka,
S. Miki,
S. Mine,
M. Miura,
S. Moriyama,
Y. Nakano,
M. Nakahata,
S. Nakayama,
Y. Noguchi,
K. Sato,
H. Sekiya
, et al. (212 additional authors not shown)
Abstract:
We present a measurement of neutrino oscillation parameters with the Super-Kamiokande detector using atmospheric neutrinos from the complete pure-water SK I-V (April 1996-July 2020) data set, including events from an expanded fiducial volume. The data set corresponds to 6511.3 live days and an exposure of 484.2 kiloton-years. Measurements of the neutrino oscillation parameters $Δm^2_{32}$, $\sin^2θ_{23}$, $\sin^2 θ_{13}$, $δ_{CP}$, and the preference for the neutrino mass ordering are presented with atmospheric neutrino data alone, and with constraints on $\sin^2 θ_{13}$ from reactor neutrino experiments. Our analysis including constraints on $\sin^2 θ_{13}$ favors the normal mass ordering at the 92.3% level.
Submitted 8 November, 2023;
originally announced November 2023.
-
The Surprising Lack of Effect from Stellar Feedback on the Gas Stripping Rate from Massive Jellyfish Galaxies
Authors:
Nina Akerman,
Stephanie Tonnesen,
Bianca M Poggianti,
Rory Smith,
Antonino Marasco,
Andrea Kulier,
Ancla Müller,
Benedetta Vulcani
Abstract:
We study the role of star formation and stellar feedback in a galaxy being ram pressure stripped on its infall into a cluster. We use hydrodynamical wind-tunnel simulations of a massive galaxy ($M_\text{star} = 10^{11} M_\odot$) moving into a massive cluster ($M_\text{cluster} = 10^{15} M_\odot$). We have two types of simulations: with and without star formation and stellar feedback, SF and RC respectively. For each type we simulate four realisations of the same galaxy: a face-on wind, edge-on wind, $45^\circ$ angled wind, and a control galaxy not subject to ram pressure. We directly compare the stripping evolution of galaxies with and without star formation. We find that stellar feedback has no direct effect on the stripping process, i.e. there is no enhancement in stripping via a velocity kick to the interstellar medium gas. The main difference between RC and SF galaxies is due to the indirect effect of stellar feedback, which produces a smoother and more homogeneous interstellar medium. Hence, while the average gas surface density is comparable in both simulation types, the scatter is broader in the RC galaxies. As a result, at the galaxy outskirts overdense clumps survive in RC simulation, and the stripping proceeds more slowly. At the same time, in the inner disc, underdense gas in the RC holes is removed faster than the smoothly distributed gas in the SF simulation. For our massive galaxy, we therefore find that the effect of feedback on the stripping rate is almost negligible, independent of wind angle.
Submitted 8 November, 2023;
originally announced November 2023.
-
Measurement of the neutrino-oxygen neutral-current quasielastic cross section using atmospheric neutrinos in the SK-Gd experiment
Authors:
S. Sakai,
K. Abe,
C. Bronner,
Y. Hayato,
K. Hiraide,
K. Hosokawa,
K. Ieki,
M. Ikeda,
J. Kameda,
Y. Kanemura,
R. Kaneshima,
Y. Kashiwagi,
Y. Kataoka,
S. Miki,
S. Mine,
M. Miura,
S. Moriyama,
Y. Nakano,
M. Nakahata,
S. Nakayama,
Y. Noguchi,
K. Sato,
H. Sekiya,
H. Shiba,
K. Shimizu
, et al. (211 additional authors not shown)
Abstract:
We report the first measurement of the atmospheric neutrino-oxygen neutral-current quasielastic (NCQE) cross section in the gadolinium-loaded Super-Kamiokande (SK) water Cherenkov detector. In June 2020, SK began a new experimental phase, named SK-Gd, by loading 0.011% by mass of gadolinium into the ultrapure water of the SK detector. The introduction of gadolinium to ultrapure water has the effect of improving the neutron-tagging efficiency. Using a 552.2 day data set from August 2020 to June 2022, we measure the NCQE cross section to be 0.74 $\pm$ 0.22(stat.) $^{+0.85}_{-0.15}$ (syst.) $\times$ 10$^{-38}$ cm$^{2}$/oxygen in the energy range from 160 MeV to 10 GeV, which is consistent with the atmospheric neutrino-flux-averaged theoretical NCQE cross section and the measurement in the SK pure-water phase within the uncertainties. Furthermore, we compare the models of the nucleon-nucleus interactions in water and find that the Binary Cascade model and the Liege Intranuclear Cascade model provide a somewhat better fit to the observed data than the Bertini Cascade model. Since the atmospheric neutrino-oxygen NCQE reactions are one of the main backgrounds in the search for diffuse supernova neutrino background (DSNB), these new results will contribute to future studies - and the potential discovery - of the DSNB in SK.
Submitted 7 November, 2023;
originally announced November 2023.
-
RFSoC Gen3-Based Software-Defined Radio Characterization for the Readout System of Low-Temperature Bolometers
Authors:
M. E. García Redondo,
T. Muscheid,
R. Gartmann,
J. M. Salum,
L. P. Ferreyro,
N. A. Müller,
J. D. Bonilla-Neira,
J. M. Geria,
J. J. Bonaparte,
A. Almela,
L. E. Ardila-Perez,
M. R. Hampel,
A. E. Fuster,
M. Platino,
O. Sander,
M. Weber,
A. Etchegoyen
Abstract:
This work reports the performance evaluation of an SDR readout system based on the latest generation (Gen3) of the AMD's Radio Frequency System-on-Chip (RFSoC) processing platform, which integrates a full-stack processing system and a powerful FPGA with up to 32 high-speed and high-resolution 14-bit Digital-to-Analog Converters (DACs) and Analog-to-Digital Converters (ADCs). The proposed readout system uses a previously developed multi-band, double-conversion IQ RF-mixing board targeting a multiplexing factor of approximately 1,000 bolometers in a bandwidth between 4 and 8 GHz, in line with state-of-the-art microwave SQUID multiplexers ($μ$MUX). The characterization of the system was performed in two stages, under the conditions typically imposed by the multiplexer and the cold readout circuit. First, in transmission, showing that noise and spurious levels of the generated tones are close to the values imposed by the cold readout. Second, in RF loopback, presenting noise values better than -100 dBc/Hz totally in agreement with the state-of-the-art readout systems. It was demonstrated that the RFSoC Gen3 device is a suitable enabling technology for the next generation of superconducting detector readout systems, reducing system complexity, increasing system integration, and achieving these goals without performance degradation.
Submitted 8 May, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Search for Periodic Time Variations of the Solar $^8$B Neutrino Flux between 1996 and 2018 in Super-Kamiokande
Authors:
K. Abe,
C. Bronner,
Y. Hayato,
K. Hiraide,
K. Hosokawa,
K. Ieki,
M. Ikeda,
J. Kameda,
Y. Kanemura,
R. Kaneshima,
Y. Kashiwagi,
Y. Kataoka,
S. Miki,
S. Mine,
M. Miura,
S. Moriyama,
Y. Nakano,
M. Nakahata,
S. Nakayama,
Y. Noguchi,
K. Sato,
H. Sekiya,
H. Shiba,
K. Shimizu,
M. Shiozawa
, et al. (211 additional authors not shown)
Abstract:
We report a search for time variations of the solar $^8$B neutrino flux using 5804 live days of Super-Kamiokande data collected between May 31, 1996, and May 30, 2018. Super-Kamiokande measured the precise time of each solar neutrino interaction over 22 calendar years to search for solar neutrino flux modulations with unprecedented precision. Periodic modulations are searched for in a dataset comprising five-day interval solar neutrino flux measurements with a maximum likelihood method. We also applied the Lomb-Scargle method to this dataset to compare it with previous reports. The only significant modulation found is due to the elliptic orbit of the Earth around the Sun. The observed modulation is consistent with astronomical data: we measured an eccentricity of (1.53$\pm$0.35)\%, and a perihelion shift of ($-$1.5$\pm$13.5) days.
Submitted 6 June, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
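
The link between the orbital eccentricity quoted in this abstract and the size of the annual flux modulation can be illustrated with a short calculation. The sketch below assumes only that the flux at Earth falls off as the inverse square of the Earth-Sun distance; the function name is hypothetical, and this is not the collaboration's maximum-likelihood or Lomb-Scargle analysis.

```python
def flux_ratio_perihelion_to_aphelion(e: float) -> float:
    """Flux ratio between perihelion and aphelion for an inverse-square flux.

    For a Keplerian orbit with semi-major axis a, the distances are
    r_peri = a * (1 - e) and r_aph = a * (1 + e), so the ratio is
    ((1 + e) / (1 - e)) ** 2 and a cancels out.
    """
    return ((1.0 + e) / (1.0 - e)) ** 2


e_measured = 0.0153  # central value of the eccentricity quoted in the abstract
peak_to_peak = flux_ratio_perihelion_to_aphelion(e_measured) - 1.0

print(f"peak-to-peak flux modulation: {100 * peak_to_peak:.1f}%")
print(f"first-order estimate (4e):    {100 * 4 * e_measured:.1f}%")
```

For small eccentricity the ratio is close to 1 + 4e, so an eccentricity of order 1.5% corresponds to a seasonal flux variation of a few percent between perihelion and aphelion.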
-
Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code
Authors:
Mohammed Latif Siddiq,
Joanna C. S. Santos,
Sajith Devareddy,
Anna Muller
Abstract:
With the growing popularity of Large Language Models (LLMs) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although LLMs can help developers to be more productive, prior empirical studies have shown that LLMs can generate insecure code. There are two contributing factors to the insecure code generation. First, existing datasets used to evaluate LLMs do not adequately represent genuine software engineering tasks sensitive to security. Instead, they are often based on competitive programming challenges or classroom-type coding tasks. In real-world applications, the code produced is integrated into larger codebases, introducing potential security risks. Second, existing evaluation metrics primarily focus on the functional correctness of the generated code while ignoring security considerations. Therefore, in this paper, we described SALLM, a framework to benchmark LLMs' abilities to generate secure code systematically. This framework has three major components: a novel dataset of security-centric Python prompts, configurable assessment techniques to evaluate the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation.
Submitted 3 June, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Silicon Implantation and Annealing in $β$-Ga$_2$O$_3$: Role of Ambient, Temperature, and Time
Authors:
K. R. Gann,
N. Pieczulewski,
C. A. Gorsak,
K. Heinselman,
T. J. Asel,
B. A. Noesges,
K. T. Smith,
D. M. Dryden,
H. G. Xing,
H. P. Nair,
D. A. Muller,
M. O. Thompson
Abstract:
Optimizing thermal anneals of Si-implanted $β$-Ga$_2$O$_3$ is critical for low resistance contacts and selective area doping. We report the impact of annealing ambient, temperature, and time on activation of room temperature ion-implanted Si in $β$-Ga$_2$O$_3$ at concentrations from 5x10$^{18}$ to 1x10$^{20}$ cm$^{-3}$, demonstrating full activation (>80% activation, mobilities >70 cm$^{2}$/Vs) with contact resistances below 0.29 $Ω$-mm. Homoepitaxial $β$-Ga$_2$O$_3$ films, grown by plasma assisted MBE on Fe-doped (010) substrates, were implanted at multiple energies to yield 100 nm box profiles of 5x10$^{18}$, 5x10$^{19}$, and 1x10$^{20}$ cm$^{-3}$. Anneals were performed in a UHV-compatible quartz furnace at 1 bar with well-controlled gas composition. To maintain $β$-Ga$_2$O$_3$ stability, $p_{O2}$ must be greater than 10$^{-9}$ bar. Anneals up to $p_{O2}$ = 1 bar achieve full activation at 5x10$^{18}$ cm$^{-3}$, while 5x10$^{19}$ cm$^{-3}$ must be annealed with $p_{O2}$ <10$^{-4}$ bar and 1x10$^{20}$ cm$^{-3}$ requires $p_{O2}$ <10$^{-6}$ bar. Water vapor prevents activation and must be maintained below 10$^{-8}$ bar. Activation is achieved for anneal temperatures as low as 850 °C with mobility increasing with anneal temperature up to 1050 °C, though Si diffusion has been reported above 950 °C. At 950 °C, activation is maximized between 5 and 20 minutes with longer times resulting in decreased carrier activation (over-annealing). This over-annealing is significant for concentrations above 5x10$^{19}$ cm$^{-3}$ and occurs rapidly at 1x10$^{20}$ cm$^{-3}$. RBS (channeling) suggests damage recovery is seeded from remnant aligned $β$-Ga$_2$O$_3$ that remains after implantation; this conclusion is also supported by STEM showing retention of the $β$-phase with inclusions that resemble the $γ$-phase.
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
Experimental high-dimensional entanglement certification and quantum steering with time-energy measurements
Authors:
Kai-Chi Chang,
Murat Can Sarihan,
Xiang Cheng,
Paul Erker,
Andrew Mueller,
Maria Spiropulu,
Matthew D. Shaw,
Boris Korzh,
Marcus Huber,
Chee Wei Wong
Abstract:
High-dimensional entanglement provides unique ways of transcending the limitations of current approaches in quantum information processing and quantum communications based on qubits. The generation of time-frequency qudit states offers significantly increased quantum capacities while keeping the number of photons constant, but poses significant challenges regarding the possible measurements for certification of entanglement. Here, we develop a new scheme and experimentally demonstrate the certification of 24-dimensional entanglement and 9-dimensional quantum steering. We then subject our photon pairs to dispersion conditions equivalent to transmission through 600 km of fiber and still certify 21-dimensional entanglement. Furthermore, we use a steering inequality to prove 7-dimensional entanglement in a semi-device independent manner, proving that large chromatic dispersion is not an obstacle in distributing and certifying high-dimensional entanglement and quantum steering. Our highly scalable scheme is based on commercial telecommunication optical fiber components and recently developed low-jitter high-efficiency single-photon detectors, thus opening new pathways towards advanced large-scale quantum information processing and high-performance, noise-tolerant quantum communications with time-energy measurements.
Submitted 31 October, 2023;
originally announced October 2023.
-
Function Vectors in Large Language Models
Authors:
Eric Todd,
Millicent L. Li,
Arnab Sen Sharma,
Aaron Mueller,
Byron C. Wallace,
David Bau
Abstract:
We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are robust to changes in context, i.e., they trigger execution of the task on inputs such as zero-shot and natural text settings that do not resemble the ICL contexts from which they are collected. We test FVs across a range of tasks, models, and layers and find strong causal effects across settings in middle layers. We investigate the internal structure of FVs and find that while they often contain information that encodes the output space of the function, this information alone is not sufficient to reconstruct an FV. Finally, we test semantic vector composition in FVs, and find that to some extent they can be summed to create vectors that trigger new complex tasks. Our findings show that compact, causal internal vector representations of function abstractions can be explicitly extracted from LLMs. Our code and data are available at https://functions.baulab.info.
Submitted 25 February, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
On the Detection of Image-Scaling Attacks in Machine Learning
Authors:
Erwin Quiring,
Andreas Müller,
Konrad Rieck
Abstract:
Image scaling is an integral part of machine learning and computer vision systems. Unfortunately, this preprocessing step is vulnerable to so-called image-scaling attacks where an attacker makes unnoticeable changes to an image so that it becomes a new image after scaling. This opens up new ways for attackers to control the prediction or to improve poisoning and backdoor attacks. While effective techniques exist to prevent scaling attacks, their detection has not been rigorously studied yet. Consequently, it is currently not possible to reliably spot these attacks in practice.
This paper presents the first in-depth systematization and analysis of detection methods for image-scaling attacks. We identify two general detection paradigms and derive novel methods from them that are simple in design yet significantly outperform previous work. We demonstrate the efficacy of these methods in a comprehensive evaluation with all major learning platforms and scaling algorithms. First, we show that image-scaling attacks modifying the entire scaled image can be reliably detected even under an adaptive adversary. Second, we find that our methods provide strong detection performance even if only minor parts of the image are manipulated. As a result, we can introduce a novel protection layer against image-scaling attacks.
Submitted 23 October, 2023;
originally announced October 2023.
-
High-rate multiplexed entanglement source based on time-bin qubits for advanced quantum networks
Authors:
Andrew Mueller,
Samantha Davis,
Boris Korzh,
Raju Valivarthi,
Andrew D. Beyer,
Rahaf Youssef,
Neil Sinclair,
Cristián Peña,
Matthew D. Shaw,
Maria Spiropulu
Abstract:
Entanglement distribution based on time-bin qubits is an attractive option for emerging quantum networks. We demonstrate a 4.09 GHz repetition rate source of photon pairs entangled across early and late time bins separated by 80 ps. Simultaneous high rates and high visibilities are achieved through frequency multiplexing the spontaneous parametric down conversion output into 8 time-bin entangled pairs. We demonstrate entanglement visibilities as high as 99.4%, total entanglement rates up to 3.55e6 coincidences/s, and predict a straightforward path towards achieving up to an order of magnitude improvement in rates without compromising visibility. Finally, we resolve the density matrices of the entangled states for each multiplexed channel and express distillable entanglement rates in ebit/s, thereby quantifying the tradeoff between visibility and coincidence rates that contributes to useful entanglement distribution. This source is a fundamental building block for high-rate entanglement-based quantum key distribution systems or advanced quantum networks.
Submitted 12 February, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
The effect of cluster dynamical state on ram-pressure stripping
Authors:
A. C. C. Lourenço,
Y. L. Jaffé,
B. Vulcani,
A. Biviano,
B. Poggianti,
A. Moretti,
K. Kelkar,
J. P. Crossett,
M. Gitti,
R. Smith,
T. F. Laganá,
M. Gullieuszik,
A. Ignesti,
S. McGee,
A. Wolter,
S. Sonkamble,
A. Müller
Abstract:
Theoretical and observational studies have suggested that ram-pressure stripping by the intracluster medium can be enhanced during cluster interactions, boosting the formation of the "jellyfish" galaxies. In this work, we study the incidence of galaxies undergoing ram-pressure stripping in 52 clusters of different dynamical states. We use optical data from the WINGS/OmegaWINGS surveys and archival X-ray data to characterise the dynamical state of our cluster sample, applying eight different proxies. We then compute the number of ram-pressure stripping candidates relative to the infalling population of blue late-type galaxies within a fixed circular aperture in each cluster. We find no clear correlation between the fractions of ram-pressure stripping candidates and the different cluster dynamical state proxies considered. These fractions also show no apparent correlation with cluster mass. To construct a dynamical state classification closer to a merging "sequence", we perform a visual classification of the dynamical states of the clusters, combining information available in optical, X-ray, and radio wavelengths. We find a mild increase in the RPS fraction in interacting clusters with respect to all other classes (including post-mergers). This mild enhancement could hint at a short-lived enhanced ram-pressure stripping in ongoing cluster mergers. However, our results are not statistically significant due to the low galaxy numbers. We note this is the first homogeneous attempt to quantify the effect of cluster dynamical state on ram-pressure stripping using a large cluster sample, but even larger (especially wider) multi-wavelength surveys are needed to confirm the results.
Submitted 29 September, 2023; v1 submitted 27 September, 2023;
originally announced September 2023.
-
Sample- and computationally efficient data-driven predictive control
Authors:
Mohammad Alsalti,
Manuel Barkey,
Victor G. Lopez,
Matthias A. Müller
Abstract:
Recently proposed data-driven predictive control schemes for LTI systems use non-parametric representations based on the image of a Hankel matrix of previously collected, persistently exciting, input-output data. Persistence of excitation necessitates that the data is sufficiently long and, hence, the computational complexity of the corresponding finite-horizon optimal control problem increases. In this paper, we propose an efficient data-driven predictive control (eDDPC) scheme which is both more sample efficient (requires less offline data) and more computationally efficient (uses fewer decision variables) than existing schemes. This is done by leveraging an alternative data-based representation of the trajectories of LTI systems. We analytically and numerically compare the performance of this scheme to existing ones from the literature.
Submitted 6 March, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
A Simplified Expression for Quantum Fidelity
Authors:
Adrian Müller
Abstract:
Quantum fidelity is one of the most important measures of similarity between mixed quantum states. However, the usual formulation is cumbersome and hard to understand when encountered for the first time. This work shows in a novel, elegant proof that the expression can be rewritten into a form that is not only more concise but also makes its symmetry property more obvious. Further, the simpler expression gives rise to a formulation that is subsequently shown to be more computationally efficient than the best previous methods by avoiding any full decomposition. Future work might look for ways in which other theorems could be affected or utilize the reformulation where fidelity is the computational bottleneck.
Submitted 10 October, 2023; v1 submitted 19 September, 2023;
originally announced September 2023.
-
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
Authors:
Angel Abusleme,
Thomas Adam,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Muhammad Akram,
Abid Aleem,
Fengpeng An,
Qi An,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
Burin Asavapibhop,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Wander Baldini,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Bellato,
Marco Beretta,
Antonio Bergnoli
, et al. (606 additional authors not shown)
Abstract:
The core-collapse supernova (CCSN) is considered one of the most energetic astrophysical events in the universe. The early and prompt detection of neutrinos before (pre-SN) and during the supernova (SN) burst presents a unique opportunity for multi-messenger observations of CCSN events. In this study, we describe the monitoring concept and present the sensitivity of the system to pre-SN and SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), a 20 kton liquid scintillator detector currently under construction in South China. The real-time monitoring system is designed to ensure both prompt alert speed and comprehensive coverage of progenitor stars. It incorporates prompt monitors on the electronic board as well as online monitors at the data acquisition stage. Assuming a false alert rate of 1 per year, this monitoring system exhibits sensitivity to pre-SN neutrinos up to a distance of approximately 1.6 (0.9) kiloparsecs and SN neutrinos up to about 370 (360) kiloparsecs for a progenitor mass of 30 solar masses, considering both normal and inverted mass ordering scenarios. The pointing ability of the CCSN is evaluated by analyzing the accumulated event anisotropy of inverse beta decay interactions from pre-SN or SN neutrinos. This, along with the early alert, can play a crucial role in facilitating follow-up multi-messenger observations of the next galactic or nearby extragalactic CCSN.
Submitted 4 December, 2023; v1 submitted 13 September, 2023;
originally announced September 2023.
-
High-dimensional time-frequency entanglement in a singly-filtered biphoton frequency comb
Authors:
Xiang Cheng,
Kai-Chi Chang,
Murat Can Sarihan,
Andrew Mueller,
Maria Spiropulu,
Matthew D. Shaw,
Boris Korzh,
Andrei Faraon,
Franco N. C. Wong,
Jeffrey H. Shapiro,
Chee Wei Wong
Abstract:
High-dimensional quantum entanglement is a cornerstone for advanced technology enabling large-scale noise-tolerant quantum systems, fault-tolerant quantum computing, and distributed quantum networks. The recently developed biphoton frequency comb (BFC) provides a powerful platform for high-dimensional quantum information processing in its spectral and temporal quantum modes. Here we propose and generate a singly-filtered high-dimensional BFC via spontaneous parametric down-conversion by spectrally shaping only the signal photons with a Fabry-Perot cavity. High-dimensional energy-time entanglement is verified through Franson-interference recurrences and temporal correlation with low-jitter detectors. Frequency and temporal entanglement of our singly-filtered BFC is then quantified by Schmidt mode decomposition. Subsequently, we distribute the high-dimensional singly-filtered BFC state over a 10 km fiber link with a post-distribution time-bin dimension lower bounded to be at least 168. Our demonstrations of high-dimensional entanglement and entanglement distribution show the capability of the singly-filtered quantum frequency comb for high-efficiency quantum information processing and high-capacity quantum networks.
Submitted 11 September, 2023; v1 submitted 11 September, 2023;
originally announced September 2023.
-
An Overview of Formulae for the Higher-Order Kinematics of Lower-Pair Chains with Applications in Robotics and Mechanism Theory
Authors:
Andreas Mueller
Abstract:
The motions of mechanisms can be described in terms of screw coordinates by means of an exponential mapping. The product of exponentials (POE) describes the configuration of a chain of bodies connected by lower pair joints. The kinematics is thus given in terms of joint screws. The POE serves to express loop constraints for mechanisms as well as the forward kinematics of serial manipulators. Besides the compact formulations, the POE gives rise to purely algebraic relations for derivatives with respect to joint variables. It is known that the partial derivatives of the instantaneous joint screws (columns of the geometric Jacobian) are determined by Lie brackets of the joint screws. Lesser known is that derivatives of arbitrary order can be compactly expressed by Lie brackets. This has significance for higher-order forward/inverse kinematics and dynamics of robots and multibody systems. Various relations were reported but are scattered in the literature and insufficiently recognized. This paper aims to provide a comprehensive overview of the relevant relations. Its original contributions are closed form and recursive relations for higher-order derivatives and Taylor expansions of various kinematic relations. Their application to kinematic control and dynamics of robotic manipulators and multibody systems is discussed.
Submitted 10 September, 2023;
originally announced September 2023.
-
Measurements of the $ν_μ$ and $\barν_μ$-induced Coherent Charged Pion Production Cross Sections on $^{12}C$ by the T2K experiment
Authors:
K. Abe,
N. Akhlaq,
R. Akutsu,
A. Ali,
S. Alonso Monsalve,
C. Alt,
C. Andreopoulos,
M. Antonova,
S. Aoki,
T. Arihara,
Y. Asada,
Y. Ashida,
E. T. Atkin,
M. Barbi,
G. J. Barker,
G. Barr,
D. Barrow,
M. Batkiewicz-Kwasniak,
V. Berardi,
L. Berns,
S. Bhadra,
A. Blanchet,
A. Blondel,
S. Bolognesi,
T. Bonus
, et al. (359 additional authors not shown)
Abstract:
We report an updated measurement of the $ν_μ$-induced, and the first measurement of the $\barν_μ$-induced coherent charged pion production cross section on $^{12}C$ nuclei in the T2K experiment. This is measured in a restricted region of the final-state phase space for which $p_{μ,π} > 0.2$ GeV, $\cos(θ_μ) > 0.8$ and $\cos(θ_π) > 0.6$, and at a mean (anti)neutrino energy of 0.85 GeV using the T2K near detector. The measured $ν_μ$ CC coherent pion production flux-averaged cross section on $^{12}C$ is $(2.98 \pm 0.37 (stat.) \pm 0.31 (syst.) \substack{ +0.49 \\ -0.00 } \mathrm{ (Q^2\,model)}) \times 10^{-40}~\mathrm{cm}^{2}$. The new measurement of the $\barν_μ$-induced cross section on $^{12}{C}$ is $(3.05 \pm 0.71 (stat.) \pm 0.39 (syst.) \substack{ +0.74 \\ -0.00 } \mathrm{(Q^2\,model)}) \times 10^{-40}~\mathrm{cm}^{2}$. The results are compatible with both the NEUT 5.4.0 Berger-Sehgal (2009) and GENIE 2.8.0 Rein-Sehgal (2007) model predictions.
Submitted 14 October, 2023; v1 submitted 31 August, 2023;
originally announced August 2023.
-
Supercell formation in epitaxial rare-earth ditelluride thin films
Authors:
Adrian Llanos,
Salva Salmani-Rezaie,
Jinwoong Kim,
Nicholas Kioussis,
David A. Muller,
Joseph Falson
Abstract:
Square net tellurides host an array of electronic ground states and commonly exhibit charge-density-wave ordering. Here we report the epitaxy of DyTe$_{2-δ}$ on atomically flat MgO (001) using molecular beam epitaxy. The films are single phase and highly oriented as evidenced by transmission electron microscopy and X-ray diffraction measurements. Epitaxial strain is evident in films and is relieved as the thickness increases up to a value of approximately 20 unit cells. Diffraction features associated with a supercell, which is coupled with Te deficiency, are resolved in the films. First principles calculations attribute the formation of this defect lattice to nesting conditions in the Fermi surface, which produce a periodic occupancy of the conducting Te square-net and open a band gap at the chemical potential. This work establishes the groundwork for exploring the role of strain in tuning electronic and structural phases of epitaxial square-net tellurides and related compounds.
Submitted 27 August, 2023;
originally announced August 2023.
-
Tuning the Curie temperature of a 2D magnet/topological insulator heterostructure to above room temperature by epitaxial growth
Authors:
Wenyi Zhou,
Alexander J. Bishop,
Xiyue S. Zhang,
Katherine Robinson,
Igor Lyalin,
Ziling Li,
Ryan Bailey-Crandell,
Thow Min Jerald Cham,
Shuyu Cheng,
Yunqiu Kelly Luo,
Daniel C. Ralph,
David A. Muller,
Roland K. Kawakami
Abstract:
Heterostructures of two-dimensional (2D) van der Waals (vdW) magnets and topological insulators (TI) are of substantial interest as candidate materials for efficient spin-torque switching, quantum anomalous Hall effect, and chiral spin textures. However, since many of the vdW magnets have Curie temperatures below room temperature, we want to understand how materials can be modified to stabilize their magnetic ordering to higher temperatures. In this work, we utilize molecular beam epitaxy to systematically tune the Curie temperature ($T_C$) in thin film Fe$_3$GeTe$_2$/Bi$_2$Te$_3$ from bulk-like values ($\sim$220 K) to above room temperature by increasing the growth temperature from 300 $^\circ$C to 375 $^\circ$C. For samples grown at 375 $^\circ$C, cross-sectional scanning transmission electron microscopy (STEM) reveals the spontaneous formation of different Fe$_m$Ge$_n$Te$_2$ compositions (e.g. Fe$_5$Ge$_2$Te$_2$ and Fe$_7$Ge$_6$Te$_2$) as well as intercalation in the vdW gaps, which are possible origins of the enhanced Curie temperature. This observation paves the way for developing various Fe$_m$Ge$_n$Te$_2$/TI heterostructures with novel properties.
Submitted 25 August, 2023;
originally announced August 2023.
-
Detecting Spells in Fantasy Literature with a Transformer Based Artificial Intelligence
Authors:
Marcel Moravek,
Alexander Zender,
Andreas Müller
Abstract:
Transformer architectures and models have made significant progress in language-based tasks. In this area, BERT is one of the most widely used and freely available transformer architectures. In our work, we use BERT for context-based phrase recognition of magic spells in the Harry Potter novel series. Spells are a common part of active magic in fantasy novels. Typically, spells are used in a specific context to achieve a supernatural effect. A series of investigations were conducted to see if a Transformer architecture could recognize such phrases based on their context in the Harry Potter saga. For our studies a pre-trained BERT model was used and fine-tuned utilising different datasets and training methods to identify the searched context. By considering different approaches for sequence classification as well as token classification, it is shown that the context of spells can be recognised. According to our investigations, the examined sequence length for fine-tuning and validation of the model plays a significant role in context recognition. Based on this, we have investigated whether spells have overarching properties that allow a transfer of the neural network models to other fantasy universes as well. The application of our model showed promising results and merits further investigation in subsequent studies.
Submitted 7 August, 2023;
originally announced August 2023.
-
Is GPT-4 a reliable rater? Evaluating Consistency in GPT-4 Text Ratings
Authors:
Veronika Hackl,
Alexandra Elena Müller,
Michael Granitzer,
Maximilian Sailer
Abstract:
This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4, a state-of-the-art artificial intelligence language model, across multiple iterations, time spans and stylistic variations. The model rated responses to tasks within the Higher Education (HE) subject domain of macroeconomics in terms of their content and style. Statistical analysis was conducted in order to learn more about the interrater reliability, consistency of the ratings across iterations and the correlation between ratings in terms of content and style. The results revealed a high interrater reliability with ICC scores ranging between 0.94 and 0.99 for different timespans, suggesting that GPT-4 is capable of generating consistent ratings across repetitions with a clear prompt. Style and content ratings show a high correlation of 0.87. When applying a non-adequate style the average content ratings remained constant, while style ratings decreased, which indicates that the large language model (LLM) effectively distinguishes between these two criteria during evaluation. The prompt used in this study is furthermore presented and explained. Further research is necessary to assess the robustness and reliability of AI models in various use cases.
Submitted 3 August, 2023;
originally announced August 2023.
-
Model predictive control for the prescription of antithyroid agents
Authors:
Maylin Menzel,
Tobias M. Wolff,
Johannes W. Dietrich,
Matthias A. Müller
Abstract:
Although hyperthyroidism is a common disease, the pharmaceutical therapy is based on a trial-and-error approach. We extend a mathematical model of the pituitary-thyroid feedback loop such that the intake of one antithyroid agent, namely methimazole (MMI), can be considered and use a model predictive control (MPC) scheme to determine suitable dosages.
Submitted 31 July, 2023;
originally announced July 2023.
-
Towards Continuous Time Finite Horizon LQR Control in SE(3)
Authors:
Shivesh Kumar,
Andreas Mueller,
Patrick Wensing,
Frank Kirchner
Abstract:
The control of free-floating robots requires dealing with several challenges. The motion of such robots evolves on a continuous manifold described by the Special Euclidean Group of dimension 3, known as SE(3). Methods based on finite horizon Linear Quadratic Regulator (LQR) control have recently gained traction in the robotics community. However, such approaches inherently solve an unconstrained optimization problem and hence are unable to respect the manifold constraints imposed by the group structure of SE(3). This may lead to small errors, singularity problems and double cover issues depending on the choice of coordinates to model the floating base motion. In this paper, we propose the use of canonical exponential coordinates of SE(3) and the associated exponential map along with its differentials to embed this structure in the theory of finite horizon LQR controllers.
Submitted 26 July, 2023;
originally announced July 2023.
-
Data-based system representations from irregularly measured data
Authors:
Mohammad Alsalti,
Ivan Markovsky,
Victor G. Lopez,
Matthias A. Müller
Abstract:
Non-parametric representations of dynamical systems based on the image of a Hankel matrix of data are extensively used for data-driven control. However, if samples of data are missing, obtaining such representations becomes a difficult task. By exploiting the kernel structure of Hankel matrices of irregularly measured data generated by a linear time-invariant system, we provide computational methods by which any complete finite-length behavior of the system can be obtained. For the special case of periodically missing outputs, we provide conditions on the input such that the former result is guaranteed. In the presence of noise in the data, our method returns an approximate finite-length behavior of the system. We illustrate our result with several examples, including its use for approximate data completion in real-world applications, and compare it to alternative methods.
Submitted 8 July, 2024; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Bridging the Reality Gap of Reinforcement Learning based Traffic Signal Control using Domain Randomization and Meta Learning
Authors:
Arthur Müller,
Matthia Sabatelli
Abstract:
Reinforcement Learning (RL) has been widely explored in Traffic Signal Control (TSC) applications; however, no such system has yet been deployed in practice. A key barrier to progress in this area is the reality gap, the discrepancy that results from differences between simulation models and their real-world equivalents. In this paper, we address this challenge by first presenting a comprehensive analysis of potential simulation parameters that contribute to this reality gap. We then examine two promising strategies that can bridge this gap: Domain Randomization (DR) and Model-Agnostic Meta-Learning (MAML). Both strategies were trained with a traffic simulation model of an intersection. In addition, the model was embedded in LemgoRL, a framework that integrates realistic, safety-critical requirements into the control system. Subsequently, we evaluated the performance of the two methods on a separate model of the same intersection that was developed with a different traffic simulator. In this way, we mimic the reality gap. Our experimental results show that both DR and MAML outperform a state-of-the-art RL algorithm, highlighting their potential to mitigate the reality gap in RL-based TSC systems.
Submitted 21 July, 2023;
originally announced July 2023.
-
Search for UHE Photons from Gravitational Wave Sources with the Pierre Auger Observatory
Authors:
The Pierre Auger Collaboration,
A. Abdul Halim,
P. Abreu,
M. Aglietta,
I. Allekotte,
K. Almeida Cheminant,
A. Almela,
J. Alvarez-Muñiz,
J. Ammerman Yebra,
G. A. Anastasi,
L. Anchordoqui,
B. Andrada,
S. Andringa,
C. Aramo,
P. R. Araújo Ferreira,
E. Arnone,
J. C. Arteaga Velázquez,
H. Asorey,
P. Assis,
G. Avila,
E. Avocone,
A. M. Badescu,
A. Bakalova,
A. Balaceanu,
F. Barbato
, et al. (346 additional authors not shown)
Abstract:
A search for time-directional coincidences of ultra-high-energy (UHE) photons above 10 EeV with gravitational wave (GW) events from the LIGO/Virgo runs O1 to O3 is conducted with the Pierre Auger Observatory. Due to the distinctive properties of photon interactions and to the background expected from hadronic showers, a subset of the most interesting GW events is selected based on their localization quality and distance. Time periods of 1000 s around and 1 day after the GW events are analyzed. No coincidences are observed. Upper limits on the UHE photon fluence from a GW event are derived that are typically at $\sim$7 MeV cm$^{-2}$ (time period 1000~s) and $\sim$35 MeV cm$^{-2}$ (time period 1 day). Due to the proximity of the binary neutron star merger GW170817, the energy of the source transferred into UHE photons above 40 EeV is constrained to be less than 20% of its total gravitational wave energy. These are the first limits on UHE photons from GW sources.
Submitted 20 July, 2023;
originally announced July 2023.
-
Fairness in KI-Systemen
Authors:
Janine Strotherm,
Alissa Müller,
Barbara Hammer,
Benjamin Paaßen
Abstract:
The more AI-assisted decisions affect people's lives, the more important the fairness of such decisions becomes. In this chapter, we provide an introduction to research on fairness in machine learning. We explain the main fairness definitions and strategies for achieving fairness using concrete examples and place fairness research in the European context. Our contribution is aimed at an interdisciplinary audience and therefore avoids mathematical formulation but emphasizes visualizations and examples.
--
Je mehr KI-gestützte Entscheidungen das Leben von Menschen betreffen, desto wichtiger ist die Fairness solcher Entscheidungen. In diesem Kapitel geben wir eine Einführung in die Forschung zu Fairness im maschinellen Lernen. Wir erklären die wesentlichen Fairness-Definitionen und Strategien zur Erreichung von Fairness anhand konkreter Beispiele und ordnen die Fairness-Forschung in den europäischen Kontext ein. Unser Beitrag richtet sich dabei an ein interdisziplinäres Publikum und verzichtet daher auf die mathematische Formulierung sondern betont Visualisierungen und Beispiele.
Submitted 17 July, 2023;
originally announced July 2023.
-
Conservative binary dynamics from gravitational tail emission processes
Authors:
Gabriel Luz Almeida,
Alan Müller,
Stefano Foffa,
Riccardo Sturani
Abstract:
We re-analyze the far zone contribution to the two-body conservative dynamics arising from interaction between radiative and longitudinal modes, the latter sourced by mass and angular momentum, which in the mass case is known as the tail process. We verify the expected correspondence between two-loop self-energy amplitudes and the gluing of two classical (one leading order, one at one loop) emission amplitudes. In particular, we show that the factorization of the self-energy amplitude involving the angular momentum is violated when applying standard computation procedures, due to a violation of the Lorentz gauge condition commonly adopted in perturbative computations. We show, however, that a straightforward fix exists, as the violation corresponds to a consistent anomaly, and it can be re-absorbed by the variation of a suitable action functional.
Submitted 5 November, 2023; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Analysis and design of model predictive control frameworks for dynamic operation -- An overview
Authors:
Johannes Köhler,
Matthias A. Müller,
Frank Allgöwer
Abstract:
This article provides an overview of model predictive control (MPC) frameworks for dynamic operation of nonlinear constrained systems. Dynamic operation is often an integral part of the control objective, ranging from tracking of reference signals to the general economic operation of a plant under online changing time-varying operating conditions. We focus on the particular challenges that arise when dealing with such more general control goals and present methods that have emerged in the literature to address these issues. The goal of this article is to present an overview of the state-of-the-art techniques, providing a diverse toolkit to apply and further develop MPC formulations that can handle the challenges intrinsic to dynamic operation. We also critically assess the applicability of the different research directions, discussing limitations and opportunities for further research.
Submitted 9 January, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
-
3D oxygen vacancy order and defect-property relations in multiferroic (LuFeO$_3$)$_9$/(LuFe$_2$O$_4$)$_1$ superlattices
Authors:
K. A. Hunnestad,
H. Das,
C. Hatzoglou,
M. Holtz,
C. M. Brooks,
A. T. J. van Helvoort,
D. A. Muller,
D. G. Schlom,
J. A. Mundy,
D. Meier
Abstract:
Oxide heterostructures exhibit a vast variety of unique physical properties. Examples are unconventional superconductivity in layered nickelates and topological polar order in (PbTiO$_3$)$_n$/(SrTiO$_3$)$_n$ superlattices. Although it is clear that variations in oxygen content are crucial for the electronic correlation phenomena in oxides, it remains a major challenge to quantify their impact. Here, we measure the chemical composition in multiferroic (LuFeO$_3$)$_9$/(LuFe$_2$O$_4$)$_1$ superlattices, revealing a one-to-one correlation between the distribution of oxygen vacancies and the electric and magnetic properties. Using atom probe tomography, we observe oxygen vacancies arranging in a layered three-dimensional structure with a local density on the order of 10$^{14}$ cm$^{-2}$, congruent with the formula-unit-thick ferrimagnetic LuFe$_2$O$_4$ layers. The vacancy order is promoted by the locally reduced formation energy and plays a key role in stabilizing the ferroelectric domains and ferrimagnetism in the LuFeO$_3$ and LuFe$_2$O$_4$ layers, respectively. The results demonstrate the importance of oxygen vacancies for the room-temperature multiferroicity in this system and establish an approach for quantifying the oxygen defects with atomic-scale precision in 3D, giving new opportunities for deterministic defect-enabled property control in oxide heterostructures.
Submitted 30 June, 2023;
originally announced July 2023.
-
Meta-training with Demonstration Retrieval for Efficient Few-shot Learning
Authors:
Aaron Mueller,
Kanika Narang,
Lambert Mathias,
Qifan Wang,
Hamed Firooz
Abstract:
Large language models show impressive results on few-shot NLP tasks. However, these models are memory and computation-intensive. Meta-training allows one to leverage smaller models for few-shot generalization in a domain-general and task-agnostic manner; however, these methods alone result in models that may not have sufficient parameterization or knowledge to adapt quickly to a large variety of tasks. To overcome this issue, we propose meta-training with demonstration retrieval, where we use a dense passage retriever to retrieve semantically similar labeled demonstrations to each example for more varied supervision. By separating external knowledge from model parameters, we can use meta-training to train parameter-efficient models that generalize well on a larger variety of tasks. We construct a meta-training set from UnifiedQA and CrossFit, and propose a demonstration bank based on UnifiedQA tasks. To our knowledge, our work is the first to combine retrieval with meta-training, to use DPR models to retrieve demonstrations, and to leverage demonstrations from many tasks simultaneously, rather than randomly sampling demonstrations from the training set of the target task. Our approach outperforms a variety of targeted parameter-efficient and retrieval-augmented few-shot methods on QA, NLI, and text classification tasks (including SQuAD, QNLI, and TREC). Our approach can be meta-trained and fine-tuned quickly on a single GPU.
Submitted 30 June, 2023;
originally announced July 2023.
-
Screw and Lie Group Theory in Multibody Dynamics -- Recursive Algorithms and Equations of Motion of Tree-Topology Systems
Authors:
Andreas Mueller
Abstract:
Screw and Lie group theory allows for user-friendly modeling of multibody systems (MBS) while at the same time giving rise to computationally efficient recursive algorithms. The inherent frame invariance of such formulations allows for the use of arbitrary reference frames within the kinematics modeling (rather than obeying modeling conventions such as the Denavit-Hartenberg convention) and avoids the introduction of joint frames. The computational efficiency is owed to a representation of twists, accelerations, and wrenches that minimizes the computational effort. This can be directly carried over to dynamics formulations. In this paper, recursive $O(n)$ Newton-Euler algorithms are derived for the four most frequently used representations of twists, and their specific features are discussed. These formulations are related to the corresponding algorithms that were presented in the literature. The MBS motion equations are derived in closed form using the Lie group formulation. One form is the so-called 'Euler-Jourdain' or 'projection' equations, of which Kane's equations are a special case, and the other is the Lagrange equations. The recursive kinematics formulations are readily extended to higher orders in order to compute derivatives of the motion equations. To this end, recursive formulations for the acceleration and jerk are derived. It is briefly discussed how this can be employed for derivation of the linearized motion equations and their time derivatives. The geometric modeling allows for direct application of Lie group integration methods, which is briefly discussed.
Submitted 30 June, 2023;
originally announced June 2023.
-
Screw and Lie Group Theory in Multibody Kinematics -- Motion Representation and Recursive Kinematics of Tree-Topology Systems
Authors:
Andreas Mueller
Abstract:
After three decades of computational multibody system (MBS) dynamics, current research is centered on the development of compact and user-friendly yet computationally efficient formulations for the analysis of complex MBS. The key to this is a holistic geometric approach to the kinematics modeling, observing that the general motion of rigid bodies as well as the relative motion due to technical joints are screw motions. Moreover, screw theory provides the geometric setting and Lie group theory the analytic foundation for an intuitive and compact MBS modeling. The inherent frame invariance of this modeling approach gives rise to very efficient recursive $O(n)$ algorithms, for which the so-called 'spatial operator algebra' is one example, and allows for the use of readily available geometric data. In this paper, three variants for describing the configuration of tree-topology MBS in terms of relative coordinates, i.e. joint variables, are presented: the standard formulation using body-fixed joint frames, a formulation without joint frames, and a formulation without either joint or body-fixed reference frames. This allows for describing the MBS kinematics without introducing joint reference frames, thereby rendering restrictive modeling conventions, such as Denavit-Hartenberg parameters, redundant. Four different definitions of twists are recalled and the corresponding recursive expressions are derived. The corresponding Jacobians and their factorization are derived. The aim of this paper is to motivate the use of Lie group modeling and to provide a review of the different formulations for the kinematics of tree-topology MBS in terms of relative (joint) coordinates from the unifying perspective of screw and Lie group theory.
Submitted 30 June, 2023;
originally announced June 2023.
-
JUNO sensitivity to the annihilation of MeV dark matter in the galactic halo
Authors:
JUNO Collaboration,
Angel Abusleme,
Thomas Adam,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Muhammad Akram,
Abid Aleem,
Tsagkarakis Alexandros,
Fengpeng An,
Qi An,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
Burin Asavapibhop,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Wander Baldini,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Bellato
, et al. (581 additional authors not shown)
Abstract:
We discuss JUNO sensitivity to the annihilation of MeV dark matter in the galactic halo via detecting inverse beta decay reactions of electron anti-neutrinos resulting from the annihilation. We study possible backgrounds to the signature, including the reactor neutrinos, diffuse supernova neutrino background, charged- and neutral-current interactions of atmospheric neutrinos, backgrounds from muon-induced fast neutrons and cosmogenic isotopes. A fiducial volume cut, as well as the pulse shape discrimination and the muon veto are applied to suppress the above backgrounds. It is shown that JUNO sensitivity to the thermally averaged dark matter annihilation rate in 10 years of exposure would be significantly better than the present-day best limit set by Super-Kamiokande and would be comparable to that expected by Hyper-Kamiokande.
Submitted 13 September, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Inverse Scaling: When Bigger Isn't Better
Authors:
Ian R. McKenzie,
Alexander Lyzhov,
Michael Pieler,
Alicia Parrish,
Aaron Mueller,
Ameya Prabhu,
Euan McLean,
Aaron Kirtland,
Alexis Ross,
Alisa Liu,
Andrew Gritsevskiy,
Daniel Wurgaft,
Derik Kauffman,
Gabriel Recchia,
Jiacheng Liu,
Joe Cavanagh,
Max Weiss,
Sicong Huang,
The Floating Droid,
Tom Tseng,
Tomasz Korbak,
Xudong Shen,
Yuhui Zhang,
Zhengping Zhou,
Najoung Kim
, et al. (2 additional authors not shown)
Abstract:
Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at https://inversescaling.com/data to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models.
Submitted 12 May, 2024; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes
Authors:
Adrian Müller,
Pragnya Alatur,
Giorgia Ramponi,
Niao He
Abstract:
Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement learning problems, where constraint functions model the safety objectives. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a "cancellation of errors"; one can compensate for a constraint violation in one episode with a strict constraint satisfaction in another. However, we do not consider such a behavior safe in practical applications. In this paper, we overcome this weakness by proposing a novel model-based dual algorithm OptAug-CMDP for tabular finite-horizon CMDPs. Our algorithm is motivated by the augmented Lagrangian method and can be performed efficiently. We show that during $K$ episodes of exploring the CMDP, our algorithm obtains a regret of $\tilde{O}(\sqrt{K})$ for both the objective and the constraint violation. Unlike existing Lagrangian approaches, our algorithm achieves this regret without the need for the cancellation of errors.
Submitted 30 August, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Exchange bias between van der Waals materials: tilted magnetic states and field-free spin-orbit-torque switching
Authors:
Thow Min Jerald Cham,
Reiley J. Dorrian,
Xiyue S. Zhang,
Avalon H. Dismukes,
Daniel G. Chica,
Andrew F. May,
Xavier Roy,
David A. Muller,
Daniel C. Ralph,
Yunqiu Kelly Luo
Abstract:
Magnetic van der Waals heterostructures provide a unique platform to study magnetism and spintronics device concepts in the two-dimensional limit. Here, we report studies of exchange bias from the van der Waals antiferromagnet CrSBr acting on the van der Waals ferromagnet Fe3GeTe2 (FGT). The orientation of the exchange bias is along the in-plane easy axis of CrSBr, perpendicular to the out-of-plane anisotropy of the FGT, inducing a strongly tilted magnetic configuration in the FGT. Furthermore, the in-plane exchange bias provides sufficient symmetry breaking to allow deterministic spin-orbit torque switching of the FGT in CrSBr/FGT/Pt samples at zero applied magnetic field. A minimum thickness of the CrSBr greater than 10 nm is needed to provide a non-zero exchange bias at 30 K.
Submitted 3 June, 2023;
originally announced June 2023.
-
How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases
Authors:
Aaron Mueller,
Tal Linzen
Abstract:
Accurate syntactic representations are essential for robust generalization in natural language. Recent work has found that pre-training can teach language models to rely on hierarchical syntactic features - as opposed to incorrect linear features - when performing tasks after fine-tuning. We test what aspects of pre-training are important for endowing encoder-decoder Transformers with an inductive bias that favors hierarchical syntactic generalizations. We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus, diagnosing inductive biases using two syntactic transformation tasks: question formation and passivization, both in English. We find that the number of parameters alone does not explain hierarchical generalization: model depth plays a greater role than model width. We also find that pre-training on simpler language, such as child-directed speech, induces a hierarchical bias using an order of magnitude less data than pre-training on more typical datasets based on web text or Wikipedia; this suggests that in cognitively plausible language acquisition settings, neural language models may be more data-efficient than previously thought.
Submitted 31 May, 2023;
originally announced May 2023.
-
Strain Relaxation in Core-Shell Pt-Co Catalyst Nanoparticles
Authors:
Elliot Padgett,
Megan E. Holtz,
Anusorn Kongkanand,
David A. Muller
Abstract:
Surface strain plays a key role in enhancing the activity of Pt-alloy nanoparticle oxygen reduction catalysts. However, the details of strain effects in real fuel cell catalysts are not well-understood, in part due to a lack of strain characterization techniques that are suitable for complex supported nanoparticle catalysts. This work investigates these effects using strain mapping with nanobeam electron diffraction and a continuum elastic model of strain in simple core-shell particles. We find that surface strain is relaxed both by lattice defects at the core-shell interface and by relaxation across particle shells caused by Poisson expansion in the spherical geometry. The continuum elastic model finds that in the absence of lattice dislocations, geometric relaxation results in a surface strain that scales with the average composition of the particle, regardless of the shell thickness. We investigate the impact of these strain effects on catalytic activity for a series of Pt-Co catalysts treated to vary their shell thickness and core-shell lattice mismatch. For catalysts with the thinnest shells, the activity is consistent with an Arrhenius dependence on the surface strain expected for coherent strain in dislocation-free particles, while catalysts with thicker shells showed greater activity losses indicating strain relaxation caused by dislocations as well.
Submitted 29 May, 2023;
originally announced May 2023.