-
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Authors:
Guilherme Penedo,
Hynek Kydlíček,
Loubna Ben allal,
Anton Lozhkov,
Margaret Mitchell,
Colin Raffel,
Leandro Von Werra,
Thomas Wolf
Abstract:
The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LLMs like Llama 3 and Mixtral are not publicly available and very little is known about how they were created. In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. To advance the understanding of how best to curate high-quality pretraining datasets, we carefully document and ablate all of the design choices used in FineWeb, including in-depth investigations of deduplication and filtering strategies. In addition, we introduce FineWeb-Edu, a 1.3-trillion token collection of educational text filtered from FineWeb. LLMs pretrained on FineWeb-Edu exhibit dramatically better performance on knowledge- and reasoning-intensive benchmarks like MMLU and ARC. Along with our datasets, we publicly release our data curation codebase and all of the models trained during our ablation experiments.
Submitted 25 June, 2024;
originally announced June 2024.
-
Bounding irrelevant operators in the 3d Gross-Neveu-Yukawa CFTs
Authors:
Matthew S. Mitchell,
David Poland
Abstract:
We perform a numerical bootstrap study of scalar operators in the critical 3d Gross-Neveu-Yukawa models, a family of conformal field theories containing N Majorana fermions in the fundamental representation of an O(N) global symmetry. We compute rigorous bounds on the scaling dimensions of the next-to-lowest parity-even and parity-odd singlet scalars at N = 2, 4, and 8. All of these dimensions have lower bounds greater than 3, implying that there are only two relevant singlet scalars and placing constraints on the RG flow structure of these theories.
Submitted 18 June, 2024;
originally announced June 2024.
-
Reviewing climate change attribution in UK natural hazards and their impacts
Authors:
Regan Mudhar,
Dann M. Mitchell,
Peter A. Stott,
Richard A. Betts
Abstract:
The field of Detection and Attribution is rapidly moving beyond weather and climate, and towards incorporating hazards and their impacts on natural and human systems. Here, we review the comprehensive literature base relevant for the UK ahead of the next Climate Change Risk Assessment. The current literature highlights a detectable and non-trivial influence of climate change in many UK impact sectors already - notably health, agriculture, and infrastructure. We found that heatwaves were the most studied hazard overall, with a unanimous consensus on a strong attributable signal of human-induced climate change in their increased frequency and intensity over the last century. The most notable gap identified overall was in attributing climate-related impacts to human influence, with a few impact studies for only a handful of the hazards assessed. Furthermore, just under half of the 29 hazards were not found to have any UK-relevant attribution studies, with most of the remainder having three or fewer. This review highlights requirements for and opportunities to develop attribution science to meet the needs of the UK. Diversifying hazards and impacts studied, in conjunction with the techniques and approaches used, will undoubtedly benefit the community.
Submitted 18 June, 2024;
originally announced June 2024.
-
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models
Authors:
Giada Pistilli,
Alina Leidinger,
Yacine Jernite,
Atoosa Kasirzadeh,
Alexandra Sasha Luccioni,
Margaret Mitchell
Abstract:
This paper introduces the "CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset, designed to evaluate the social and cultural variation of Large Language Models (LLMs) across multiple languages and value-sensitive topics. We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy. CIVICS is designed to generate responses showing LLMs' encoded and implicit values. Through our dynamic annotation processes, tailored prompt design, and experiments, we investigate how open-weight LLMs respond to value-sensitive issues, exploring their behavior across diverse linguistic and cultural contexts. Using two experimental set-ups based on log-probabilities and long-form responses, we show social and cultural variability across different LLMs. Specifically, experiments involving long-form responses demonstrate that refusals are triggered disparately across models, but consistently and more frequently in English or translated statements. Moreover, specific topics and sources lead to more pronounced differences across model answers, particularly on immigration, LGBTQI rights, and social welfare. As shown by our experiments, the CIVICS dataset aims to serve as a tool for future research, promoting reproducibility and transparency across broader linguistic settings, and furthering the development of AI technologies that respect and reflect global cultural diversities and value pluralism. The CIVICS dataset and tools will be made available upon publication under open licenses; an anonymized version is currently available at https://huggingface.co/CIVICS-dataset.
Submitted 22 May, 2024;
originally announced May 2024.
-
Functionalized mm-scale vapor cells for alkali-metal spectroscopy and magnetometry
Authors:
Harini Raghavan,
Michael C. D. Tayler,
Kostas Mouloudakis,
Rachel Rae,
Sami Lähteenmäki,
Rasmus Zetter,
Petteri Laine,
Jacques Haesler,
Laurent Balet,
Thomas Overstolz,
Sylvain Karlen,
Morgan W. Mitchell
Abstract:
We describe micro-fabricated rubidium vapor cells with integrated temperature-control functionality and demonstrate their suitability for use in miniaturized ultra-sensitive magnetometers. These functionalized vapor cells (FVCs) embody a dual-chamber design in low-conductivity silicon with anti-permeation coatings and micro-structured thin-film platinum surface traces as resistive heaters and temperature sensors. Thermal tests show our ability to control alkali metal distribution within the FVCs, ensuring a clean sensing chamber for optical measurements. Optical absorption spectroscopy is used to correlate the temperature readings with vapor density and to measure buffer gas pressure, of interest for optimizing sensitivity. Finally, we demonstrate zero-field resonance magnetometry with 18 fT/Hz$^{1/2}$ sensitivity in the 10 Hz to 100 Hz band, limited by laser noise and magnetic shield noise, which indicates that the functionalization does not introduce significant magnetic noise.
Submitted 21 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
GHz-rate optical phase shift in light matter interaction-engineered, silicon-ferroelectric nematic liquid crystals
Authors:
Iman Taghavi,
Omid Esmaeeli,
Sheri Jahan Chowdhury,
Matthew Mitchell,
Donald Witt,
Cory Pecinovsky,
Jason Sickler,
Nicolas A. F. Jaeger,
Sudip Shekhar,
Lukas Chrostowski
Abstract:
Organic electro-optic (OEO) materials have demonstrated promising performance in developing electro-optic phase shifters (EOPS) and modulators compared to their inorganic counterparts. Integration with other devices in a silicon photonic (SiP) process, simple nanofabrication, and temperature/aging robustness remain to be developed for this class of hybrid material platforms. In particular, electro-optic (EO) polymers need an electro-thermal poling method, which has limited their potential and utilization in large-scale SiP. Devices made of paraelectric nematic liquid crystals (PN-LC), another primary type of OEO material, feature a very efficient but slow phase shift mechanism. We present a general-purpose EOPS that applies to various modulator embodiments to address these concerns. Based on that, we report a GHz-fast phase shift in a newly discovered family of OEO, namely ferroelectric nematic liquid crystals (FN-LC), which finally enables liquid crystals to have significant second-order nonlinear optical coefficients and associated Pockels effect. The new material avoids poling issues associated with EO polymers and can pave the way for hybrid silicon-OEO systems with CMOS-foundry compatibility. Furthermore, we propose a finger-loaded, non-slotted waveguide that enhances light-matter interaction, allowing us to achieve DC and AC modulation efficiencies of $\approx$ 0.25 V.mm and 25.7 V.mm, respectively, an on-chip insertion loss of $\approx$ 4 dB, and an EO bandwidth of f$_{-6dB}$ >4.18 GHz. The remaining figures of merit for our poling-free EOPS are equivalent to EO polymer-enabled devices with fewer manufacturing difficulties. We demonstrate an electrically and photonically packaged chip that contains >100 silicon-FN-LC modulators to evaluate the large-scale integration of our poling-free phase shifters and modulators.
Submitted 13 May, 2024;
originally announced May 2024.
-
Laser-written micro-channel atomic magnetometer
Authors:
Andrea Zanoni,
Kostas Mouloudakis,
Michael C. D. Tayler,
Giacomo Corrielli,
Roberto Osellame,
Morgan W. Mitchell,
Vito Giovanni Lucivero
Abstract:
We demonstrate a sensitive optically-pumped magnetometer using rubidium vapor and 0.75 amg of nitrogen buffer gas in a sub-mm-width sensing channel excavated by femtosecond laser writing followed by chemical etching. The channel is buried less than 1 mm below the surface of its fused silica host material, which also includes reservoir chambers and micro-strainer connections, to preserve a clean optical environment. Using a zero-field-resonance magnetometry strategy and a sensing volume of 2.25 mm$^3$, we demonstrate a sensitivity of $\approx$ 1 $\mathrm{pT}/\sqrt{\mathrm{Hz}}$ at 10 Hz. The device can be integrated with photonic structures and microfluidic channels with 3D versatility. Its sensitivity, bandwidth and stand-off distance will enable detection of localized fields from magnetic nanoparticles and $\mu$L NMR samples.
Submitted 22 April, 2024;
originally announced April 2024.
-
Multi-View Deep Learning for Imaging Atmospheric Cherenkov Telescopes
Authors:
Hannes Warnhofer,
Samuel T. Spencer,
Alison M. W. Mitchell
Abstract:
This research note concerns the application of deep-learning-based multi-view-imaging techniques to data from the H.E.S.S. Imaging Atmospheric Cherenkov Telescope array. We find that the earlier the fusion of layer information from different views takes place in the neural network, the better our model performs with this data. Our analysis shows that the point in the network where the information from the different views is combined is far more important for the model performance than the method used to combine the information.
Submitted 27 March, 2024;
originally announced March 2024.
-
Probing Stellar Clusters from Gaia DR2 as Galactic PeVatrons: I -- Expected Gamma-ray and Neutrino Emission
Authors:
Alison M. W. Mitchell,
Giovanni Morlino,
Silvia Celli,
Stefano Menchiari,
Andreas Specovius
Abstract:
Young & massive stellar clusters (SCs) are a potential source of galactic cosmic rays up to very high energies as a result of two possible acceleration scenarios. Collective stellar winds from massive member stars form a wind-blown bubble with a termination shock (TS) at which particle acceleration to PeV energies may occur. Furthermore, shock acceleration may occur at SNRs expanding inside the bubble. By applying a model of CR acceleration at both the wind TS and SNR shocks to catalogues of known SCs derived from Gaia DR2, we identify the most promising targets to search for evidence of PeVatron activity. Predictions for the secondary fluxes of gamma-ray and neutrino emission are derived based on particle acceleration at the collective wind TS and the subsequent hadronic interactions with the surrounding medium. Predictions from our modelling under baseline and optimistic scenarios are compared to data, finding consistent results. We estimate the detection prospects for future gamma-ray and neutrino experiments. We find that degree-scale angular sizes of the wind-blown bubbles are typical, which may pose a challenge for experimental detection. A shortlist of the most promising candidates is provided, with an anticipated flux range. Of order 10 SCs may be detectable with future facilities, and 1-5 could be currently operating as PeVatrons. Of these, three gamma-ray detected SCs have data within our predicted range. Our model can consistently describe gamma-ray measurements of SC emission. Several further as-yet-undetected SCs offer promising targets for future observations, although the flux range allowed by our model can be large (> factor 10). The large angular size of the wind-blown bubble may lead to low surface brightness emission, worsening the problem of source confusion. Nevertheless, we discuss how further work will help to constrain SCs as PeVatron candidates. (abridged)
Submitted 25 March, 2024;
originally announced March 2024.
-
Quantum jump photodetector for narrowband photon counting with a single atom
Authors:
Laura Zarraoa,
Romain Veyron,
Tomas Lamich,
Morgan W. Mitchell
Abstract:
Using a single neutral $^{87}$Rb atom held in an optical trap, and "quantum jump" detection of single-photon-initiated state changes, we demonstrate an intrinsically-narrowband single-photon detector, of interest for separating weak signals from strong optical background. Using novel statistical analysis, we measure quantum efficiency of $(2.9 \pm 0.2) \times 10^{-3}$, a record for single-pass quantum jump production, and dark counts of $(9 \pm 20) \times 10^{-3}$ counts/s during passive accumulation plus $(1.8 \pm 0.1) \times 10^{-2}$ counts per readout, orders of magnitude below those of traditional single-photon detectors. The 6 MHz detection bandwidth is orders of magnitude narrower than existing atomic filters. Available methods can substantially improve the quantum jump photodetector's quantum efficiency, dark counts, bandwidth, and tunability.
Submitted 13 March, 2024;
originally announced March 2024.
-
Performance of a modular ton-scale pixel-readout liquid argon time projection chamber
Authors:
DUNE Collaboration,
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
B. Aimard,
F. Akbar,
K. Allison,
S. Alonso Monsalve,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1340 additional authors not shown)
Abstract:
The Module-0 Demonstrator is a single-phase 600 kg liquid argon time projection chamber operated as a prototype for the DUNE liquid argon near detector. Based on the ArgonCube design concept, Module-0 features a novel 80k-channel pixelated charge readout and advanced high-coverage photon detection system. In this paper, we present an analysis of an eight-day data set consisting of 25 million cosmic ray events collected in the spring of 2021. We use this sample to demonstrate the imaging performance of the charge and light readout systems as well as the signal correlations between the two. We also report argon purity and detector uniformity measurements, and provide comparisons to detector simulations.
Submitted 5 March, 2024;
originally announced March 2024.
-
Live magnetic observation of parahydrogen hyperpolarization dynamics
Authors:
James Eills,
Morgan W. Mitchell,
Irene Marco Rius,
Michael C. D. Tayler
Abstract:
Hyperpolarized nuclear spins in molecules exhibit high magnetization that is unachievable by classical polarization techniques, making them widely used as sensors in physics, chemistry, and medicine. The state of a hyperpolarized material, however, is typically only studied indirectly and with partial destruction of magnetization, due to the nature of conventional detection by resonant-pickup nuclear magnetic resonance spectroscopy or imaging. Here we establish atomic magnetometers with sub-pT sensitivity as an alternative modality to detect in real time the complex dynamics of hyperpolarized materials without disturbing or interrupting the magnetogenesis process. As an example of dynamics that are impossible to detect in real time by conventional means, we examine parahydrogen-induced $^{1}$H and $^{13}$C magnetization during adiabatic eigenbasis transformations at $\mu$T-field avoided crossings. Continuous but nondestructive magnetometry reveals previously unseen spin dynamics, fidelity limits, and magnetization back-action effects. As a second example, we apply magnetometry to observe the chemical-exchange-driven $^{13}$C hyperpolarization of [1-$^{13}$C]-pyruvate -- the most important spin tracer for clinical metabolic imaging. Our approach can be readily combined with other high-sensitivity magnetometers and is applicable to a broader range of general observation scenarios involving production, transport and systems interaction of hyperpolarized compounds.
Submitted 22 May, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Spin projection noise and the magnetic sensitivity of optically pumped magnetometers
Authors:
K. Mouloudakis,
V. Koutrouli,
I. K. Kominis,
M. W. Mitchell,
G. Vasilakis
Abstract:
Present protocols for obtaining the ultimate magnetic sensitivity of optically pumped magnetometers (OPMs) utilizing alkali-metal ensembles rely on uncorrelated atoms in stretched states. A new approach for calculating the spin projection noise (SPN)-limited signal to noise ratio (SNR) and the magnetic sensitivity of OPMs is proposed. Our model is based solely on the mean-field density matrix dynamics and, in contrast to previous models, it applies to both low and high field regimes, it takes into account the degree of spin polarization, the intra- and interhyperfine correlations, the decoherence processes, the atom-light coupling and the effects of the spin dynamics on the spin-noise spectra. Fine tuning of the probe frequency allows us to explore different hyperfine states and ground-state correlations. Especially in the spin-exchange-relaxation-free (SERF) regime, alongside the magnetic resonance narrowing and the increased number density, hallmarks of SERF magnetometers, we report on a new SERF feature: the reduction of spin-projection noise at the spin precession frequency as a consequence of strongly-correlated hyperfine spins that attenuate and redistribute SPN when properly probed.
Submitted 22 February, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models
Authors:
Martha Lewis,
Melanie Mitchell
Abstract:
Large language models (LLMs) have performed well on several reasoning benchmarks, including ones that test analogical reasoning abilities. However, it has been debated whether they are actually performing humanlike abstract reasoning or instead employing less general processes that rely on similarity to what has been seen in their training data. Here we investigate the generality of analogy-making abilities previously claimed for LLMs (Webb, Holyoak, & Lu, 2023). We take one set of analogy problems used to evaluate LLMs and create a set of "counterfactual" variants: versions that test the same abstract reasoning abilities but that are likely dissimilar from any pre-training data. We test humans and three GPT models on both the original and counterfactual problems, and show that, while the performance of humans remains high for all the problems, the GPT models' performance declines sharply on the counterfactual set. This work provides evidence that, despite previously reported successes of LLMs on analogical reasoning, these models lack the robustness and generality of human analogy-making.
Submitted 14 February, 2024;
originally announced February 2024.
-
Doping Liquid Argon with Xenon in ProtoDUNE Single-Phase: Effects on Scintillation Light
Authors:
DUNE Collaboration,
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
B. Aimard,
F. Akbar,
K. Allison,
S. Alonso Monsalve,
M. Alrashed,
A. Alton,
R. Alvarez,
H. Amar Es-sghir,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos
, et al. (1300 additional authors not shown)
Abstract:
Doping of liquid argon TPCs (LArTPCs) with a small concentration of xenon is a technique for light-shifting and facilitates the detection of the liquid argon scintillation light. In this paper, we present the results of the first doping test ever performed in a kiloton-scale LArTPC. From February to May 2020, we carried out this special run in the single-phase DUNE Far Detector prototype (ProtoDUNE-SP) at CERN, featuring 770 t of total liquid argon mass with 410 t of fiducial mass. The goal of the run was to measure the light and charge response of the detector to the addition of xenon, up to a concentration of 18.8 ppm. The main purpose was to test the possibility for reduction of non-uniformities in light collection, caused by deployment of photon detectors only within the anode planes. Light collection was analysed as a function of the xenon concentration, by using the pre-existing photon detection system (PDS) of ProtoDUNE-SP and an additional smaller set-up installed specifically for this run. In this paper we first summarize our current understanding of the argon-xenon energy transfer process and the impact of the presence of nitrogen in argon with and without xenon dopant. We then describe the key elements of ProtoDUNE-SP and the injection method deployed. Two dedicated photon detectors were able to collect the light produced by xenon and the total light. The ratio of these components was measured to be about 0.65 as 18.8 ppm of xenon were injected. We performed studies of the collection efficiency as a function of the distance between tracks and light detectors, demonstrating enhanced uniformity of response for the anode-mounted PDS. We also show that xenon doping can substantially recover light losses due to contamination of the liquid argon by nitrogen.
Submitted 9 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
PAC Code Rate-Profile Design Using Search-Constrained Optimization Algorithms
Authors:
Mohsen Moradi,
David G. M. Mitchell
Abstract:
In this paper, we introduce a novel rate-profile design based on search-constrained optimization techniques to assess the performance of polarization-adjusted convolutional (PAC) codes under Fano (sequential) decoding. The results demonstrate that the resulting PAC code offers much reduced computational complexity compared to a construction based on a conventional genetic algorithm without a loss in error-correction performance. As the fitness function of our algorithm, we propose an adaptive successive cancellation list decoding algorithm to determine the weight distribution of the rate profiles. The simulation results indicate that, for a PAC(256, 128) code, only 8% of the population requires that their fitness function be evaluated with a large list size. This represents an improvement of almost 92% over a conventional evolutionary algorithm. For a PAC(64, 32) code, this improvement is about 99%. We also plotted the performance of the high-rate PAC(128, 105) and PAC(64, 51) codes, and the results show that they exhibit superior performance compared to other algorithms.
Submitted 18 January, 2024;
originally announced January 2024.
-
MorpheusNet: Resource efficient sleep stage classifier for embedded on-line systems
Authors:
Ali Kavoosi,
Morgan P. Mitchell,
Raveen Kariyawasam,
John E. Fleming,
Penny Lewis,
Heidi Johansen-Berg,
Hayriye Cagnan,
Timothy Denison
Abstract:
Sleep Stage Classification (SSC) is a labor-intensive task, requiring experts to examine hours of electrophysiological recordings for manual classification. This is a limiting factor when it comes to leveraging sleep stages for therapeutic purposes. With increasing affordability and expansion of wearable devices, automating SSC may enable deployment of sleep-based therapies at scale. Deep Learning has gained increasing attention as a potential method to automate this process. Previous research has shown accuracy comparable to manual expert scores. However, previous approaches require a sizable amount of memory and computational resources. This constrains the ability to classify in real time and deploy models on the edge. To address this gap, we aim to provide a model capable of predicting sleep stages in real time, without requiring access to external computational sources (e.g., mobile phone, cloud). The algorithm is power efficient to enable use on embedded battery-powered systems. Our compact sleep stage classifier can be deployed on most off-the-shelf microcontrollers (MCUs) with constrained hardware settings. This is due to the memory footprint of our approach requiring significantly fewer operations. The model was tested on three publicly available databases and achieved performance comparable to the state of the art, whilst reducing model complexity by orders of magnitude (up to 280 times smaller compared to state of the art). We further optimized the model with quantization of parameters to 8 bits with only an average drop of 0.95% in accuracy. When implemented in firmware, the quantized model achieves a latency of 1.6 seconds on an Arm Cortex-M4 processor, allowing its use for on-line SSC-based therapies.
Submitted 14 January, 2024;
originally announced January 2024.
-
Stability of superfluids in tilted optical lattices with periodic driving
Authors:
Robbie Cruickshank,
Andrea Di Carli,
Matthew Mitchell,
Arthur La Rooij,
Stefan Kuhr,
Charles E. Creffield,
Elmar Haller
Abstract:
Tilted lattice potentials with periodic driving play a crucial role in the study of artificial gauge fields and topological phases with ultracold quantum gases. However, driving-induced heating and the growth of phonon modes restrict their use for probing interacting many-body states. Here, we experimentally investigate phonon modes and interaction-driven instabilities of superfluids in the lowest band of a shaken optical lattice. We identify stable and unstable parameter regions and provide a general resonance condition. In contrast to the high-frequency approximation of a Floquet description, we use the superfluids' micromotion to analyze the growth of phonon modes from slow to fast driving frequencies. Our observations enable the prediction of stable parameter regimes for quantum-simulation experiments aimed at studying driven systems with strong interactions over extended time scales.
Submitted 10 January, 2024;
originally announced January 2024.
-
Cavity-resonated detection of spin polarization in a microfabricated atomic vapor cell
Authors:
María Hernández Ruiz,
Yintao Ma,
Hana Medhat,
Vito Giovanni Lucivero,
Morgan W. Mitchell
Abstract:
We demonstrate continuous Pound-Drever-Hall (PDH) nondestructive monitoring of the electron spin polarization of an atomic vapor in a microfabricated vapor cell within an optical resonator. The two-chamber silicon and glass cell contains $^{87}$Rb and 1.3 amagat of N$_{2}$ buffer gas, and is placed within a planar optical resonator formed by two mirrors with dichroic dielectric coatings to resonantly enhance the coupling to phase-modulated probe light near the D$_2$ line at 780 nm. We describe the theory of signal generation in this system, including the spin-dependent complex refractive index, cavity optical transfer functions, and PDH signal response to spin polarization. We observe cavity transmission and PDH signals across $\approx 200$ GHz of detuning around the atomic resonance line. By resonant optical pumping on the 795 nm D$_1$ line, we observe spin-dependent cavity line shifts, in good agreement with theory. We use the saturation of the line shift vs. optical pumping power to calibrate the number density and efficiency of the optical pumping. In the unresolved sideband regime, we observe quantum-noise-limited PDH readout of the spin polarization density, with a flat noise floor of $9 \times 10^9$ spins cm$^{-3}$ Hz$^{-1/2}$ for frequencies above 700 Hz. We note possible extensions of the technique.
Submitted 19 December, 2023;
originally announced December 2023.
-
Perspectives on the State and Future of Deep Learning - 2023
Authors:
Micah Goldblum,
Anima Anandkumar,
Richard Baraniuk,
Tom Goldstein,
Kyunghyun Cho,
Zachary C Lipton,
Melanie Mitchell,
Preetum Nakkiran,
Max Welling,
Andrew Gordon Wilson
Abstract:
The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time. The plan is to host this survey periodically until the AI singularity paperclip-frenzy-driven doomsday, keeping an updated list of topical questions and interviewing new community members for each edition. In this issue, we probed people's opinions on interpretable AI, the value of benchmarking in modern NLP, the state of progress towards understanding deep learning, and the future of academia.
Submitted 18 December, 2023; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Telling different unravelings apart via nonlinear quantum-trajectory averages
Authors:
Eloy Piñol,
Th. K. Mavrogordatos,
Dustin Keys,
Romain Veyron,
Piotr Sierant,
Miguel Angel García-March,
Samuele Grandi,
Morgan W. Mitchell,
Jan Wehr,
Maciej Lewenstein
Abstract:
The Gorini-Kossakowski-Sudarshan-Lindblad master equation (ME) governs the density matrix of open quantum systems (OQSs). When an OQS is subjected to weak continuous measurement, its state evolves as a stochastic quantum trajectory, whose statistical average solves the ME. The ensemble of such trajectories is termed an unraveling of the ME. We propose a method to operationally distinguish unravelings produced by the same ME in different measurement scenarios, using nonlinear averages of observables over trajectories. We apply the method to the paradigmatic quantum nonlinear system of resonance fluorescence in a two-level atom. We compare the Poisson-type unraveling, induced by direct detection of photons scattered from the two-level emitter, and the Wiener-type unraveling, induced by phase-sensitive detection of the emitted field. We show that a quantum-trajectory-averaged variance is able to distinguish these measurement scenarios. We evaluate the performance of the method, which can be readily extended to more complex OQSs, under a range of realistic experimental conditions.
Submitted 8 May, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
The DUNE Far Detector Vertical Drift Technology, Technical Design Report
Authors:
DUNE Collaboration,
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
B. Aimard,
F. Akbar,
K. Allison,
S. Alonso Monsalve,
M. Alrashed,
A. Alton,
R. Alvarez,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos
, et al. (1304 additional authors not shown)
Abstract:
DUNE is an international experiment dedicated to addressing some of the questions at the forefront of particle physics and astrophysics, including the mystifying preponderance of matter over antimatter in the early universe. The dual-site experiment will employ an intense neutrino beam focused on a near and a far detector as it aims to determine the neutrino mass hierarchy and to make high-precision measurements of the PMNS matrix parameters, including the CP-violating phase. It will also stand ready to observe supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model.
The DUNE far detector implements liquid argon time-projection chamber (LArTPC) technology, and combines the many tens-of-kiloton fiducial mass necessary for rare event searches with the sub-centimeter spatial resolution required to image those events with high precision. The addition of a photon detection system enhances physics capabilities for all DUNE physics drivers and opens prospects for further physics explorations. Given its size, the far detector will be implemented as a set of modules, with LArTPC designs that differ from one another as newer technologies arise.
In the vertical drift LArTPC design, a horizontal cathode bisects the detector, creating two stacked drift volumes in which ionization charges drift towards anodes at either the top or bottom. The anodes are composed of perforated PCB layers with conductive strips, enabling reconstruction in 3D. Light-trap-style photon detection modules are placed both on the cryostat's side walls and on the central cathode where they are optically powered.
This Technical Design Report describes in detail the technical implementations of each subsystem of this LArTPC that, together with the other far detector modules and the near detector, will enable DUNE to achieve its physics goals.
Submitted 5 December, 2023;
originally announced December 2023.
-
NuSTAR Hard X-ray Monitoring of Gravitationally Lensed Quasar RX J1131-1231
Authors:
Cora A. DeFrancesco,
Xinyu Dai,
Mark Mitchell,
Abderahmen Zoghbi,
Christopher W. Morgan
Abstract:
The X-ray emission from active galactic nuclei (AGN) is believed to come from a combination of inverse Compton scattering of photons from the accretion disk and reprocessing of the direct X-ray emission by reflection. We present hard (10-80 keV) and soft (0.5-8 keV) X-ray monitoring of a gravitationally lensed quasar RX J1131-1231 with NuSTAR, Swift, and XMM-Newton between 10 June 2016 and 30 November 2020. Comparing the amplitude of quasar microlensing variability at the hard and soft bands allows a size comparison, where larger sources lead to smaller microlensing variability. During the period between 6 June 2018 and 30 November 2020, where both the hard and soft light curves are available, the hard and soft bands varied by factors of 3.7 and 5.5, respectively, with rms variability of $0.40\pm0.05$ and $0.57\pm0.02$. Both the variability amplitude and rms are moderately smaller for the hard X-ray emission, indicating that the hard X-ray emission region is moderately larger than the soft X-ray emission region. We found that the reflection fraction from seven joint hard and soft X-ray monitoring epochs is consistent with a constant, with only low-significance variability. After decomposing the total X-ray flux into direct and reprocessed components, we find a smaller variability amplitude for the reprocessed flux compared to the direct emission. The power-law cutoff energy is constrained at 96$^{+47}_{-24}$ keV, which positions the system in the allowable parameter space due to the pair production limit.
Submitted 29 November, 2023;
originally announced November 2023.
-
65 GOPS/neuron Photonic Tensor Core with Thin-film Lithium Niobate Photonics
Authors:
Zhongjin Lin,
Bhavin J. Shastri,
Shangxuan Yu,
Jingxiang Song,
Yuntao Zhu,
Arman Safarnejadian,
Wangning Cai,
Yanmei Lin,
Wei Ke,
Mustafa Hammood,
Tianye Wang,
Mengyue Xu,
Zibo Zheng,
Mohammed Al-Qadasi,
Omid Esmaeeli,
Mohamed Rahim,
Grzegorz Pakulski,
Jens Schmid,
Pedro Barrios,
Weihong Jiang,
Hugh Morison,
Matthew Mitchell,
Xiaogang Qiang,
Xun Guan,
Nicolas A. F. Jaeger
, et al. (6 additional authors not shown)
Abstract:
Photonics offers a transformative approach to artificial intelligence (AI) and neuromorphic computing by providing low latency, high bandwidth, and energy-efficient computations. Here, we introduce a photonic tensor core processor enabled by time-multiplexed inputs and charge-integrated outputs. This fully integrated processor, comprising only two thin-film lithium niobate (TFLN) modulators, a III-V laser, and a charge-integration photoreceiver, can implement an entire layer of a neural network. It can execute 65 billion operations per second (GOPS) per neuron, including simultaneous weight updates, a hitherto unachieved speed. Our processor stands out from conventional photonic processors, which have static weights set during training, as it supports fast "hardware-in-the-loop" training, and can dynamically adjust the inputs (fan-in) and outputs (fan-out) within a layer, thereby enhancing its versatility. Our processor can perform large-scale dot-product operations with vector dimensions up to 131,072. Furthermore, it successfully classifies (supervised learning) and clusters (unsupervised learning) 112 × 112-pixel images after "hardware-in-the-loop" training. To handle "hardware-in-the-loop" training for clustering AI tasks, we provide a solution for multiplications involving two negative numbers based on our processor.
Submitted 30 November, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks
Authors:
Melanie Mitchell,
Alessandro B. Palmarini,
Arseny Moskvichev
Abstract:
We explore the abstract reasoning abilities of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark [10], which is designed to evaluate robust understanding and reasoning with core-knowledge concepts. We extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed, one-shot prompting (rather than simple, zero-shot prompts) with text versions of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4, on zero- and one-shot prompts using image versions of the simplest tasks. Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.
Submitted 11 December, 2023; v1 submitted 13 November, 2023;
originally announced November 2023.
-
LHAASO J2108+5157 as a Molecular Cloud Illuminated by a Supernova Remnant
Authors:
A. M. W. Mitchell
Abstract:
The search for Galactic PeVatrons - astrophysical accelerators of cosmic rays to PeV energies - has entered a new phase in recent years with the discovery of the first Ultra-High-Energy (UHE, $E>100$ TeV) gamma-ray sources by the HAWC and LHAASO experiments. Establishing whether the emission is leptonic or hadronic in nature, however, requires multiwavelength data and modelling studies. Among the currently known UHE sources, LHAASO J2108+5157 is an enigmatic source without clear association to a plausible accelerator, yet spatially coincident with molecular clouds. We investigate the scenario of a molecular cloud illuminated by cosmic rays accelerated in a nearby supernova remnant (SNR) as an explanation for LHAASO J2108+5157. We aim to constrain the required properties of the SNR as well as which of the clouds identified in the vicinity is the most likely association. We use a model for cosmic ray acceleration in SNRs, their transport through the interstellar medium and subsequent interaction with molecular material, to predict the corresponding gamma-ray emission. The parameter space of SNR properties is explored to find the most plausible parameter combination that can account for the gamma-ray spectrum of LHAASO J2108+5157. In the case that a SNR is illuminating the cloud, we find that it must be young ($<10$ kyr) and located within $40-60$ pc of the cloud. A SN scenario with a low Sedov time is preferred, with a maximum proton energy of 3 PeV assumed. No SNRs matching these properties are currently known, although an as yet undetected SNR remains feasible. The galactic CR sea is insufficient to solely account for the observed flux, such that a PeVatron accelerator must be present in the vicinity.
Submitted 27 October, 2023;
originally announced October 2023.
-
Uniaxial compression of 3D printed samples with voids: laboratory measurements compared with predictions from Effective Medium Theory
Authors:
Filip P. Adamus,
Ashley Stanton-Yonge,
Thomas M. Mitchell,
David Healy,
Philip G. Meredith
Abstract:
3D printing technology offers the possibility of producing synthetic samples with accurately defined microstructures. As indicated by effective medium theory (EMT), the shapes, orientations, and sizes of voids significantly affect the overall elastic response of a solid body. By performing uniaxial compression tests on twenty types of 3D-printed samples containing voids of different geometries, we examine whether the measured effective elasticities are accurately predicted by EMT. To manufacture the samples, we selected printers that use two different technologies: fused deposition modelling (FDM) and stereolithography (SLA). We show how printer settings (FDM case) or sample cure time (SLA case) affect the measured properties. We also examine the reproducibility of elasticity tests on identically designed samples. To obtain the range of theoretical predictions, we assume either uniform strain or uniform stress. Our study of over two hundred samples shows that measured effective elastic moduli can fit EMT predictions with an error of less than 5% using both FDM and SLA methods if certain printing specifications and sample design considerations are taken into account. Notably, we find that the pore volume fraction of the designed samples should be above 1% to induce a measurable softening effect, but below 5% to produce accurate EMT estimations that fit the measured elastic properties of the samples. Our results highlight both the strengths of EMT for predicting the effective properties of solids with low pore volume fraction microstructural configurations, and the limitations for high-porosity microstructures.
Submitted 21 October, 2023;
originally announced October 2023.
-
Hadronic Re-Acceleration at the Crab Pulsar Wind Termination Shock as a Source of PeV Gamma-Rays
Authors:
Samuel T. Spencer,
Alison M. W. Mitchell,
Brian Reville
Abstract:
Recent results from LHAASO and Tibet AS$γ$ suggest that the Crab Nebula's gamma-ray spectrum extends to the PeV energy range; however, the production mechanisms of this highest-energy emission remain unclear. It has been postulated that a secondary component of hadronic emission could explain the highest-energy gamma-ray flux points, but the origin and acceleration mechanism of this hadronic population have yet to be explained. We postulate one scenario in which hadrons diffuse over time into the Crab pulsar wind nebula from the surrounding supernova ejecta, and are subsequently re-accelerated by the pulsar wind termination shock. We present results of direct particle transport simulations (including radial evolution) to determine if this scenario is viable over the lifetime of the Crab system.
Submitted 11 October, 2023;
originally announced October 2023.
-
Fiber-taper collected emission from NV centers in high-$Q/V$ diamond microdisks
Authors:
Tamiko Masuda,
J. P. E. Hadden,
David P. Lake,
Matthew Mitchell,
Sigurd Flågan,
Paul E. Barclay
Abstract:
Fiber-coupled microdisks are a promising platform for enhancing the spontaneous emission from color centers in diamond. The measured cavity-enhanced emission from the microdisk is governed by the effective volume ($V$) of each cavity mode, the cavity quality factor ($Q$), and the coupling between the microdisk and the fiber. Here we observe photoluminescence from an ensemble of nitrogen-vacancy centers into high $Q/V$ microdisk modes, which when combined with coherent spectroscopy of the microdisk modes, allows us to elucidate the relative contributions of these factors. The broad emission spectrum acts as an internal light source facilitating mode identification over several cavity free spectral ranges. Analysis of the fiber-taper collected microdisk emission reveals spectral filtering both by the cavity and the fiber-taper, the latter of which we find preferentially couples to higher-order microdisk modes. Coherent mode spectroscopy is used to measure $Q\sim 1\times10^{5}$ -- the highest reported values for diamond microcavities operating at visible wavelengths. With realistic optimization of the microdisk dimensions, we predict that Purcell factors of $\sim 50$ are within reach.
Submitted 6 October, 2023;
originally announced October 2023.
-
SmartPlay: A Benchmark for LLMs as Intelligent Agents
Authors:
Yue Wu,
Xuan Tang,
Tom M. Mitchell,
Yuanzhi Li
Abstract:
Recent large language models (LLMs) have demonstrated great potential toward intelligent agents and next-gen automation, but a systematic benchmark for evaluating LLMs' abilities as agents is currently lacking. We introduce SmartPlay: both a challenging benchmark and a methodology for evaluating LLMs as agents. SmartPlay consists of 6 different games, including Rock-Paper-Scissors, Tower of Hanoi, and Minecraft. Each game features a unique setting, providing up to 20 evaluation settings and infinite environment variations. Each game in SmartPlay uniquely challenges a subset of 9 important capabilities of an intelligent LLM agent, including reasoning with object dependencies, planning ahead, spatial reasoning, learning from history, and understanding randomness. The distinction between the sets of capabilities each game tests allows us to analyze each capability separately. SmartPlay serves not only as a rigorous testing ground for evaluating the overall performance of LLM agents but also as a road-map for identifying gaps in current methodologies. We release our benchmark at github.com/Microsoft/SmartPlay
Submitted 17 March, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Joint H.E.S.S. and Fermi-LAT analysis of the region around PSR J1813-1749
Authors:
T. Wach,
A. M. W. Mitchell,
V. Joshi,
S. Funk
Abstract:
HESS J1813-178 is one of the brightest sources detected during the first HESS Galactic Plane survey. The compact source, also detected by MAGIC, is believed to be a pulsar wind nebula powered by one of the most powerful pulsars known in the Galaxy, PSR J1813-1749, with a spin-down luminosity of $\dot{\mathrm{E}} = 5.6 \cdot 10^{37}\,\mathrm{erg}\,\mathrm{s}^{-1}$. With its extreme physical properties, as well as the pulsar's young age of 5.6 kyr, the $γ$-rays detected in this region allow us to study the evolution of a highly atypical system. Previous studies of the region in the GeV energy range show emission extended beyond the size of the compact H.E.S.S. source. Using the archival H.E.S.S. data with improved background methods, we perform a detailed morphological and spectral analysis of the region. In addition to the compact, bright emission component, we find significantly extended emission, whose position is coincident with HESS J1813-178. We reanalyse the region in GeV and derive a joint model in order to find a continuous description of the emission in the region from GeV to TeV. Using the results derived in this analysis, as well as X-ray and radio data of the region, we perform multi-wavelength spectral modeling. Possible hadronic or leptonic origins of the $γ$-ray emission are investigated, and the diffusion parameters necessary to explain the extended emission are examined.
Submitted 31 August, 2023;
originally announced August 2023.
-
Modelling of highly extended Gamma-ray emission around the Geminga Pulsar as detected with H.E.S.S
Authors:
A. M. W. Mitchell,
S. Caroff
Abstract:
Geminga is an enigmatic radio-quiet gamma-ray pulsar located at a mere 250 pc distance from Earth. Extended very-high-energy gamma-ray emission around the pulsar has been detected by multiple water Cherenkov detector based instruments. However, the detection of extended TeV gamma-ray emission around the Geminga pulsar has proven challenging for IACTs due to the angular scale exceeding the typical field-of-view. By detailed studies of background estimation techniques and characterising systematic effects, a detection of highly extended TeV gamma-ray emission could be confirmed by the H.E.S.S. IACT array. Building on the previously announced detection, in this contribution we further characterise the emission and apply an electron diffusion model to the combined gamma-ray data from the H.E.S.S. and HAWC experiments, as well as X-ray data from XMM-Newton.
Submitted 31 August, 2023;
originally announced August 2023.
-
Multiparameter quantum sensing and magnetic communications with a hybrid dc and rf optically pumped magnetometer
Authors:
Michał Lipka,
Aleksandra Sierant,
Charikleia Troullinou,
Morgan Mitchell
Abstract:
We introduce and demonstrate a hybrid optically pumped magnetometer (HOPM) that simultaneously measures one dc field component and one rf field component quadrature with a single atomic spin ensemble. The HOPM achieves sub-pT/$\sqrt{\mathrm{Hz}}$ sensitivity for both dc and rf fields, and is limited in sensitivity by spin projection noise at low frequencies and by photon shot noise at high frequencies. We demonstrate with the HOPM a new application of multiparameter quantum sensing: background-cancelling spread spectrum magnetic communication. We encode a digital message as rf amplitude, spread among sixteen channels from 29 kHz to 33 kHz in a noisy magnetic environment, and observe quantum-noise-limited rf magnetic signal recovery enabled by quantum-noise-limited dc noise cancellation, reaching noise rejection of 15 dB at 100 Hz and more than 20 dB at 60 Hz and below. We measure signal fidelity versus signal strength and extrinsic noise in communication of a short text message. The combination of high sensitivity, quantum-noise-limited performance, and real-world application potential makes the HOPM ideally suited for study of high-performance multiparameter quantum sensing.
Submitted 26 March, 2024; v1 submitted 27 August, 2023;
originally announced August 2023.
-
Inter-species spin-noise correlations in hot atomic vapors
Authors:
K. Mouloudakis,
F. Vouzinas,
A. Margaritakis,
A. Koutsimpela,
G. Mouloudakis,
V. Koutrouli,
M. Skotiniotis,
G. P. Tsironis,
M. Loulakis,
M. W. Mitchell,
G. Vasilakis,
I. K. Kominis
Abstract:
We report an experimental and theoretical study of spin noise correlations in a $^{87}$Rb-$^{133}$Cs unpolarized alkali-metal vapor dominated by spin-exchange collisions. We observe strong unequal-time inter-species correlations and account for these with a first-principles theoretical model. Since the two atomic species have different spin precession frequencies, the dual-species vapor enables the use of an additional experimental handle, the applied magnetic field, for untangling various sub-types of spin correlations. In particular, the measured cross-correlation and auto-correlation spectra shed light on a number of spin-dynamic effects involving intra-atom, inter-atom, intra-species and inter-species correlations. Cross-correlation coefficients exceeding $60\%$ have been observed at low-magnetic fields, where the two spin species couple strongly via spin-exchange collisions. The understanding of such spontaneously generated correlations can motivate the design of quantum-enhanced measurements with single or multi-species spin-polarized alkali-metal vapors used in quantum sensing applications.
Submitted 15 October, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Quantum-enhanced magnetometry at optimal number density
Authors:
Charikleia Troullinou,
Vito Giovanni Lucivero,
Morgan W. Mitchell
Abstract:
We study the use of squeezed probe light and evasion of measurement back-action to enhance the sensitivity and measurement bandwidth of an optically-pumped magnetometer (OPM) at sensitivity-optimal atom number density. By experimental observation, and in agreement with quantum noise modeling, a spin-exchange-limited OPM probed with off-resonance laser light is shown to have an optimal sensitivity determined by density-dependent quantum noise contributions. Application of squeezed probe light boosts the OPM sensitivity beyond this laser-light optimum, allowing the OPM to achieve sensitivities that it cannot reach with coherent-state probing at any density. The observed quantum sensitivity enhancement at optimal number density is enabled by measurement back-action evasion.
Submitted 24 August, 2023;
originally announced August 2023.
-
Feedback Enhanced Phonon Lasing of a Microwave Frequency Resonator
Authors:
Peyman Parsa,
Prasoon Kumar Shandilya,
David P. Lake,
Matthew E. Mitchell,
Paul E. Barclay
Abstract:
The amplitude of self-oscillating mechanical resonators in cavity optomechanical systems is typically limited by nonlinearities arising from the cavity's finite optical bandwidth. We propose and demonstrate a feedback technique for increasing this limit. By modulating the cavity input field with a signal derived from its output intensity, we increase the amplitude of a self-oscillating GHz frequency mechanical resonator by $22\%$ (increase in coherent phonon number of $50\%$) limited only by the achievable optomechanical cooperativity of the system. This technique will advance applications dependent on high dynamic mechanical stress, such as coherent spin-phonon coupling, as well as implementations of sensors based on self-oscillating resonators.
Submitted 17 August, 2023;
originally announced August 2023.
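
As a quick consistency check on the figures quoted in the abstract above, the following minimal sketch relates the amplitude increase to the phonon-number increase. It assumes the coherent phonon number scales with the square of the oscillation amplitude; the function name `phonon_number_gain` is hypothetical and is not taken from the paper.

```python
def phonon_number_gain(amplitude_gain: float) -> float:
    """Fractional increase in coherent phonon number for a given
    fractional increase in oscillation amplitude.

    Assumption (not stated in the listing): phonon number is
    proportional to the square of the amplitude.
    """
    return (1.0 + amplitude_gain) ** 2 - 1.0


# A 22% amplitude increase, as quoted in the abstract.
gain = phonon_number_gain(0.22)
print(f"{gain:.1%}")
```

Under that assumption, a 22% amplitude increase corresponds to a phonon-number increase of roughly 49%, in line with the approximately 50% stated in the abstract.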
-
The Influence of Satellite Trails on H.E.S.S. Gamma-Ray Astronomical Observations
Authors:
Samuel T. Spencer,
Thomas Lang,
Alison M. W. Mitchell
Abstract:
The number of satellites launched into low earth orbit has almost tripled (to over 4000) in the last three years due to the increasing commercialisation of space. Satellite constellations with a total of over 400,000 satellites are proposed to be launched in the near future. Many of these satellites are highly reflective, resulting in a high optical brightness that affects ground-based astronomical observations across the electromagnetic spectrum. Despite this, the potential effect of these satellites on Imaging Atmospheric Cherenkov Telescopes (IACTs) has so far been assumed to be negligible due to their nanosecond integration times. This has, however, never been verified. We aim to identify satellite trails in data taken by the High Energy Stereoscopic System (H.E.S.S.) IACT array in Namibia, using Night Sky Background (NSB) data from the CT5 camera installed in 2019. We determine which observation times and pointing directions are affected the most, and evaluate the impact on Hillas parameters used for classification and reconstruction of high-energy Extensive Air Shower events. Finally, we predict how future planned satellite launches will affect gamma-ray observations with IACTs.
Submitted 2 August, 2023;
originally announced August 2023.
-
Anomalous noise spectra in a spin-exchange-relaxation-free alkali-metal vapor
Authors:
K. Mouloudakis,
J. Kong,
A. Sierant,
E. Arkin,
M. Hernández Ruiz,
R. Jiménez-Martínez,
M. W. Mitchell
Abstract:
We perform spin-noise spectroscopy on an unpolarized $^{87}\mathrm{Rb}$ vapor in the spin-exchange-relaxation-free (SERF) regime. We observe noise spectral distributions that deviate strongly from Lorentzian models that accurately describe lower-density regimes. For example, at magnetic fields of $\sim 1 \mathrm{μT}$ and $^{87}\mathrm{Rb}$ densities $\gtrsim 1 \times 10^{14} \rm{atoms/cm^{3}}$ we observe an asymmetric spin-noise distribution in which the resonance line is depleted by about half its power, with the diverted power becoming a broad spectral component that could be mistaken for optical shot noise. The results are in good agreement with recent models accounting for correlations between the ground hyperfine states. We discuss implications for quantum sensing and absolute noise calibration in spin-squeezing and entanglement detection.
Submitted 28 March, 2024; v1 submitted 31 July, 2023;
originally announced July 2023.
-
Photometry of Type II Supernova SN 2023ixf with a Worldwide Citizen Science Network
Authors:
Lauren A. Sgro,
Thomas M. Esposito,
Guillaume Blaclard,
Sebastian Gomez,
Franck Marchis,
Alexei V. Filippenko,
Daniel O'Conner Peluso,
Stephen S. Lawrence,
Aad Verveen,
Andreas Wagner,
Anouchka Nardi,
Barbara Wiart,
Benjamin Mirwald,
Bill Christensen,
Bob Eramia,
Bruce Parker,
Bruno Guillet,
Byungki Kim,
Chelsey A. Logan,
Christopher C. M. Kyba,
Christopher Toulmin,
Claudio G. Vantaggiato,
Dana Adhis,
Dave Gary,
Dave Goodey
, et al. (66 additional authors not shown)
Abstract:
We present highly sampled photometry of the supernova (SN) 2023ixf, a Type II SN in M101, beginning 2 days before its first known detection. To gather these data, we enlisted the global Unistellar Network of citizen scientists. These 252 observations from 115 telescopes show the SN's rising brightness associated with shock emergence followed by gradual decay. We measure a peak $M_{V}$ = -18.18 $\pm$ 0.09 mag at 2023-05-25 21:37 UTC in agreement with previously published analyses.
Submitted 7 July, 2023;
originally announced July 2023.
-
Soft Air Pocket Force Sensors for Large Scale Flexible Robots
Authors:
Michael R. Mitchell,
Ciera McFarland,
Margaret M. Coad
Abstract:
Flexible robots have advantages over rigid robots in their ability to conform physically to their environment and to form a wide variety of shapes. Sensing the force applied by or to flexible robots is useful for both navigation and manipulation tasks, but it is challenging due to the need for the sensors to withstand the robots' shape change without encumbering their functionality. Also, for robots with long or large bodies, the number of sensors required to cover the entire surface area of the robot body can be prohibitive due to high cost and complexity. We present a novel soft air pocket force sensor that is highly flexible, lightweight, relatively inexpensive, and easily scalable to various sizes. Our sensor produces a change in internal pressure that is linear with the applied force. We present results of experimental testing of how uncontrollable factors (contact location and contact area) and controllable factors (initial internal pressure, thickness, size, and number of interior seals) affect the sensitivity. We demonstrate our sensor applied to a vine robot (a soft inflatable robot that "grows" from the tip via eversion), and we show that the robot can successfully grow and steer towards an object with which it senses contact.
Submitted 26 July, 2023;
originally announced July 2023.
-
Reinforcement Learning for Sequential Decoding of Generalized LDPC Codes
Authors:
Salman Habib,
David G. M. Mitchell
Abstract:
In this work, we propose reinforcement learning (RL) for sequential decoding of moderate length generalized low-density parity-check (GLDPC) codes. Here, sequential decoding refers to scheduling all the generalized constraint nodes (GCNs) and single parity-check nodes (SPCNs) of a GLDPC code serially in each iteration. A GLDPC decoding environment is modeled as a finite Markov decision process (MDP) in which the state-space comprises all possible sequences of hard-decision values of the variable nodes (VNs) connected to the scheduled GCN or SPCN, and the action-space of the MDP consists of all possible actions (GCN and SPCN scheduling). The goal of RL is to determine an optimized scheduling policy, i.e., one that results in a decoded codeword by minimizing the complexity of the belief propagation (BP) decoder. For training, we consider the proportion of correct bits at the output of the GCN or SPCN as a reward once it is scheduled. The expected rewards for scheduling all the GCNs/SPCNs in the code's Tanner graph are earned via BP decoding during the RL phase. The proposed RL-based decoding scheme is shown to significantly outperform the standard BP flooding decoder, as well as a sequential decoder in which the GCNs/SPCNs are scheduled randomly.
Submitted 25 July, 2023;
originally announced July 2023.
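
The reward described in the abstract above (the proportion of correct bits at the output of a scheduled GCN or SPCN) can be illustrated with a minimal sketch. The function name `node_reward` and its arguments are hypothetical, chosen only for illustration; the listing does not show the authors' implementation, and the transmitted bits are assumed to be known during training.

```python
from typing import Sequence


def node_reward(decoded_bits: Sequence[int], transmitted_bits: Sequence[int]) -> float:
    """Fraction of hard-decision bits at a scheduled node's output that
    match the transmitted codeword bits.

    Illustrative only: both arguments are assumed to cover the variable
    nodes connected to the scheduled constraint node.
    """
    if len(decoded_bits) != len(transmitted_bits):
        raise ValueError("bit sequences must have the same length")
    if len(decoded_bits) == 0:
        raise ValueError("bit sequences must be non-empty")
    matches = sum(d == t for d, t in zip(decoded_bits, transmitted_bits))
    return matches / len(decoded_bits)


# Three of the four bits agree, so the reward is 0.75.
print(node_reward([1, 0, 1, 1], [1, 0, 0, 1]))
```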
-
Impact of Satellite Trails on H.E.S.S. Astronomical Observations
Authors:
Thomas Lang,
Samuel T. Spencer,
Alison M. W. Mitchell
Abstract:
The number of satellites launched into Earth's orbit has almost tripled in the last three years due to the increasing commercialisation of space. Multiple satellite constellations, consisting of over 400,000 individual satellites, have either been partially launched or are proposed for launch in the near future. Many of these satellites are highly reflective, resulting in a high optical brightness that affects ground-based astronomical observations. Despite this caveat, the potential effect of these satellites on gamma-ray-observing Imaging Atmospheric Cherenkov Telescopes (IACTs) has largely been assumed to be negligible due to their nanosecond-scale integration times. However, this assumption has not been verified to date. As IACTs are sensitive to optical wavelength light, we aim to identify satellite trails in data taken by the High Energy Stereoscopic System (H.E.S.S.) IACT array. In particular, this study is aimed at quantifying the potential effects on data quality and extensive air shower event classification and reconstruction. Using night sky background measurements from H.E.S.S., we determined which observation times and pointing directions are affected most by these satellite trails. We then evaluated their impact on the standard Hillas parameter variables used for event analysis. Due to the brightest trails, false trigger events can occur; however, for most modern analyses, the effect on astronomical results will be minimal. We observe a mild increase in the rate of trail detections over time, which is partially correlated with the number of satellite launches. Overall, the fraction of H.E.S.S. data affected is currently minimal. We note that these trails could still have a non-negligible effect on future Cherenkov Telescope Array observations if advanced analysis techniques designed to lower the energy threshold of the instrument are applied.
Submitted 21 September, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Cosmic ray processes in galactic ecosystems
Authors:
Ellis R. Owen,
Kinwah Wu,
Yoshiyuki Inoue,
H. -Y. Karen Yang,
Alison M. W. Mitchell
Abstract:
Galaxy evolution is an important topic, and our physical understanding must be complete to establish a correct picture. This includes a thorough treatment of feedback. The effects of thermal-mechanical and radiative feedback have been widely considered; however, cosmic rays (CRs) are also powerful energy carriers in galactic ecosystems. Resolving the capability of CRs to operate as a feedback agent is therefore essential to advance our understanding of the processes regulating galaxies. The effects of CRs are yet to be fully understood, and their complex multi-channel feedback mechanisms operating across the hierarchy of galaxy structures pose a significant technical challenge. This review examines the role of CRs in galaxies, from the scale of molecular clouds to the circum-galactic medium. An overview of their interaction processes, their implications for galaxy evolution, and their observable signatures is provided, and their capability to modify the thermal and hydrodynamic configuration of galactic ecosystems is discussed. We present recent advancements in our understanding of CR processes and interpretation of their signatures, and highlight where technical challenges and unresolved questions persist. We discuss how these may be addressed with upcoming opportunities.
Submitted 12 July, 2023; v1 submitted 16 June, 2023;
originally announced June 2023.
-
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Authors:
Irene Solaiman,
Zeerak Talat,
William Agnew,
Lama Ahmad,
Dylan Baker,
Su Lin Blodgett,
Canyu Chen,
Hal Daumé III,
Jesse Dodge,
Isabella Duan,
Ellie Evans,
Felix Friedrich,
Avijit Ghosh,
Usman Gohar,
Sara Hooker,
Yacine Jernite,
Ria Kalluri,
Alberto Lusoli,
Alina Leidinger,
Michelle Lin,
Xiuzhu Lin,
Sasha Luccioni,
Jennifer Mickel,
Margaret Mitchell,
Jessica Newman
, et al. (6 additional authors not shown)
Abstract:
Generative AI systems across modalities, ranging from text (including code), image, audio, and video, have broad social impacts, but there is no official standard for means of evaluating those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach in evaluating a base generative AI system for any modality in two overarching categories: what can be evaluated in a base system independent of context and what can be evaluated in a societal context. Importantly, this refers to base systems that have no predetermined application or deployment context, including a model itself, as well as system components, such as training data. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods for evaluation apply to listed generative modalities and analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what can be evaluated in a broader societal context, each with its own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm.
Submitted 28 June, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Stronger Together: on the Articulation of Ethical Charters, Legal Tools, and Technical Documentation in ML
Authors:
Giada Pistilli,
Carlos Munoz Ferrandis,
Yacine Jernite,
Margaret Mitchell
Abstract:
The growing need for accountability of the people behind AI systems can be addressed by leveraging processes in three fields of study: ethics, law, and computer science. While these fields are often considered in isolation, they rely on complementary notions in their interpretation and implementation. In this work, we detail this interdependence and motivate the necessary role of collaborative governance tools in shaping a positive evolution of AI. We first contrast notions of compliance in the ethical, legal, and technical fields; we outline both their differences and where they complement each other, with a particular focus on the roles of ethical charters, licenses, and technical documentation in these interactions. We then focus on the role of values in articulating the synergies between the fields and outline specific mechanisms of interaction between them in practice. We identify how these mechanisms have played out in several open governance fora: an open collaborative workshop, a responsible licensing initiative, and a proposed regulatory framework. By leveraging complementary notions of compliance in these three domains, we can create a more comprehensive framework for governing AI systems that jointly takes into account their technical capabilities, their impact on society, and how technical specifications can inform relevant regulations. Our analysis thus underlines the necessity of joint consideration of the ethical, legal, and technical in AI ethics frameworks to be used on a larger scale to govern AI systems and how the thinking in each of these areas can inform the others.
Submitted 9 May, 2023;
originally announced May 2023.
-
The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain
Authors:
Arseny Moskvichev,
Victor Vikram Odouard,
Melanie Mitchell
Abstract:
The abilities to form and abstract concepts are key to human intelligence, but such abilities remain lacking in state-of-the-art AI systems. There has been substantial research on conceptual abstraction in AI, particularly using idealized domains such as Raven's Progressive Matrices and Bongard problems, but even when AI systems succeed on such problems, the systems are rarely evaluated in depth to see if they have actually grasped the concepts they are meant to capture.
In this paper we describe an in-depth evaluation benchmark for the Abstraction and Reasoning Corpus (ARC), a collection of few-shot abstraction and analogy problems developed by Chollet [2019]. In particular, we describe ConceptARC, a new, publicly available benchmark in the ARC domain that systematically assesses abstraction and generalization abilities on a number of basic spatial and semantic concepts. ConceptARC differs from the original ARC dataset in that it is specifically organized around "concept groups" -- sets of problems that focus on specific concepts and that vary in complexity and level of abstraction. We report results on testing humans on this benchmark as well as three machine solvers: the top two programs from a 2021 ARC competition and OpenAI's GPT-4. Our results show that humans substantially outperform the machine solvers on this benchmark, showing abilities to abstract and generalize concepts that are not yet captured by AI systems. We believe that this benchmark will spur improvements in the development of AI systems for conceptual abstraction and in the effective evaluation of such systems.
Submitted 11 May, 2023;
originally announced May 2023.
-
The Roles of Symbols in Neural-based AI: They are Not What You Think!
Authors:
Daniel L. Silver,
Tom M. Mitchell
Abstract:
We propose that symbols are first and foremost external communication tools used between intelligent agents that allow knowledge to be transferred in a more efficient and effective manner than having to experience the world directly. But, they are also used internally within an agent through a form of self-communication to help formulate, describe and justify subsymbolic patterns of neural activity that truly implement thinking. Symbols, and our languages that make use of them, not only allow us to explain our thinking to others and ourselves, but also provide beneficial constraints (inductive bias) on learning about the world. In this paper we present relevant insights from neuroscience and cognitive science, about how the human brain represents symbols and the concepts they refer to, and how today's artificial neural networks can do the same. We then present a novel neuro-symbolic hypothesis and a plausible architecture for intelligent agents that combines subsymbolic representations for symbols and concepts for learning and reasoning. Our hypothesis and associated architecture imply that symbols will remain critical to the future of intelligent systems NOT because they are the fundamental building blocks of thought, but because they are characterizations of subsymbolic processes that constitute thought.
Submitted 26 April, 2023;
originally announced April 2023.
-
Can AI Put Gamma-Ray Astrophysicists Out of a Job?
Authors:
Samuel T. Spencer,
Vikas Joshi,
Alison M. W. Mitchell
Abstract:
In what will likely be a litany of generative-model-themed arXiv submissions celebrating April the 1st, we evaluate the capacity of state-of-the-art transformer models to create a paper detailing the detection of a Pulsar Wind Nebula with a non-existent Imaging Atmospheric Cherenkov Telescope (IACT) Array. We do this to evaluate the ability of such models to interpret astronomical observations and sources based on language information alone, and to assess potential means by which fraudulently generated scientific papers could be identified during peer review (given that reliable generative model watermarking has yet to be deployed for these tools). We conclude that our jobs as astronomers are safe for the time being. From this point on, prompts given to ChatGPT and Stable Diffusion are shown in orange, text generated by ChatGPT is shown in black, whereas analysis by the (human) authors is in blue.
Submitted 4 April, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
Impact of cross-section uncertainties on supernova neutrino spectral parameter fitting in the Deep Underground Neutrino Experiment
Authors:
DUNE Collaboration,
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
Z. Ahmad,
J. Ahmed,
B. Aimard,
F. Akbar,
K. Allison,
S. Alonso Monsalve,
M. Alrashed,
A. Alton,
R. Alvarez,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1294 additional authors not shown)
Abstract:
A primary goal of the upcoming Deep Underground Neutrino Experiment (DUNE) is to measure the $\mathcal{O}(10)$ MeV neutrinos produced by a Galactic core-collapse supernova if one should occur during the lifetime of the experiment. The liquid-argon-based detectors planned for DUNE are expected to be uniquely sensitive to the $ν_e$ component of the supernova flux, enabling a wide variety of physics and astrophysics measurements. A key requirement for a correct interpretation of these measurements is a good understanding of the energy-dependent total cross section $σ(E_ν)$ for charged-current $ν_e$ absorption on argon. In the context of a simulated extraction of supernova $ν_e$ spectral parameters from a toy analysis, we investigate the impact of $σ(E_ν)$ modeling uncertainties on DUNE's supernova neutrino physics sensitivity for the first time. We find that the currently large theoretical uncertainties on $σ(E_ν)$ must be substantially reduced before the $ν_e$ flux parameters can be extracted reliably: in the absence of external constraints, a measurement of the integrated neutrino luminosity with less than 10% bias with DUNE requires $σ(E_ν)$ to be known to about 5%. The neutrino spectral shape parameters can be known to better than 10% for a 20% uncertainty on the cross-section scale, although they will be sensitive to uncertainties on the shape of $σ(E_ν)$. A direct measurement of low-energy $ν_e$-argon scattering would be invaluable for improving the theoretical precision to the needed level.
Submitted 7 July, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
Stable Bias: Analyzing Societal Representations in Diffusion Models
Authors:
Alexandra Sasha Luccioni,
Christopher Akiki,
Margaret Mitchell,
Yacine Jernite
Abstract:
As machine learning-enabled Text-to-Image (TTI) systems are becoming increasingly prevalent and seeing growing adoption as commercial services, characterizing the social biases they exhibit is a necessary first step to lowering their risk of discriminatory outcomes. This evaluation, however, is made more difficult by the synthetic nature of these systems' outputs: common definitions of diversity are grounded in social categories of people living in the world, whereas the artificial depictions of fictive humans created by these systems have no inherent gender or ethnicity. To address this need, we propose a new method for exploring the social biases in TTI systems. Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts, and comparing it to the variation engendered by spanning different professions. This allows us to (1) identify specific bias trends, (2) provide targeted scores to directly compare models in terms of diversity and representation, and (3) jointly model interdependent social variables to support a multidimensional analysis. We leverage this method to analyze images generated by 3 popular TTI systems (Dall-E 2, Stable Diffusion v 1.4 and 2) and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents. We also release the datasets and low-code interactive bias exploration platforms developed for this work, as well as the necessary tools to similarly evaluate additional TTI systems.
Submitted 9 November, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.