Search | arXiv e-print repository

Exact solitary wave solutions for a coupled gKdV-Schrodinger system by a new ODE reduction method

Authors: Stephen C. Anco, James Hornick, Sicheng Zhao, Thomas Wolf

Abstract: A new method is developed for finding exact solitary wave solutions of a generalized Korteweg-de Vries equation with p-power nonlinearity coupled to a linear Schrödinger equation arising in many different physical applications. This method yields 22 solution families, with p=1,2,3,4. No solutions for p>1 were known previously in the literature. For p=1, four of the solution families contain bright/dark Davydov solitons of the 1st and 2nd kind, obtained in recent work by basic ansatze applied to the ODE system for travelling waves. All of the new solution families have interesting features, including bright/dark peaks with (up to) p symmetric pairs of side peaks in the amplitude and a kink profile for the nonlinear part in the phase. The present method is fully systematic and involves several novel steps which reduce the travelling wave ODE system to a single nonlinear base ODE for which all polynomial solutions are found by symbolic computation. It is applicable more generally to other coupled nonlinear dispersive wave equations as well as to nonlinear ODE systems of generalized Hénon-Heiles form.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 45 pages

arXiv:2406.17557 [pdf, other]

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Authors: Guilherme Penedo, Hynek Kydlíček, Loubna Ben allal, Anton Lozhkov, Margaret Mitchell, Colin Raffel, Leandro Von Werra, Thomas Wolf

Abstract: The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LLMs like Llama 3 and Mixtral are not publicly available and very little is known about how they were created. In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. To advance the understanding of how best to curate high-quality pretraining datasets, we carefully document and ablate all of the design choices used in FineWeb, including in-depth investigations of deduplication and filtering strategies. In addition, we introduce FineWeb-Edu, a 1.3-trillion token collection of educational text filtered from FineWeb. LLMs pretrained on FineWeb-Edu exhibit dramatically better performance on knowledge- and reasoning-intensive benchmarks like MMLU and ARC. Along with our datasets, we publicly release our data curation codebase and all of the models trained during our ablation experiments.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.13083 [pdf, other]

Design and Performance of a Magnetic Bottle Electron Spectrometer for High-Energy Photoelectron Spectroscopy

Authors: Kurtis Borne, Jordan T ONeal, Jun Wang, Erk Isele, Razib Obaid, Nora Berrah, Xinxin Cheng, Philip H Bucksbaum, Justin James, Andri Kamalov, Kirk A Larsen, Xiang Li, Ming-Fu Lin, Yusong Liu, Agostino Marinelli, Adam Summers, Emily Thierstein, Thomas Wolf, Daniel Rolles, Peter Walter, James P Cryan, Taran Driver

Abstract: We describe the design and performance of a magnetic bottle electron spectrometer (MBES) for high-energy electron spectroscopy. Our design features a ${\sim}2$ m long electron drift tube and electrostatic retardation lens, achieving sub-electronvolt (eV) electron kinetic energy resolution for high energy (several hundred eV) electrons with close to 4$π$ collection efficiency. A segmented anode electron detector enables the simultaneous collection of photoelectron spectra in high resolution and high collection efficiency modes. This versatile instrument is installed at the TMO endstation at the LCLS x-ray free-electron laser (XFEL). In this paper, we demonstrate its high resolution, collection efficiency and spatial selectivity in measurements where it is coupled to an XFEL source. These combined characteristics are designed to enable high-resolution time-resolved measurements using x-ray photoelectron, absorption, and Auger-Meitner spectroscopy. We also describe the pervasive artifact in MBES time-of-flight spectra that arises from a periodic modulation in electron detection efficiency, and present a robust analysis procedure for its removal.

Submitted 4 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.10709 [pdf, other]

Intraband collective excitations in fractional Chern insulators are dark

Authors: Tobias M. R. Wolf, Yung-Chun Chao, Allan H. MacDonald, Jung Jung Su

Abstract: The low-energy collective excitations of semiconductors and insulators often couple strongly to light, allowing them to be probed optically. We argue here that in fractional Chern insulators intra-band collective excitations are dark in the sense that they couple anomalously weakly to light. This conclusion is based on a relationship between ideal quantum geometry and the structure factor of a Chern band, and on a classical plasma analogy motivated by the vortexibility property of ideal Chern bands.

Submitted 15 June, 2024; originally announced June 2024.

Comments: Main text: 5 pages, 3 figures, Supmat: 5 pages, 1 figure; Comments are welcome

arXiv:2406.09528 [pdf, other]

JWST/NIRCam 4-5 $μ$m Imaging of the Giant Planet AF Lep b

Authors: Kyle Franson, William O. Balmer, Brendan P. Bowler, Laurent Pueyo, Yifan Zhou, Emily Rickman, Zhoujian Zhang, Sagnick Mukherjee, Tim D. Pearce, Daniella C. Bardalez Gagliuffi, Lauren I. Biddle, Timothy D. Brandt, Rachel Bowens-Rubin, Justin R. Crepp, James W. Davidson, Jr., Jacqueline Faherty, Christian Ginski, Elliott P. Horch, Marvin Morgan, Caroline V. Morley, Marshall D. Perrin, Aniket Sanghi, Maissa Salama, Christopher A. Theissen, Quang H. Tran , et al. (1 additional authors not shown)

Abstract: With a dynamical mass of $3 \, M_\mathrm{Jup}$, the recently discovered giant planet AF Lep b is the lowest-mass imaged planet with a direct mass measurement. Its youth and spectral type near the L/T transition make it a promising target to study the impact of clouds and atmospheric chemistry at low surface gravities. In this work, we present JWST/NIRCam imaging of AF Lep b. Across two epochs, we detect AF Lep b in F444W ($4.4 \, \mathrm{μm}$) with S/N ratios of 9.6 and 8.7, respectively. At the planet's separation of $320 \, \mathrm{mas}$ during the observations, the coronagraphic throughput is ${\approx}7\%$, demonstrating that NIRCam's excellent sensitivity persists down to small separations. The F444W photometry of AF Lep b affirms the presence of disequilibrium carbon chemistry and enhanced atmospheric metallicity. These observations also place deep limits on wider-separation planets in the system, ruling out $1.1 \, M_\mathrm{Jup}$ planets beyond $15.6 \, \mathrm{au}$ (0.58 arcsec), $1.1 \, M_\mathrm{Sat}$ planets beyond $27 \, \mathrm{au}$ (1 arcsec), and $2.8 \, M_\mathrm{Nep}$ planets beyond $67 \, \mathrm{au}$ (2.5 arcsec). We also present new Keck/NIRC2 $L'$ imaging of AF Lep b; combining this with the two epochs of F444W photometry and previous Keck $L'$ photometry provides limits on the long-term 3-$5 \, \mathrm{μm}$ variability of AF Lep b on months-to-years timescales. AF Lep b is the closest-separation planet imaged with JWST to date, demonstrating that planets can be recovered well inside the nominal (50% throughput) NIRCam coronagraph inner working angle.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 17 pages, 4 figures, submitted to ApJL

arXiv:2405.14693 [pdf, ps, other]

doi 10.3847/PSJ/ad50a7

Interpolation and synthesis of sparse samples in exoplanet atmospheric modeling

Authors: Jacob Haqq-Misra, Eric T. Wolf, Thomas J. Fauchez, Ravi K. Kopparapu

Abstract: This paper highlights methods from geostatistics that are relevant to the interpretation, intercomparison, and synthesis of atmospheric model data, with a specific application to exoplanet atmospheric modeling. Climate models are increasingly used to study theoretical and observational properties of exoplanets, which include a hierarchy of models ranging from fast and idealized models to those that are slower but more comprehensive. Exploring large parameter spaces with computationally-expensive models can be accomplished with sparse sampling techniques, but analyzing such sparse samples can pose challenges for conventional interpolation functions. Ordinary kriging is a statistical method for describing the spatial distribution of a data set in terms of the variogram function, which can be used to interpolate sparse samples across any number of dimensions. Variograms themselves may also be useful diagnostic tools for describing the spatial distribution of model data in exoplanet atmospheric model intercomparison projects. Universal kriging is another method that can synthesize data calculated by models of different complexity, which can be used to combine sparse samples of data from slow models with larger samples of data from fast models. Ordinary and universal kriging can also provide a way to synthesize model predictions with sparse samples of exoplanet observations and may have other applications in exoplanet science.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Accepted for publication in the Planetary Science Journal

Journal ref: PSJ (2024) 5: 140

arXiv:2405.08074 [pdf]

Optical Imaging of Flavor Order in Flat Band Graphene

Authors: Tian Xie, Tobias M. Wolf, Siyuan Xu, Zhiyuan Cui, Richen Xiong, Yunbo Ou, Patrick Hays, Ludwig F Holleis, Yi Guo, Owen I Sheekey, Caitlin Patterson, Trevor Arp, Kenji Watanabe, Takashi Taniguchi, Seth Ariel Tongay, Andrea F Young, Allan H. MacDonald, Chenhao Jin

Abstract: Spin and valley flavor polarization plays a central role in the many-body physics of flat band graphene, with Fermi surface reconstructions often accompanied by quantized anomalous Hall and superconducting state observed in a variety of experimental systems. Here we describe an optical technique that sensitively and selectively detects flavor textures via the exciton response of a proximal transition metal dichalcogenide layer. Through a systematic study of rhombohedral and rotationally faulted graphene bilayers and trilayers, we show that when the semiconducting dichalcogenide is in direct contact with the graphene, the exciton response is most sensitive to the large momentum rearrangement of the Fermi surface, providing information that is distinct from and complementary to electrical compressibility measurements. The wide-field imaging capability of optical probes allows us to obtain spatial maps of flavor orders with high throughput, and with broad temperature and device compatibility. Our work paves the way for optical probing and imaging of flavor orders in flat band graphene systems.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 29 pages, 4 figures, with supplementary materials

arXiv:2405.03570 [pdf]

doi 10.3847/1538-4357/ad3242

Impact of Planetary Parameters on Water Clouds Microphysics

Authors: Huanzhou Yang, Thaddeus D. Komacek, Owen B. Toon, Eric T. Wolf, Tyler D. Robinson, Caroline Chael, Dorian S. Abbot

Abstract: Potentially habitable exoplanets are targets of great interest for the James Webb Space Telescope and upcoming mission concepts such as the Habitable Worlds Observatory. Clouds strongly affect climate and habitability, but predicting their properties is difficult. In Global Climate Models (GCMs), especially those aiming at simulating Earth, cloud microphysics is often crudely approximated by assuming that all cloud particles have a single, constant size or a prescribed size distribution and that all clouds in a grid cell are identical. For exoplanets that range over a large phase space of planetary properties, this method could result in large errors. In this work, our goal is to determine how cloud microphysics on terrestrial exoplanets, whose condensable is mainly water vapor, depend on aerosol properties and planetary parameters such as surface pressure, surface gravity, and incident stellar radiation. We use the Community Aerosol and Radiation Model for Atmospheres as a 1D microphysical model to simulate the formation and evolution of clouds including the processes of nucleation, condensation, evaporation, coagulation, and vertical transfer. In these 1D idealized experiments, we find that the parameters that determine the macrophysical thermal structure of the atmospheres, including surface pressure and stellar flux, impact cloud radiative effect (CRE) most significantly. Parameters such as gravity and number density of aerosols working as cloud condensation nuclei affect the microphysical processes of cloud formation, including activation and vertical transfer. They also have a significant, though weaker effect on CRE. This work motivates the development of more accurate GCM cloud schemes and should aid in the interpretation of future observations.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 17 pages, 10 figures

Journal ref: Huanzhou Yang et al 2024 ApJ 966 152

arXiv:2405.03438 [pdf, other]

Anomalous Nernst effect in the noncollinear antiferromagnet Mn$_5$Si$_3$

Authors: Christoph Sürgers, Gerda Fischer, Warlley H. Campos, Anna Birk Hellenes, Libor Šmejkal, Jairo Sinova, Michael Merz, Thomas Wolf, Wolfgang Wernsdorfer

Abstract: Investigating the off-diagonal components of the conductivity and thermoelectric tensor of materials hosting complex antiferromagnetic structures has become a viable method to reveal the effects of topology and chirality on the electronic transport in these systems. In this respect, Mn$_5$Si$_3$ is an interesting metallic compound that exhibits several antiferromagnetic phases below 100 K with different collinear and noncollinear arrangements of Mn magnetic moments. Previous investigations have shown that the transitions between the various phases give rise to large changes of the anomalous Hall effect. Here, we report measurements of the anomalous Nernst effect of Mn$_5$Si$_3$ single crystals. Below 25 K we observe a sign change of the zero-field Nernst signal with a concomitant decrease of the Hall signal and a gradual reduction of the remanent magnetization which we attribute to a subtle rearrangement of the magnetic moment configuration at low temperatures.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 9 pages, 6 figures

arXiv:2404.06253 [pdf, other]

From Barlow Twins to Triplet Training: Differentiating Dementia with Limited Data

Authors: Yitong Li, Tom Nuno Wolf, Sebastian Pölsterl, Igor Yakushev, Dennis M. Hedderich, Christian Wachinger

Abstract: Differential diagnosis of dementia is challenging due to overlapping symptoms, with structural magnetic resonance imaging (MRI) being the primary method for diagnosis. Despite the clinical value of computer-aided differential diagnosis, research has been limited, mainly due to the absence of public datasets that contain diverse types of dementia. This leaves researchers with small in-house datasets that are insufficient for training deep neural networks (DNNs). Self-supervised learning shows promise for utilizing unlabeled MRI scans in training, but small batch sizes for volumetric brain scans make its application challenging. To address these issues, we propose Triplet Training for differential diagnosis with limited target data. It consists of three key stages: (i) self-supervised pre-training on unlabeled data with Barlow Twins, (ii) self-distillation on task-related data, and (iii) fine-tuning on the target dataset. Our approach significantly outperforms traditional training strategies, achieving a balanced accuracy of 75.6%. We further provide insights into the training process by visualizing changes in the latent space after each step. Finally, we validate the robustness of Triplet Training in terms of its individual components in a comprehensive ablation study. Our code is available at https://github.com/ai-med/TripletTraining.

Submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted for presentation at MIDL 2024

arXiv:2403.14878 [pdf, other]

Offline tagging of radon-induced backgrounds in XENON1T and applicability to other liquid xenon detectors

Authors: E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, G. Bruno, R. Budnik, T. K. Bui, J. M. R. Cardoso, A. P. Cimental Chavez, A. P. Colijn, J. Conrad , et al. (142 additional authors not shown)

Abstract: This paper details the first application of a software tagging algorithm to reduce radon-induced backgrounds in liquid noble element time projection chambers, such as XENON1T and XENONnT. The convection velocity field in XENON1T was mapped out using $^{222}\text{Rn}$ and $^{218}\text{Po}$ events, and the root-mean-square convection speed was measured to be $0.30 \pm 0.01$ cm/s. Given this velocity field, $^{214}\text{Pb}$ background events can be tagged when they are followed by $^{214}\text{Bi}$ and $^{214}\text{Po}$ decays, or preceded by $^{218}\text{Po}$ decays. This was achieved by evolving a point cloud in the direction of a measured convection velocity field, and searching for $^{214}\text{Bi}$ and $^{214}\text{Po}$ decays or $^{218}\text{Po}$ decays within a volume defined by the point cloud. In XENON1T, this tagging system achieved a $^{214}\text{Pb}$ background reduction of $6.2^{+0.4}_{-0.9}\%$ with an exposure loss of $1.8\pm 0.2 \%$, despite the timescales of convection being smaller than the relevant decay times. We show that the performance can be improved in XENONnT, and that the performance of such a software-tagging approach can be expected to be further improved in a diffusion-limited scenario. Finally, a similar method might be useful to tag the cosmogenic $^{137}\text{Xe}$ background, which is relevant to the search for neutrinoless double-beta decay.

Submitted 19 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 17 pages, 19 figures

arXiv:2403.01045 [pdf, other]

Unexpected hydrogen dissociation in thymine: predictions from a novel coupled cluster theory

Authors: Eirik F. Kjønstad, O. Jonathan Fajen, Alexander C. Paul, Sara Angelico, Dennis Mayer, Markus Gühr, Thomas J. A. Wolf, Todd J. Martínez, Henrik Koch

Abstract: The fate of thymine upon excitation by ultraviolet radiation has been the subject of intense debate over the past three decades. Today, it is widely believed that its ultrafast excited state decay stems from a radiationless transition from the bright $ππ^*$ state to a dark $nπ^*$ state. However, conflicting theoretical predictions have made the experimental data difficult to interpret. Here we simulate the ultrafast dynamics in thymine at the highest level of theory to date, performing wavepacket dynamics with a new coupled cluster method. Our simulation confirms an ultrafast $ππ^*$ to $nπ^*$ transition ($τ = 41 \pm 14$ fs). Furthermore, the predicted oxygen-edge X-ray absorption spectra agree quantitatively with the experimental results. Our simulation also predicts an as-yet uncharacterized photochemical pathway: a $πσ^*$ channel that leads to hydrogen dissociation at one of the two N-H bonds in thymine. Similar behavior has been identified in other heteroaromatic compounds, including adenine, and several authors have speculated that a similar pathway may exist in thymine. However, this was never confirmed theoretically or experimentally. This prediction calls for renewed efforts to experimentally identify or exclude the presence of this channel.

Submitted 7 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: 42 pages, 23 figures

arXiv:2402.19173 [pdf, other]

StarCoder 2 and The Stack v2: The Next Generation

Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo , et al. (41 additional authors not shown)

Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.17685 [pdf, other]

Attosecond X-ray Chronoscopy of Core-level Photoemission

Authors: Jia-Bao Ji, Zhaoheng Guo, Taran Driver, Cynthia S. Trevisan, David Cesar, Xinxin Cheng, Joseph Duris, Paris L. Franz, James Glownia, Xiaochun Gong, Daniel Hammerland, Meng Han, Saijoscha Heck, Matthias Hoffmann, Andrei Kamalov, Kirk A. Larsen, Xiang Li, Ming-Fu Lin, Yuchen Liu, C. William McCurdy, Razib Obaid, Jordan T. ONeal, Thomas N. Rescigno, River R. Robles, Nicholas Sudar , et al. (10 additional authors not shown)

Abstract: Attosecond photoemission or photoionization delays are a unique probe of the structure and the electronic dynamics of matter. However, spectral congestion and spatial delocalization of valence electron wave functions set fundamental limits to the complexity of systems that can be studied and the information that can be retrieved, respectively. Using attosecond X-ray pulses from LCLS, we demonstrate the key advantages of measuring core-level delays: the photoelectron spectra remain atom-like, the measurements become element specific and the observed scattering dynamics originate from a point-like source. We exploit these unique features to reveal the effects of electronegativity and symmetry on attosecond scattering dynamics by measuring the photoionization delays between N-1s and C-1s core shells of a series of aromatic azabenzene molecules. Remarkably, the delays systematically increase with the number of nitrogen atoms in the molecule and reveal multiple resonances. We identify two previously unknown mechanisms regulating the associated attosecond dynamics, namely the enhanced confinement of the trapped wavefunction with increasing electronegativity of the atoms and the decrease of the coupling strength among the photoemitted partial waves with increasing symmetry. This study demonstrates the unique opportunities opened by measurements of core-level photoionization delays for unravelling attosecond electron dynamics in complex matter.

Submitted 8 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.12764 [pdf, other]

Attosecond Delays in X-ray Molecular Ionization

Authors: Taran Driver, Miles Mountney, Jun Wang, Lisa Ortmann, Andre Al-Haddad, Nora Berrah, Christoph Bostedt, Elio G. Champenois, Louis F. DiMauro, Joseph Duris, Douglas Garratt, James M. Glownia, Zhaoheng Guo, Daniel Haxton, Erik Isele, Igor Ivanov, Jiabao Ji, Andrei Kamalov, Siqi Li, Ming-Fu Lin, Jon P. Marangos, Razib Obaid, Jordan T. O'Neal, Philipp Rosenberger, Niranjan H. Shivaram , et al. (12 additional authors not shown)

Abstract: The photoelectric effect is not truly instantaneous, but exhibits attosecond delays that can reveal complex molecular dynamics. Sub-femtosecond duration light pulses provide the requisite tools to resolve the dynamics of photoionization. Accordingly, the past decade has produced a large volume of work on photoionization delays following single photon absorption of an extreme ultraviolet (XUV) photon. However, the measurement of time-resolved core-level photoionization remained out of reach. The required x-ray photon energies needed for core-level photoionization were not available with attosecond tabletop sources. We have now measured the x-ray photoemission delay of core-level electrons, and here report unexpectedly large delays, ranging up to 700 attoseconds in NO near the oxygen K-shell threshold. These measurements exploit attosecond soft x-ray pulses from a free-electron laser (XFEL) to scan across the entire region near the K-shell threshold. Furthermore, we find the delay spectrum is richly modulated, suggesting several contributions including transient trapping of the photoelectron due to shape resonances, collisions with the Auger-Meitner electron that is emitted in the rapid non-radiative relaxation of the molecule, and multi-electron scattering effects. The results demonstrate how x-ray attosecond experiments, supported by comprehensive theoretical modelling, can unravel the complex correlated dynamics of core-level photoionization.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.10446 [pdf, other]

The XENONnT Dark Matter Experiment

Authors: XENON Collaboration, E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, M. Balata, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, et al. (170 additional authors not shown)

Abstract: The multi-staged XENON program at INFN Laboratori Nazionali del Gran Sasso aims to detect dark matter with two-phase liquid xenon time projection chambers of increasing size and sensitivity. The XENONnT experiment is the latest detector in the program, planned to be an upgrade of its predecessor XENON1T. It features an active target of 5.9 tonnes of cryogenic liquid xenon (8.5 tonnes total mass in cryostat). The experiment is expected to extend the sensitivity to WIMP dark matter by more than an order of magnitude compared to XENON1T, thanks to the larger active mass and the significantly reduced background, improved by novel systems such as a radon removal plant and a neutron veto. This article describes the XENONnT experiment and its sub-systems in detail and reports on the detector performance during the first science run.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 32 pages, 19 figures

arXiv:2401.15250 [pdf, other]

Experimental Demonstration of Attosecond Pump-Probe Spectroscopy with an X-ray Free-Electron Laser

Authors: Zhaoheng Guo, Taran Driver, Sandra Beauvarlet, David Cesar, Joseph Duris, Paris L. Franz, Oliver Alexander, Dorian Bohler, Christoph Bostedt, Vitali Averbukh, Xinxin Cheng, Louis F. DiMauro, Gilles Doumy, Ruaridh Forbes, Oliver Gessner, James M. Glownia, Erik Isele, Andrei Kamalov, Kirk A. Larsen, Siqi Li, Xiang Li, Ming-Fu Lin, Gregory A. McCracken, Razib Obaid, Jordan T. ONeal, et al. (25 additional authors not shown)

Abstract: Pump-probe experiments with sub-femtosecond resolution are the key to understanding electronic dynamics in quantum systems. Here we demonstrate the generation and control of sub-femtosecond pulse pairs from a two-colour X-ray free-electron laser (XFEL). By measuring the delay between the two pulses with an angular streaking diagnostic, we characterise the group velocity of the XFEL and demonstrate control of the pulse delay down to 270 as. We demonstrate the application of this technique to a pump-probe measurement in core-excited para-aminophenol. These results demonstrate the ability to perform pump-probe experiments with sub-femtosecond resolution and atomic site specificity.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 55 pages, main manuscript (5 figures) + supplementary materials (25 figures), 30 figures total. Submitted to Nature Photonics

arXiv:2401.04321 [pdf, other]

doi 10.1103/PhysRevB.109.195406

Gate-tunable topological phases in superlattice modulated bilayer graphene

Authors: Yongxin Zeng, Tobias M. R. Wolf, Chunli Huang, Nemin Wei, Sayed Ali Akbar Ghorashi, Allan H. MacDonald, Jennifer Cano

Abstract: Superlattice potential modulation can produce flat minibands in Bernal-stacked bilayer graphene. In this work we study how band topology and interaction-induced symmetry-broken phases in this system are controlled by tuning the displacement field and the shape and strength of the superlattice potential. We use an analytic perturbative analysis to demonstrate that topological flat bands are favored by a honeycomb-lattice-shaped potential, and numerics to show that the robustness of topological bands depends on both the displacement field strength and the periodicity of the superlattice potential. At integer fillings of the topological flat bands, the strength of the displacement field and the superlattice potential tune phase transitions between quantum anomalous Hall insulator, trivial insulator, and metallic states. We present mean-field phase diagrams in a gate voltage parameter space at filling factor $ν=1$, and discuss the prospects of realizing quantum anomalous Hall insulators and fractional Chern insulators when the superlattice potential modulation is produced by dielectric patterning or adjacent moiré materials.

Submitted 8 January, 2024; originally announced January 2024.

Journal ref: Phys. Rev. B 109, 195406 (2024)

arXiv:2312.09783 [pdf, other]

Keep the Faith: Faithful Explanations in Convolutional Neural Networks for Case-Based Reasoning

Authors: Tom Nuno Wolf, Fabian Bongratz, Anne-Marie Rickmann, Sebastian Pölsterl, Christian Wachinger

Abstract: Explaining predictions of black-box neural networks is crucial when applied to decision-critical tasks. Thus, attribution maps are commonly used to identify important image regions, despite prior work showing that humans prefer explanations based on similar examples. To this end, ProtoPNet learns a set of class-representative feature vectors (prototypes) for case-based reasoning. During inference, similarities of latent features to prototypes are linearly classified to form predictions and attribution maps are provided to explain the similarity. In this work, we evaluate whether architectures for case-based reasoning fulfill established axioms required for faithful explanations using the example of ProtoPNet. We show that such architectures allow the extraction of faithful explanations. However, we prove that the attribution maps used to explain the similarities violate the axioms. We propose a new procedure to extract explanations for trained ProtoPNets, named ProtoPFaith. Conceptually, these explanations are Shapley values, calculated on the similarity scores of each prototype. They allow to faithfully answer which prototypes are present in an unseen image and quantify each pixel's contribution to that presence, thereby complying with all axioms. The theoretical violations of ProtoPNet manifest in our experiments on three datasets (CUB-200-2011, Stanford Dogs, RSNA) and five architectures (ConvNet, ResNet, ResNet50, WideResNet50, ResNeXt50). Our experiments show a qualitative difference between the explanations given by ProtoPNet and ProtoPFaith. Additionally, we quantify the explanations with the Area Over the Perturbation Curve, on which ProtoPFaith outperforms ProtoPNet on all experiments by a factor $>10^3$.

Submitted 19 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: To be published in proceedings of AAAI Conference on Artificial Intelligence

arXiv:2312.06527 [pdf, other]

Can Reinforcement Learning support policy makers? A preliminary study with Integrated Assessment Models

Authors: Theodore Wolf, Nantas Nardelli, John Shawe-Taylor, Maria Perez-Ortiz

Abstract: Governments around the world aspire to ground decision-making on evidence. Many of the foundations of policy making - e.g. sensing patterns that relate to societal needs, developing evidence-based programs, forecasting potential outcomes of policy changes, and monitoring effectiveness of policy programs - have the potential to benefit from the use of large-scale datasets or simulations together with intelligent algorithms. These could, if designed and deployed in a way that is well grounded on scientific evidence, enable a more comprehensive, faster, and rigorous approach to policy making. Integrated Assessment Models (IAM) is a broad umbrella covering scientific models that attempt to link main features of society and economy with the biosphere into one modelling framework. At present, these systems are probed by policy makers and advisory groups in a hypothesis-driven manner. In this paper, we empirically demonstrate that modern Reinforcement Learning can be used to probe IAMs and explore the space of solutions in a more principled manner. While the implications of our results are modest since the environment is simplistic, we believe that this is a stepping stone towards more ambitious use cases, which could allow for effective exploration of policies and understanding of their consequences and limitations.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Published at NeurIPS'23 Workshop on Tackling Climate Change with Machine Learning

arXiv:2312.06511 [pdf, other]

Paradigm for finding d-electron heavy fermions: the case of Cr-doped CsFe$_2$As$_2$

Authors: Matteo Crispino, Pablo Villar Arribi, Anmol Shukla, Frédéric Hardy, Amir-Abbas Haghighirad, Thomas Wolf, Rolf Heid, Christoph Meingast, Tommaso Gorni, Adolfo Avella, Luca de' Medici

Abstract: We define a general strategy for finding new heavy-fermionic materials without rare-earth elements: doping a Hund metal with pronounced orbital-selective correlations towards half-filling. We argue that in general band structures a possible orbital-selective Mott transition is frustrated by inter-orbital hopping into heavy-fermion behaviour - where d-orbitals provide both the heavy and the light electrons - which is enhanced when approaching half-filling. This phase ultimately disappears due to magnetic correlations, as in a standard Doniach diagram. Experimentally we have further hole doped CsFe$_2$As$_2$, a Hund metal with 0.5 electrons/Fe away from half-filling, and obtained a heavy fermionic state with the highest Sommerfeld coefficient for Fe-pnictides to date (270 mJ/mol K$^2$), before signatures of an antiferromagnetic phase set in.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 16 pages, 8 figures

arXiv:2312.03671 [pdf, other]

Direct Exoplanet Detection Using Deep Convolutional Image Reconstruction (ConStruct): A New Algorithm for Post-Processing High-Contrast Images

Authors: Trevor N. Wolf, Brandon A. Jones, Brendan P. Bowler

Abstract: We present a novel machine-learning approach for detecting faint point sources in high-contrast adaptive optics imaging datasets. The most widely used algorithms for primary subtraction aim to decouple bright stellar speckle noise from planetary signatures by subtracting an approximation of the temporally evolving stellar noise from each frame in an imaging sequence. Our approach aims to improve the stellar noise approximation and increase the planet detection sensitivity by leveraging deep learning in a novel direct imaging post-processing algorithm. We show that a convolutional autoencoder neural network, trained on an extensive reference library of real imaging sequences, accurately reconstructs the stellar speckle noise at the location of a potential planet signal. This tool is used in a post-processing algorithm we call Direct Exoplanet Detection with Convolutional Image Reconstruction, or ConStruct. The reliability and sensitivity of ConStruct are assessed using real Keck/NIRC2 angular differential imaging datasets. Of the 30 unique point sources we examine, ConStruct yields a higher S/N than traditional PCA-based processing for 67$\%$ of the cases and improves the relative contrast by up to a factor of 2.6. This work demonstrates the value and potential of deep learning to take advantage of a diverse reference library of point spread function realizations to improve direct imaging post-processing. ConStruct and its future improvements may be particularly useful as tools for post-processing high-contrast images from the James Webb Space Telescope and extreme adaptive optics instruments, both for the current generation and those being designed for the upcoming 30 meter-class telescopes.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2311.12983 [pdf, other]

GAIA: a benchmark for General AI Assistants

Authors: Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom

Abstract: We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92\% vs. 15\% for GPT-4 equipped with plugins. This notable performance disparity contrasts with the recent trend of LLMs outperforming humans on tasks requiring professional skills in e.g. law or chemistry. GAIA's philosophy departs from the current trend in AI benchmarks suggesting to target tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system's capability to exhibit similar robustness as the average human does on such questions. Using GAIA's methodology, we devise 466 questions and their answers. We release our questions while retaining answers to 300 of them to power a leader-board available at https://huggingface.co/gaia-benchmark.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.12482 [pdf]

Monitoring the evolution of relative product populations at early times during a photochemical reaction

Authors: Joao Pedro Figueira Nunes, Lea Maria Ibele, Shashank Pathak, Andrew R. Attar, Surjendu Bhattacharyya, Rebecca Boll, Kurtis Borne, Martin Centurion, Benjamin Erk, Ming-Fu Lin, Ruaridh J. G. Forbes, Nate Goff, Christopher S. Hansen, Matthias Hoffmann, David M. P. Holland, Rebecca A. Ingle, Duan Luo, Sri Bhavya Muvva, Alex Reid, Arnaud Rouzée, Artem Rudenko, Sajib Kumar Saha, Xiaozhe Shen, Anbu Selvam Venkatachalam, Xijie Wang, et al. (9 additional authors not shown)

Abstract: Identifying multiple rival reaction products and transient species formed during ultrafast photochemical reactions and determining their time-evolving relative populations are key steps towards understanding and predicting photochemical outcomes. Yet, most contemporary ultrafast studies struggle with clearly identifying and quantifying competing molecular structures/species amongst the emerging reaction products. Here, we show that mega-electronvolt ultrafast electron diffraction in combination with ab initio molecular dynamics calculations offers a powerful route to determining time-resolved populations of the various isomeric products formed after UV (266 nm) excitation of the five-membered heterocyclic molecule 2(5H)-thiophenone. This strategy provides experimental validation of the predicted high (~50%) yield of an episulfide isomer containing a strained 3-membered ring within ~1 ps of photoexcitation and highlights the rapidity of interconversion between the rival highly vibrationally excited photoproducts in their ground electronic state.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.11449 [pdf, other]

Quasi-boson approximation yields accurate correlation energy in the 2D electron gas

Authors: Tobias M. R. Wolf, Chunli Huang

Abstract: We report the successful adaptation of the quasi-boson approximation, a technique traditionally employed in nuclear physics, to the analysis of the two-dimensional electron gas. We show that the correlation energy estimated from this approximation agrees closely with the results obtained from quantum Monte Carlo simulations. Our methodology comprehensively incorporates the exchange self-energy, direct scattering, and exchange scattering for a particle-hole pair excited out of the mean-field groundstate within the equation-of-motion framework. The linearization of the equation of motion leads to a generalized-random-phase-approximation (gRPA) eigenvalue equation whose spectrum indicates that the plasmon dispersion remains unaffected by exchange effects, while the particle-hole continuum experiences a marked upward shift due to the exchange self-energy. Notably, the plasmon mode retains its collective nature within the particle-hole continuum, up to moderately short wavelength ($q\sim 0.3 k_F$ at metallic density $r_s=4$). Using the gRPA excitation spectrum, we calculate the zero-point energy of the quasi-boson Hamiltonian, thereby approximating the correlation energy of the original Hamiltonian. This research highlights the potential and effectiveness of applying the quasi-boson approximation to the gRPA spectrum, a fundamental technique in nuclear physics, to extended condensed matter systems.

Submitted 19 November, 2023; originally announced November 2023.

Comments: 7 pages, 4 figures

arXiv:2311.05640 [pdf, other]

FinGPT: Large Generative Models for a Small Language

Authors: Risto Luukkonen, Ville Komulainen, Jouni Luoma, Anni Eskelinen, Jenna Kanerva, Hanna-Mari Kupari, Filip Ginter, Veronika Laippala, Niklas Muennighoff, Aleksandra Piktus, Thomas Wang, Nouamane Tazi, Teven Le Scao, Thomas Wolf, Osma Suominen, Samuli Sairanen, Mikko Merioksa, Jyrki Heinonen, Aija Vahtola, Samuel Antao, Sampo Pyysalo

Abstract: Large language models (LLMs) excel in many tasks in NLP and beyond, but most open models have very limited coverage of smaller languages and LLM work tends to focus on languages where nearly unlimited data is available for pretraining. In this work, we study the challenges of creating LLMs for Finnish, a language spoken by less than 0.1% of the world population. We compile an extensive dataset of Finnish combining web crawls, news, social media and eBooks. We pursue two approaches to pretrain models: 1) we train seven monolingual models from scratch (186M to 13B parameters) dubbed FinGPT, 2) we continue the pretraining of the multilingual BLOOM model on a mix of its original training data and Finnish, resulting in a 176 billion parameter model we call BLUUMI. For model evaluation, we introduce FIN-bench, a version of BIG-bench with Finnish tasks. We also assess other model qualities such as toxicity and bias. Our models and tools are openly available at https://turkunlp.org/gpt3-finnish.

Submitted 3 November, 2023; originally announced November 2023.

Comments: 17 pages (10 main), 7 figures, 5 tables

arXiv:2310.16944 [pdf, other]

Zephyr: Direct Distillation of LM Alignment

Authors: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf

Abstract: We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training without any additional sampling during fine-tuning. The final result, Zephyr-7B, sets the state-of-the-art on chat benchmarks for 7B parameter models, and requires no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B, the best open-access RLHF-based model. Code, models, data, and tutorials for the system are available at https://github.com/huggingface/alignment-handbook.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2309.11996 [pdf, other]

doi 10.1140/epjc/s10052-023-12296-y

Design and performance of the field cage for the XENONnT experiment

Authors: E. Aprile, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, C. Cai, J. M. R. Cardoso, D. Cichon, et al. (139 additional authors not shown)

Abstract: The precision in reconstructing events detected in a dual-phase time projection chamber depends on a homogeneous and well understood electric field within the liquid target. In the XENONnT TPC the field homogeneity is achieved through a double-array field cage, consisting of two nested arrays of field shaping rings connected by an easily accessible resistor chain. Rather than being connected to the gate electrode, the topmost field shaping ring is independently biased, adding a degree of freedom to tune the electric field during operation. Two-dimensional finite element simulations were used to optimize the field cage, as well as its operation. Simulation results were compared to ${}^{83m}\mathrm{Kr}$ calibration data. This comparison indicates an accumulation of charge on the panels of the TPC which is constant over time, as no evolution of the reconstructed position distribution of events is observed. The simulated electric field was then used to correct the charge signal for the field dependence of the charge yield. This correction resolves the inconsistent measurement of the drift electron lifetime when using different calibration sources and different field cage tuning voltages.

Submitted 21 September, 2023; originally announced September 2023.

Journal ref: Eur. Phys. J. C 84, 138 (2024)

arXiv:2308.03996 [pdf, other]

Investigating dissociation pathways of nitrobenzene via mega-electron-volt ultrafast electron diffraction

Authors: Kareem Hegazy, James Cryan, Renkai Li, Ming-Fu Lin, Brian Moore, Pedro Nunes, Xiaozhe Shen, Stephen Weathersby, Jie Yang, Xijie Wang, Thomas Wolf

Abstract: As the simplest nitroaromatic compound, nitrobenzene is an interesting model system to explore the rich photochemistry of nitroaromatic compounds. Previous measurements of nitrobenzene's photochemical dynamics have probed structural and electronic properties, which, at times, paint a convoluted and sometimes contradictory description of the photochemical landscape. A sub-picosecond structural probe can complement previous electronic measurements and aid in determining the photochemical dynamics with less ambiguity. We investigate the ultrafast dynamics of nitrobenzene triggered by photoexcitation at 267 nm employing megaelectronvolt ultrafast electron diffraction with femtosecond time resolution. We measure the first 5 ps of dynamics and, by comparing our measured results to simulation, we unambiguously distinguish the lowest singlet and triplet electronic states. We observe ground state recovery within 160 +/- 60 fs through internal conversions and without signal corresponding to photofragmentation. Our lack of dissociation signal within the first 5 ps indicates that previously observed photofragmentation reactions take place in the vibrationally "hot" ground state on timescales considerably beyond 5 ps.

Submitted 7 August, 2023; originally announced August 2023.

Comments: 5 pages, 3 figures, and 1 table

arXiv:2306.16340 [pdf, other]

Cosmogenic background simulations for the DARWIN observatory at different underground locations

Authors: M. Adrover, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, B. Antunovic, E. Aprile, M. Babicz, D. Bajpai, E. Barberio, L. Baudis, M. Bazyk, N. Bell, L. Bellagamba, R. Biondi, Y. Biondi, A. Bismark, C. Boehm, A. Breskin, E. J. Brookes, A. Brown, G. Bruno, R. Budnik, C. Capelli, J. M. R. Cardoso, et al. (158 additional authors not shown)

Abstract: Xenon dual-phase time projection chambers (TPCs) have proven to be a successful technology in studying physical phenomena that require low-background conditions. With 40t of liquid xenon (LXe) in the TPC baseline design, DARWIN will have a high sensitivity for the detection of particle dark matter, neutrinoless double beta decay ($0νββ$), and axion-like particles (ALPs). Although cosmic muons are a source of background that cannot be entirely eliminated, they may be greatly diminished by placing the detector deep underground. In this study, we used Monte Carlo simulations to model the cosmogenic background expected for the DARWIN observatory at four underground laboratories: Laboratori Nazionali del Gran Sasso (LNGS), Sanford Underground Research Facility (SURF), Laboratoire Souterrain de Modane (LSM) and SNOLAB. We determine the production rates of unstable xenon isotopes and tritium due to muon-induced neutron fluxes and muon-induced spallation. These are expected to represent the dominant contributions to cosmogenic backgrounds and thus the most relevant for site selection.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.11871 [pdf, other]

Search for events in XENON1T associated with Gravitational Waves

Authors: XENON Collaboration, E. Aprile, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antoń Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, C. Cai, J. M. R. Cardoso, et al. (138 additional authors not shown)

Abstract: We perform a blind search for particle signals in the XENON1T dark matter detector that occur close in time to gravitational wave signals in the LIGO and Virgo observatories. No particle signal is observed in the nuclear recoil, electronic recoil, CE$ν$NS, and S2-only channels within $\pm$ 500 seconds of observations of the gravitational wave signals GW170104, GW170729, GW170817, GW170818, and GW170823. We use this null result to constrain mono-energetic neutrinos and Beyond Standard Model particles emitted in the closest coalescence GW170817, a binary neutron star merger. We set new upper limits on the fluence (time-integrated flux) of coincident neutrinos down to 17 keV at 90% confidence level. Furthermore, we constrain the product of coincident fluence and cross section of Beyond Standard Model particles to be less than $10^{-29}$ cm$^2$/cm$^2$ in the [5.5-210] keV energy range at 90% confidence level.

Submitted 27 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.03890 [pdf, other]

doi 10.1088/1361-648X/ad1ca8

Bosonic excitation spectra of superconducting $\mathrm{Bi_2Sr_2CaCu_2O_{8+δ}}$ and $\mathrm{YBa_2Cu_3O_{6+x}}$ extracted from scanning tunneling spectra

Authors: Thomas Gozlinski, Mirjam Henn, Thomas Wolf, Matthieu Le Tacon, Jörg Schmalian, Wulf Wulfhekel

Abstract: A detailed interpretation of scanning tunneling spectra obtained on unconventional superconductors enables one to gain information on the pairing boson. Decisive for this approach are inelastic tunneling events. Due to the lack of momentum conservation in tunneling from or to the sharp tip, those are enhanced in the geometry of a scanning tunneling microscope compared to planar tunnel junctions. This work extends the method of obtaining the bosonic excitation spectrum by deconvolution from tunneling spectra to nodal $d$-wave superconductors. In particular, scanning tunneling spectra of slightly underdoped $\mathrm{Bi_2Sr_2CaCu_2O_{8+δ}}$ with a $T_c$ of $82\,\mathrm{K}$ and optimally doped $\mathrm{YBa_2Cu_3O_{6+x}}$ with a $T_c$ of $92\,\mathrm{K}$ reveal a resonance mode in their bosonic excitation spectrum at $Ω_\mathrm{res} \approx 63\,\mathrm{meV}$ and $Ω_\mathrm{res} \approx 61\,\mathrm{meV}$ respectively. In both cases, the overall shape of the bosonic excitation spectrum is indicative of predominant spin scattering with a resonant mode at $Ω_\mathrm{res}<2Δ$ and overdamped spin fluctuations for energies larger than $2Δ$. To perform the deconvolution of the experimental data, we implemented an efficient iterative algorithm that significantly enhances the reliability of our analysis.

Submitted 6 June, 2023; originally announced June 2023.

Journal ref: Journal of Physics: Condensed Matter 36, 175601 (2024)

arXiv:2305.16264 [pdf, other]

Scaling Data-Constrained Language Models

Authors: Niklas Muennighoff, Alexander M. Rush, Boaz Barak, Teven Le Scao, Aleksandra Piktus, Nouamane Tazi, Sampo Pyysalo, Thomas Wolf, Colin Raffel

Abstract: The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters. Models and datasets from our 400 training runs are freely available at https://github.com/huggingface/datablations.

Submitted 25 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 50 pages (9 main), 39 figures, 15 tables

arXiv:2305.06161 [pdf, other]

StarCoder: may the source be with you!

Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40\% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

arXiv:2305.05169 [pdf]

A compact single-shot soft X-ray photon spectrometer for free electron laser diagnostics

Authors: Kirk A. Larsen, Kurtis Borne, Razib Obaid, Andrei Kamalov, Yusong Liu, Xinxin Cheng, Justin James, Taran Driver, Kenan Li, Yanwei Liu, Anne Sakdinawat, Christian David, Thomas J. A. Wolf, James Cryan, Peter Walter, Ming-Fu Lin

Abstract: The photon spectrum from free-electron laser (FEL) light sources offers valuable information in time-resolved experiments and machine optimization in the spectral and temporal domains. We have developed a compact single-shot photon spectrometer to diagnose soft X-ray spectra. The spectrometer consists of an array of off-axis Fresnel zone plates (FZP) that act as transmission-imaging gratings, a Ce-YAG scintillator, and a microscope objective to image the scintillation target onto a two-dimensional imaging detector. This spectrometer operates in an energy range which covers absorption edges associated with several atomic constituents: carbon, nitrogen, oxygen, and neon. The spectrometer's performance is demonstrated at a repetition rate of 120 Hz, but our detection scheme can be easily extended to 200 kHz spectral collection by employing a fast complementary metal oxide semiconductor (CMOS) line-scan camera to detect the light from the scintillator. This compact photon spectrometer provides an opportunity for monitoring the spectrum downstream of an endstation in a limited space environment with sub-electronvolt energy resolution.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 14 pages, 5 figures, 1 table

arXiv:2304.10931 [pdf, other]

doi 10.1103/PhysRevLett.130.261002

Searching for Heavy Dark Matter near the Planck Mass with XENON1T

Authors: E. Aprile, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, C. Cai, J. M. R. Cardoso, D. Cichon, et al. (142 additional authors not shown)

Abstract: Multiple viable theoretical models predict heavy dark matter particles with a mass close to the Planck mass, a range relatively unexplored by current experimental measurements. We use 219.4 days of data collected with the XENON1T experiment to conduct a blind search for signals from Multiply-Interacting Massive Particles (MIMPs). Their unique track signature allows a targeted analysis with only 0.05 expected background events from muons. Following unblinding, we observe no signal candidate events. This work places strong constraints on spin-independent interactions of dark matter particles with a mass between 1$\times$10$^{12}\,$GeV/c$^2$ and 2$\times$10$^{17}\,$GeV/c$^2$. In addition, we present the first exclusion limits on spin-dependent MIMP-neutron and MIMP-proton cross-sections for dark matter particles with masses close to the Planck scale.

Submitted 21 April, 2023; originally announced April 2023.

Comments: 7 pages, 6 figures

Journal ref: Phys. Rev. Lett. 130, 261002 (2023)

arXiv:2304.08332 [pdf, other]

Multiscale hierarchical decomposition methods for ill-posed problems

Authors: Stefan Kindermann, Elena Resmerita, Tobias Wolf

Abstract: The Multiscale Hierarchical Decomposition Method (MHDM) was introduced as an iterative method for total variation regularization, with the aim of recovering details at various scales from images corrupted by additive or multiplicative noise. Given its success beyond image restoration, we extend the MHDM iterates in order to solve larger classes of linear ill-posed problems in Banach spaces. Thus, we define the MHDM for more general convex or even non-convex penalties, and provide convergence results for the data fidelity term. We also propose a flexible version of the method using adaptive convex functionals for regularization, and show an interesting multiscale decomposition of the data. This decomposition result is highlighted for the Bregman iteration method that can be expressed as an adaptive MHDM. Furthermore, we state necessary and sufficient conditions when the MHDM iteration agrees with the variational Tikhonov regularization, which is the case, for instance, for one-dimensional total variation denoising. Finally, we investigate several particular instances and perform numerical experiments that point out the robust behavior of the MHDM.

Submitted 27 September, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

arXiv:2304.05428 [pdf, other]

doi 10.1103/PhysRevD.108.012016

Detector signal characterization with a Bayesian network in XENONnT

Authors: XENON Collaboration, E. Aprile, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, C. Cai, J. M. R. Cardoso, et al. (142 additional authors not shown)

Abstract: We developed a detector signal characterization model based on a Bayesian network trained on the waveform attributes generated by a dual-phase xenon time projection chamber. By performing inference on the model, we produced a quantitative metric of signal characterization and demonstrate that this metric can be used to determine whether a detector signal is sourced from a scintillation or an ionization process. We describe the method and its performance on electronic-recoil (ER) data taken during the first science run of the XENONnT dark matter experiment. We demonstrate the first use of a Bayesian network in a waveform-based analysis of detector signals. This method resulted in a 3% increase in ER event-selection efficiency with a simultaneously effective rejection of events outside of the region of interest. The findings of this analysis are consistent with the previous analysis from XENONnT, namely a background-only fit of the ER data.

Submitted 26 July, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: 11 pages, 8 figures

Journal ref: Phys. Rev. D 108, 012016 (2023)

arXiv:2303.17350 [pdf, other]

Partial condensation of mobile excitons in graphene multilayers

Authors: Igor V. Blinov, Chunli Huang, Nemin Wei, Qin Wei, Tobias Wolf, Allan H. MacDonald

Abstract: At a large displacement field, in rhombohedral and Bernal-stacked graphene a normal paramagnetic state transitions to a correlated state. Recent experiments showed that such systems have several phase transitions as a function of the carrier density. The phase adjacent to a paramagnetic state has anomalously high resistance and reduced degeneracy of the Fermi sea. We show that both phenomena can be explained through a concept of partial intervalley exciton condensation: a fraction of particles condenses into excitons, and another forms an intervalley coherent Fermi liquid. The exciton part of the system does not contribute to the electrical current, thus increasing the resistance. Within this paradigm, the increase in the resistance has an entirely geometrical origin. We check the validity of the phenomenological theory through numerical calculations. We also show that the quantum oscillation data should not be very different between the partial excitonic state and the intervalley coherent states suggested by other authors. Further, we suggest STM/AFM or Raman spectroscopy to obtain conclusive evidence for the occurrence of the partial exciton condensation that we suggest in this paper.

Submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.14729 [pdf, other]

doi 10.1103/PhysRevLett.131.041003

First Dark Matter Search with Nuclear Recoils from the XENONnT Experiment

Authors: XENON Collaboration, E. Aprile, K. Abe, F. Agostini, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, C. Cai, et al. (141 additional authors not shown)

Abstract: We report on the first search for nuclear recoils from dark matter in the form of weakly interacting massive particles (WIMPs) with the XENONnT experiment which is based on a two-phase time projection chamber with a sensitive liquid xenon mass of $5.9$ t. During the approximately 1.1 tonne-year exposure used for this search, the intrinsic $^{85}$Kr and $^{222}$Rn concentrations in the liquid target were reduced to unprecedentedly low levels, giving an electronic recoil background rate of $(15.8\pm1.3)~\mathrm{events}/(\mathrm{t\cdot y \cdot keV})$ in the region of interest. A blind analysis of nuclear recoil events with energies between $3.3$ keV and $60.5$ keV finds no significant excess. This leads to a minimum upper limit on the spin-independent WIMP-nucleon cross section of $2.58\times 10^{-47}~\mathrm{cm}^2$ for a WIMP mass of $28~\mathrm{GeV}/c^2$ at $90\%$ confidence level. Limits for spin-dependent interactions are also provided. Both the limit and the sensitivity for the full range of WIMP masses analyzed here improve on previous results obtained with the XENON1T experiment for the same exposure.

Submitted 5 August, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: Limit points are included in the submission file

Journal ref: Phys. Rev. Lett. 131, 041003 (2023)

arXiv:2303.07717 [pdf, other]

HALOS: Hallucination-free Organ Segmentation after Organ Resection Surgery

Authors: Anne-Marie Rickmann, Murong Xu, Tom Nuno Wolf, Oksana Kovalenko, Christian Wachinger

Abstract: The wide range of research in deep learning-based medical image segmentation pushed the boundaries in a multitude of applications. A clinically relevant problem that received less attention is the handling of scans with irregular anatomy, e.g., after organ resection. State-of-the-art segmentation models often lead to organ hallucinations, i.e., false-positive predictions of organs, which cannot be alleviated by oversampling or post-processing. Motivated by the increasing need to develop robust deep learning models, we propose HALOS for abdominal organ segmentation in MR images that handles cases after organ resection surgery. To this end, we combine missing organ classification and multi-organ segmentation tasks into a multi-task model, yielding a classification-assisted segmentation pipeline. The segmentation network learns to incorporate knowledge about organ existence via feature fusion modules. Extensive experiments on a small labeled test set and large-scale UK Biobank data demonstrate the effectiveness of our approach in terms of higher segmentation Dice scores and near-to-zero false positive prediction rate.

Submitted 14 March, 2023; originally announced March 2023.

Comments: To be published in proceedings of Information Processing In Medical Imaging (IPMI) 2023

arXiv:2303.07125 [pdf, other]

Don't PANIC: Prototypical Additive Neural Network for Interpretable Classification of Alzheimer's Disease

Authors: Tom Nuno Wolf, Sebastian Pölsterl, Christian Wachinger

Abstract: Alzheimer's disease (AD) has a complex and multifactorial etiology, which requires integrating information about neuroanatomy, genetics, and cerebrospinal fluid biomarkers for accurate diagnosis. Hence, recent deep learning approaches combined image and tabular information to improve diagnostic performance. However, the black-box nature of such neural networks is still a barrier for clinical applications, in which understanding the decision of a heterogeneous model is integral. We propose PANIC, a prototypical additive neural network for interpretable AD classification that integrates 3D image and tabular data. It is interpretable by design and, thus, avoids the need for post-hoc explanations that try to approximate the decision of a network. Our results demonstrate that PANIC achieves state-of-the-art performance in AD classification, while directly providing local and global explanations. Finally, we show that PANIC extracts biologically meaningful signatures of AD, and satisfies a set of desirable desiderata for trustworthy machine learning. Our implementation is available at https://github.com/ai-med/PANIC .

Submitted 14 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: To be published in proceedings of Information Processing In Medical Imaging 2023

arXiv:2303.03586 [pdf, other]

Femtosecond electronic and hydrogen structural dynamics in ammonia imaged with ultrafast electron diffraction

Authors: Elio G. Champenois, Nanna H. List, Matthew Ware, Mathew Britton, Philip H. Bucksbaum, Xinxin Cheng, Martin Centurion, James P. Cryan, Ruaridh Forbes, Ian Gabalski, Kareem Hegazy, Matthias C. Hoffmann, Andrew J. Howard, Fuhao Ji, Ming-Fu Lin, J. Pedro Nunes, Xiaozhe Shen, Jie Yang, Xijie Wang, Todd J. Martinez, Thomas J. A. Wolf

Abstract: Directly imaging structural dynamics involving hydrogen atoms by ultrafast diffraction methods is complicated by their low scattering cross-sections. Here we demonstrate that megaelectronvolt ultrafast electron diffraction is sufficiently sensitive to follow hydrogen dynamics in isolated molecules. In a study of the photodissociation of gas phase ammonia, we simultaneously observe signatures of the nuclear and corresponding electronic structure changes resulting from the dissociation dynamics in the time-dependent diffraction. Both assignments are confirmed by ab initio simulations of the photochemical dynamics and the resulting diffraction observable. While the temporal resolution of the experiment is insufficient to resolve the dissociation in time, our results represent an important step towards the observation of proton dynamics in real space and time.

Submitted 6 March, 2023; originally announced March 2023.

arXiv:2302.12518 [pdf, other]

doi 10.1029/2022JD037544

3D climate simulations of the Archean find that methane has a strong cooling effect at high concentrations

Authors: Jake K. Eager-Nash, Nathan J. Mayne, Arwen E. Nicholson, Janke E. Prins, Oakley C. F. Young, Stuart J. Daines, Denis E. Sergeev, F. Hugo Lambert, James Manners, Ian A. Boutle, Eric T. Wolf, Inga E. E. Kamp, Krisztian Kohary, Tim M. Lenton

Abstract: Methane is thought to have been an important greenhouse gas during the Archean, although its potential warming has been found to be limited at high concentrations due to its high shortwave absorption. We use the Met Office Unified Model, a general circulation model, to further explore the climatic effect of different Archean methane concentrations. Surface warming peaks at a pressure ratio CH$_4$:CO$_2$ of approximately 0.1, reaching a maximum of up to 7 K before significant cooling above this ratio. Equator-to-pole temperature differences also tend to increase up to pCH$_4$ $\leq$300 Pa, which is driven by a difference in radiative forcing at the equator and poles by methane and a reduction in the latitudinal extent of the Hadley circulation. 3D models are important to fully capture the cooling effect of methane, due to these impacts of the circulation.

Submitted 24 February, 2023; originally announced February 2023.

Comments: 36 pages, 18 figures

arXiv:2302.05420 [pdf, other]

doi 10.3847/2041-8213/acd6f6

Astrometric Accelerations as Dynamical Beacons: A Giant Planet Imaged Inside the Debris Disk of the Young Star AF Lep

Authors: Kyle Franson, Brendan P. Bowler, Yifan Zhou, Tim D. Pearce, Daniella C. Bardalez Gagliuffi, Lauren Biddle, Timothy D. Brandt, Justin R. Crepp, Trent J. Dupuy, Jacqueline Faherty, Rebecca Jensen-Clem, Marvin Morgan, Aniket Sanghi, Christopher A. Theissen, Quang H. Tran, Trevor A. Wolf

Abstract: We present the direct imaging discovery of a giant planet orbiting the young star AF Lep, a 1.2 $M_{\odot}$ member of the 24 $\pm$ 3 Myr $β$ Pic moving group. AF Lep was observed as part of our ongoing high-contrast imaging program targeting stars with astrometric accelerations between Hipparcos and Gaia that indicate the presence of substellar companions. Keck/NIRC2 observations in $L'$ with the Vector Vortex Coronagraph reveal a point source, AF Lep b, at ${\approx}340$ mas which exhibits orbital motion at the 6-$σ$ level over the course of 13 months. A joint orbit fit yields precise constraints on the planet's dynamical mass of 3.2$^{+0.7}_{-0.6}$ $M_\mathrm{Jup}$, semi-major axis of $8.4^{+1.1}_{-1.3}$ au, and eccentricity of $0.24^{+0.27}_{-0.15}$. AF Lep hosts a debris disk located at $\sim$50 au, but it is unlikely to be sculpted by AF Lep b, implying there may be additional planets in the system at wider separations. The stellar inclination ($i_* = 54^{+11}_{-9} {}^\circ$) and orbital inclination ($i_o = 50^{+9}_{-12} {}^\circ$) are in good agreement, which is consistent with the system having spin-orbit alignment. AF Lep b is the lowest-mass imaged planet with a dynamical mass measurement and highlights the promise of using astrometric accelerations as a tool to find and characterize long-period planets.

Submitted 25 May, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: 14 pages, 3 figures, accepted to ApJL

arXiv:2302.02662 [pdf, other]

Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning

Authors: Thomas Carta, Clément Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, Pierre-Yves Oudeyer

Abstract: Recent works successfully leveraged Large Language Models' (LLM) abilities to capture abstract knowledge about the world's physics to solve decision-making problems. Yet, the alignment between LLMs' knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, and a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can it boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5.

Submitted 6 September, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Journal ref: PMLR 202 (2023):3676-3713

arXiv:2301.13112 [pdf, other]

Benchmarking optimality of time series classification methods in distinguishing diffusions

Authors: Zehong Zhang, Fei Lu, Esther Xu Fei, Terry Lyons, Yannis Kevrekidis, Tom Woolf

Abstract: Statistical optimality benchmarking is crucial for analyzing and designing time series classification (TSC) algorithms. This study proposes to benchmark the optimality of TSC algorithms in distinguishing diffusion processes by the likelihood ratio test (LRT). The LRT is an optimal classifier by the Neyman-Pearson lemma. The LRT benchmarks are computationally efficient because the LRT does not need training, and the diffusion processes can be efficiently simulated and are flexible to reflect the specific features of real-world applications. We demonstrate the benchmarking with three widely-used TSC algorithms: random forest, ResNet, and ROCKET. These algorithms can achieve the LRT optimality for univariate time series and multivariate Gaussian processes. However, these model-agnostic algorithms are suboptimal in classifying high-dimensional nonlinear multivariate time series. Additionally, the LRT benchmark provides tools to analyze the dependence of classification accuracy on the time length, dimension, temporal sampling frequency, and randomness of the time series.

Submitted 11 April, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: 23 pages, 8 figures

MSC Class: 62M02; 62M10; 62M20

arXiv:2212.11032 [pdf, other]

doi 10.1088/1748-0221/18/07/P07054

The Triggerless Data Acquisition System of the XENONnT Experiment

Authors: E. Aprile, J. Aalbers, K. Abe, F. Agostini, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, C. Cai, J. M. R. Cardoso, et al. (140 additional authors not shown)

Abstract: The XENONnT detector uses the latest and largest liquid xenon-based time projection chamber (TPC) operated by the XENON Collaboration, aimed at detecting Weakly Interacting Massive Particles and conducting other rare event searches. The XENONnT data acquisition (DAQ) system constitutes an upgraded and expanded version of the XENON1T DAQ system. For its operation, it relies predominantly on commercially available hardware accompanied by open-source and custom-developed software. The three constituent subsystems of the XENONnT detector, the TPC (main detector), muon veto, and the newly introduced neutron veto, are integrated into a single DAQ, and can be operated both independently and as a unified system. In total, the DAQ digitizes the signals of 698 photomultiplier tubes (PMTs), of which 253 from the top PMT array of the TPC are digitized twice, at $\times10$ and $\times0.5$ gain. The DAQ for the most part is a triggerless system, reading out and storing every signal that exceeds the digitization thresholds. Custom-developed software is used to process the acquired data, making it available within $\mathcal{O}\left(10\text{ s}\right)$ for live data quality monitoring and online analyses. The entire system with all the three subsystems was successfully commissioned and has been operating continuously, comfortably withstanding readout rates that exceed $\sim500$ MB/s during calibration. Livetime during normal operation exceeds $99\%$ and is $\sim90\%$ during most high-rate calibrations. The combined DAQ system has collected more than 2 PB of both calibration and science data during the commissioning of XENONnT and the first science run.

Submitted 21 December, 2022; originally announced December 2022.

arXiv:2212.07644 [pdf, other]

doi 10.1088/1748-0221/18/05/C05008

Ultraviolet Raman Spectroscopy for Remote Detection of Chlorine Gas

Authors: Arne Walter, Frank Wilsenack, Thomas Wolf, Frank Duschek

Abstract: As a primary material frequently used in industry, chlorine is relatively easy to obtain and available even in large quantities. Despite its high toxicity, molecular chlorine is readily available since it is an essential educt in the chemical industry. Over the past decades, numerous accidents involving injured and dead victims have occurred. Furthermore, it was already misused as a warfare agent at the beginning of the last century with still reported attacks. Early detection, localization, and monitoring of sources and cloud movements are essential for protecting stationary facilities, mobile operations, and the public. In contrast to most chemical hazardous materials, where it is possible to detect them by vibrational spectroscopic methods (e.g., passive hyper-spectral absorption technologies in the infrared), halogens are inactive to infrared absorption. Raman-based technologies rely on changes in the polarizability of the molecule and provide vibrational-spectroscopic access to such diatomic molecules and therefore close the gap in infrared detection capabilities. Here we present a straightforward approach for a standoff Raman detector in a backscattering configuration. This paper uses a simplified model to discuss optimum excitation wavelengths in achievable detection ranges. We validate the model by spontaneous (vibrational) Raman spectroscopic measurements between 20 and 60 m standoff distance. We also briefly discuss detection performances and technical and physical aspects as prospects of system design.

Submitted 15 December, 2022; originally announced December 2022.

Comments: 6th International Conference on Frontiers of Diagnostic Technologies, 6 pages

arXiv:2212.04960 [pdf, other]

BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model

Authors: Christopher Akiki, Giada Pistilli, Margot Mieskes, Matthias Gallé, Thomas Wolf, Suzana Ilić, Yacine Jernite

Abstract: The BigScience Workshop was a value-driven initiative that spanned one and half years of interdisciplinary research and culminated in the creation of ROOTS, a 1.6TB multilingual dataset that was used to train BLOOM, one of the largest multilingual language models to date. In addition to the technical outcomes and artifacts, the workshop fostered multidisciplinary collaborations around large models, datasets, and their analysis. This in turn led to a wide range of research publications spanning topics from ethics to law, data governance, modeling choices and distributed training. This paper focuses on the collaborative research aspects of BigScience and takes a step back to look at the challenges of large-scale participatory research, with respect to participant diversity and the tasks required to successfully carry out such a project. Our main goal is to share the lessons we learned from this experience, what we could have done better and what we did well. We show how the impact of such a social approach to scientific research goes well beyond the technical artifacts that were the basis of its inception.

Submitted 9 December, 2022; originally announced December 2022.

Comments: Presented at the 2022 NeurIPS Workshop on Broadening Research Collaborations in ML

Showing 1–50 of 432 results for author: Woolf, T