Search | arXiv e-print repository

Survival of the Fittest Representation: A Case Study with Modular Addition

Authors: Xiaoman Delores Ding, Zifan Carl Guo, Eric J. Michaud, Ziming Liu, Max Tegmark

Abstract: When a neural network can learn multiple distinct algorithms to solve a task, how does it "choose" between them during training? To approach this question, we take inspiration from ecology: when multiple species coexist, they eventually reach an equilibrium where some survive while others die out. Analogously, we suggest that a neural network at initialization contains many solutions (representations and algorithms), which compete with each other under pressure from resource constraints, with the "fittest" ultimately prevailing. To investigate this Survival of the Fittest hypothesis, we conduct a case study on neural networks performing modular addition, and find that these networks' multiple circular representations at different Fourier frequencies undergo such competitive dynamics, with only a few circles surviving at the end. We find that the frequencies with high initial signals and gradients, the "fittest," are more likely to survive. By increasing the embedding dimension, we also observe more surviving frequencies. Inspired by the Lotka-Volterra equations describing the dynamics between species, we find that the dynamics of the circles can be nicely characterized by a set of linear differential equations. Our results with modular addition show that it is possible to decompose complicated representations into simpler components, along with their basic interactions, to offer insight on the training dynamics of representations.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.14860 [pdf, other]

Not All Language Model Features Are Linear

Authors: Joshua Engels, Isaac Liao, Eric J. Michaud, Wes Gurnee, Max Tegmark

Abstract: Recent work has proposed the linear representation hypothesis: that language models perform computation by manipulating one-dimensional representations of concepts ("features") in activation space. In contrast, we explore whether some language model representations may be inherently multi-dimensional. We begin by developing a rigorous definition of irreducible multi-dimensional features based on whether they can be decomposed into either independent or non-co-occurring lower-dimensional features. Motivated by these definitions, we design a scalable method that uses sparse autoencoders to automatically find multi-dimensional features in GPT-2 and Mistral 7B. These auto-discovered features include strikingly interpretable examples, e.g. circular features representing days of the week and months of the year. We identify tasks where these exact circles are used to solve computational problems involving modular arithmetic in days of the week and months of the year. Finally, we provide evidence that these circular features are indeed the fundamental unit of computation in these tasks with intervention experiments on Mistral 7B and Llama 3 8B, and we find further circular representations by breaking down the hidden states for these tasks into interpretable components.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Code and data at https://github.com/JoshEngels/MultiDimensionalFeatures

arXiv:2403.19647 [pdf, other]

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Authors: Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller

Abstract: We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse feature circuits enable detailed understanding of unanticipated mechanisms. Because they are based on fine-grained units, sparse feature circuits are useful for downstream tasks: We introduce SHIFT, where we improve the generalization of a classifier by ablating features that a human judges to be task-irrelevant. Finally, we demonstrate an entirely unsupervised and scalable interpretability pipeline by discovering thousands of sparse feature circuits for automatically discovered model behaviors.

Submitted 31 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: Code and data at https://github.com/saprmarks/feature-circuits. Demonstration at https://feature-circuits.xyz

arXiv:2402.11681 [pdf, other]

Opening the black box of language acquisition

Authors: Jérôme Michaud, Anna Jon-and

Abstract: Recent advances in large language models using deep learning techniques have renewed interest on how languages can be learned from data. However, it is unclear whether or how these models represent grammatical information from the learned languages. In addition, the models must be pre-trained on large corpora before they can be used. In this work, we propose an alternative, more transparent and cognitively plausible architecture for learning language. Instead of using deep learning, our approach uses a minimal cognitive architecture based on sequence memory and chunking. The learning mechanism is based on the principles of reinforcement learning. We test our architecture on a number of natural-like toy languages. Results show that the model can learn these artificial languages from scratch and extract grammatical information that supports learning. Our study demonstrates the power of this simple architecture and stresses the importance of sequence memory as a key component of the language learning process. Since other animals do not seem to have a faithful sequence memory, this may explain why only humans have developed complex languages.

Submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.05110 [pdf, other]

Opening the AI black box: program synthesis via mechanistic interpretability

Authors: Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić, Max Tegmark

Abstract: We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm. As opposed to large language models, this program synthesis technique makes no use of (and is therefore not limited by) human training data such as algorithms and code from GitHub. We discuss opportunities and challenges for scaling up this approach to make machine-learned models more interpretable and trustworthy.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 24 pages

arXiv:2312.13742 [pdf, ps, other]

First measurement of the neutron-emission probability with a surrogate reaction in inverse kinematics at a heavy-ion storage ring

Authors: M. Sguazzin, B. Jurado, J. Pibernat, J. A. Swartz, M. Grieser, J. Glorius, Yu. A. Litvinov, J. Adamczewski-Musch, P. Alfaurt, P. Ascher, L. Audouin, C. Berthelot, B. Blank, K. Blaum, B. Brückner, S. Dellmann, I. Dillmann, C. Domingo-Pardo, M. Dupuis, P. Erbacher, M. Flayol, O. Forstner, D. Freire-Fernández, M. Gerbaux, J. Giovinazzo, et al. (27 additional authors not shown)

Abstract: Neutron-induced reaction cross sections of short-lived nuclei are imperative to understand the origin of heavy elements in stellar nucleosynthesis and for societal applications, but their measurement is extremely complicated due to the radioactivity of the targets involved. One way of overcoming this issue is to combine surrogate reactions with the unique possibilities offered by heavy-ion storage rings. In this work, we describe the first surrogate-reaction experiment in inverse kinematics, which we successfully conducted at the Experimental Storage Ring (ESR) of the GSI/FAIR facility, using the $^{208}$Pb(p,p') reaction as a surrogate for neutron capture on $^{207}$Pb. Thanks to the outstanding detection efficiencies possible at the ESR, we were able to measure for the first time the neutron-emission probability as a function of the excitation energy of $^{208}$Pb. We demonstrate the strong connection between this probability and the neutron-induced radiative capture cross section of $^{207}$Pb, and provide reliable results for this cross section at neutron energies for which no experimental data exist.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 8 pages and 5 figures

arXiv:2307.15217 [pdf, other]

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.

Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

arXiv:2303.13506 [pdf, other]

The Quantization Model of Neural Scaling

Authors: Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark

Abstract: We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, where network knowledge and skills are "quantized" into discrete chunks ($\textbf{quanta}$). We show that when quanta are learned in order of decreasing use frequency, then a power law in use frequencies explains observed power law scaling of loss. We validate this prediction on toy datasets, then study how scaling curves decompose for large language models. Using language model gradients, we automatically decompose model behavior into a diverse set of skills (quanta). We tentatively find that the frequency at which these quanta are used in the training distribution roughly follows a power law corresponding with the empirical scaling exponent for language models, a prediction of our theory.

Submitted 13 January, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 24 pages, 18 figures, NeurIPS 2023

arXiv:2303.06969 [pdf]

Challenges in low losses and large acceptance ion beam transport

Authors: F Osswald, E Traykov, T Durand, M Heine, J Michaud, J C Thomas

Abstract: A prototype of ion beam transport module has been developed at the Institut Pluridisciplinaire Hubert Curien (IPHC) and used as a test bed to investigate key issues related to the efficient transport of ion beams. This includes the reduction of the beam losses, the increase of the acceptance, and the definition of the instrumentation necessary to evaluate the performances. An experiment was performed on a full-scale beam line and following a standard beam analysis, steering, and focusing procedure. After a review of the developments carried out for some demanding facilities and for the design of the quadrupoles implemented in the transport module, the paper highlights the challenge of measuring the preservation of transverse phase-space distributions with large acceptance conditions, i.e. with the highest ratio of beam filling to quadrupole aperture. Then, the tolerance to the errors and mitigation of the risks are discussed, in particular by considering the electric stability of the transport module, beam trips, behavior of the tail and the halo, and misalignment errors.

Submitted 13 March, 2023; originally announced March 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2211.09611

arXiv:2211.09611 [pdf]

Green beam lines, a challenging concept

Authors: F. Osswald, E. Traykov, T. Durand, M. Heine, J. Michaud, J. C. Thomas

Abstract: Due to increasing environmental and economic constraints, optimization of ion beam transport and equipment design becomes essential. The future should be equipped with planet-friendly facilities, that is, solutions that reduce environmental impact and improve economic competitiveness. The tendency to increase the intensity of the current and the power of the beams obliges us and brings us to new challenges. Installations tend to have larger dimensions with increased areas, volumes, weights and costs. A new ion beam transport prototype was developed and used as a test bed to identify key issues to reduce beam losses and preserve transverse phase-space distributions with large acceptance conditions.

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.04863 [pdf]

doi 10.1088/1748-0221/18/01/P01011

Transverse emittance measurement in 2D and 4D performed on a Low Energy Beam Transport line: benchmarking and data analysis

Authors: F Osswald, T Durand, M Heine, J Michaud, F Poirier, J C Thomas, E Traykov

Abstract: 2D and 4D transverse phase-space of a low-energy ion-beam is measured with two of the most common emittance scanners. The article covers the description of the installation, the setup, the settings, the experiment and the benchmark of the two emittance meters. We compare the results from three series of measurements and present the advantages and drawbacks of the two systems. Coupling between phase-space planes, correlations and mitigation of deleterious effects are discussed. The influence of background noise and aberrations of trace-space figures on emittance measurements and RMS calculations is highlighted, especially for low density beams and halos. A new data analysis method using noise reduction, filtering, and reconstruction of the emittance figure is described. Finally, some basic concepts of phase-space theory and application to beam transport are recalled.

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.13447 [pdf, other]

doi 10.3390/e25010175

Precision Machine Learning

Authors: Eric J. Michaud, Ziming Liu, Max Tegmark

Abstract: We explore unique considerations involved in fitting ML models to data with very high precision, as is often required for science applications. We empirically compare various function approximation methods and study how they scale with increasing parameters and data. We find that neural networks can often outperform classical approximation methods on high-dimensional examples, by auto-discovering and exploiting modular structures therein. However, neural networks trained with common optimizers are less powerful for low-dimensional cases, which motivates us to study the unique properties of neural network loss landscapes and the corresponding optimization challenges that arise in the high precision regime. To address the optimization issue in low dimensions, we develop training tricks which enable us to train neural networks to extremely low loss, close to the limits allowed by numerical precision.

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.01117 [pdf, other]

Omnigrok: Grokking Beyond Algorithmic Data

Authors: Ziming Liu, Eric J. Michaud, Max Tegmark

Abstract: Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the loss landscapes of neural networks, identifying the mismatch between training and test losses as the cause for grokking. We refer to this as the "LU mechanism" because training and test losses (against model weight norm) typically resemble "L" and "U", respectively. This simple mechanism can nicely explain many aspects of grokking: data size dependence, weight decay dependence, the emergence of representations, etc. Guided by the intuitive picture, we are able to induce grokking on tasks involving images, language and molecules. In the reverse direction, we are able to eliminate grokking for algorithmic datasets. We attribute the dramatic nature of grokking for algorithmic datasets to representation learning.

Submitted 23 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2205.10343 [pdf, other]

Towards Understanding Grokking: An Effective Theory of Representation Learning

Authors: Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams

Abstract: We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations whose training dynamics and dependence on training set size can be predicted by our effective theory in a toy setting. We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a "Goldilocks zone" (including comprehension and grokking) between memorization and confusion. We find on transformers the grokking phase stays closer to the memorization phase (compared to the comprehension phase), leading to delayed generalization. The Goldilocks phase is reminiscent of "intelligence from starvation" in Darwinian evolution, where resource limitations drive discovery of more efficient solutions. This study not only provides intuitive explanations of the origin of grokking, but also highlights the usefulness of physics-inspired tools, e.g., effective theories and phase diagrams, for understanding deep learning.

Submitted 14 October, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

Comments: Accepted by NeurIPS 2022

arXiv:2203.11214 [pdf, ps, other]

Status on the DESIR High Resolution Separator Commissioning

Authors: J. Michaud, P. Alfaurt, A. Balana, B. Blank, L. Daudin, T. Kurtukian Nieto, B. Lachacinski, L. Serani, F. Varenne

Abstract: Many nuclear reactions used to create radioactive isotopes for nuclear research produce, in addition to the isotope of interest, many contaminants, which are often produced in much larger amounts than the isotope of interest. Many installations using the ISOL approach are therefore equipped with high-resolution mass separators to remove at least isotopes with a different mass number. In the present paper, we present the results of the commissioning of the DESIR HRS presently under development at LP2I Bordeaux (formerly CENBG). Optical aberrations are corrected up to 3rd order and a mass resolution of M/$Δ$M of 25000 is reached with a transmission of about 70% for a 133Cs+ beam at 25 keV.

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2012.05862 [pdf, other]

Understanding Learned Reward Functions

Authors: Eric J. Michaud, Adam Gleave, Stuart Russell

Abstract: In many real-world tasks, it is not possible to procedurally specify an RL agent's reward function. In such cases, a reward function must instead be learned from interacting with and observing humans. However, current techniques for reward learning may fail to produce reward functions which accurately reflect user preferences. Absent significant advances in reward learning, it is thus important to be able to audit learned reward functions to verify whether they truly capture user preferences. In this paper, we investigate techniques for interpreting learned reward functions. In particular, we apply saliency methods to identify failure modes and predict the robustness of reward functions. We find that learned reward functions often implement surprising algorithms that rely on contingent aspects of the environment. We also discover that existing interpretability techniques often attend to irrelevant changes in reward output, suggesting that reward interpretability may need significantly different methods from policy interpretability.

Submitted 10 December, 2020; originally announced December 2020.

Comments: Presented at Deep RL Workshop, NeurIPS 2020

arXiv:2010.13871 [pdf, other]

Examining the causal structures of deep neural networks using information theory

Authors: Simon Mattsson, Eric J. Michaud, Erik Hoel

Abstract: Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets. Yet DNNs can also be examined at the level of causation, exploring "what does what" within the layers of the network itself. Historically, analyzing the causal structure of DNNs has received less attention than understanding their responses to input. Yet definitionally, generalizability must be a function of a DNN's causal structure since it reflects how the DNN responds to unseen or even not-yet-defined future inputs. Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training. Specifically, we introduce the effective information (EI) of a feedforward DNN, which is the mutual information between layer input and output following a maximum-entropy perturbation. The EI can be used to assess the degree of causal influence nodes and edges have over their downstream targets in each layer. We show that the EI can be further decomposed in order to examine the sensitivity of a layer (measured by how well edges transmit perturbations) and the degeneracy of a layer (measured by how edge overlap interferes with transmission), along with estimates of the amount of integrated information of a layer. Together, these properties define where each layer lies in the "causal plane" which can be used to visualize how layer connectivity becomes more sensitive or degenerate over time, and how integration changes during training, revealing how the layer-by-layer causal structure differentiates. These results may help in understanding the generalization capabilities of DNNs and provide foundational tools for making DNNs both more generalizable and more explainable.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 14 pages, 8 figures

arXiv:2009.12689 [pdf, other]

Lunar Opportunities for SETI

Authors: Eric J. Michaud, Andrew P. V. Siemion, Jamie Drew, S. Pete Worden

Abstract: A radio telescope placed in lunar orbit, or on the surface of the Moon's farside, could be of great value to the Search for Extraterrestrial Intelligence (SETI). The advantage of such a telescope is that it would be shielded by the body of the Moon from terrestrial sources of radio frequency interference (RFI). While RFI can be identified and ignored by other fields of radio astronomy, the possible spectral similarity between human and alien-generated radio emission makes the abundance of artificial radio emission on and around the Earth a significant complicating factor for SETI. A Moon-based telescope would avoid this challenge. In this paper, we review existing literature on Moon-based radio astronomy, discuss the benefits of lunar SETI, contrast possible surface- and orbit-based telescope designs, and argue that such initiatives are scientifically feasible, both technically and financially, within the next decade.

Submitted 26 September, 2020; originally announced September 2020.

Comments: 7 pages, submitted as a white paper for the National Academy of Sciences Planetary Science and Astrobiology Decadal Survey 2023-2032

arXiv:1912.07881 [pdf, other]

doi 10.23731/CYRM-2021-003

Storage Ring to Search for Electric Dipole Moments of Charged Particles -- Feasibility Study

Authors: F. Abusaif, A. Aggarwal, A. Aksentev, B. Alberdi-Esuain, A. Andres, A. Atanasov, L. Barion, S. Basile, M. Berz, C. Böhme, J. Böker, J. Borburgh, N. Canale, C. Carli, I. Ciepał, G. Ciullo, M. Contalbrigo, J. -M. De Conto, S. Dymov, O. Felden, M. Gaisser, R. Gebel, N. Giese, J. Gooding, K. Grigoryev , et al. (76 additional authors not shown)

Abstract: The proposed method exploits charged particles confined as a storage ring beam (proton, deuteron, possibly $^3$He) to search for an intrinsic electric dipole moment (EDM) aligned along the particle spin axis. Statistical sensitivities could approach 10$^{-29}$ e$\cdot$cm. The challenge will be to reduce systematic errors to similar levels. The ring will be adjusted to preserve the spin polarisation, initially parallel to the particle velocity, for times in excess of 15 minutes. Large radial electric fields, acting through the EDM, will rotate the polarisation from the longitudinal to the vertical direction. The slow rise in the vertical polarisation component, detected through scattering from a target, signals the EDM. The project strategy is outlined. A stepwise plan is foreseen, starting with ongoing COSY activities that demonstrate technical feasibility. Achievements to date include reduced polarization measurement errors, long horizontal plane polarization lifetimes, and control of the polarization direction through feedback from scattering measurements. The project continues with a proof-of-capability measurement (precursor experiment; first direct deuteron EDM measurement), an intermediate prototype ring (proof-of-principle; demonstrator for key technologies), and finally a high-precision electric-field storage ring.

Submitted 25 June, 2021; v1 submitted 17 December, 2019; originally announced December 2019.

Comments: 243 pages

Report number: CERN Yellow Reports: Monographs, CERN-2021-003

arXiv:1812.08535 [pdf, other]

Feasibility Study for an EDM Storage Ring

Authors: F. Abusaif, A. Aggarwal, A. Aksentev, B. Alberdi-Esuain, L. Barion, S. Basile, M. Berz, M. Beyß, C. Böhme, J. Böker, J. Borburgh, C. Carli, I. Ciepał, G. Ciullo, M. Contalbrigo, J. -M. De Conto, S. Dymov, R. Engels, O. Felden, M. Gagoshidze, M. Gaisser, R. Gebel, N. Giese, K. Grigoryev, D. Grzonka , et al. (70 additional authors not shown)

Abstract: This project exploits charged particles confined as a storage ring beam (proton, deuteron, possibly $^3$He) to search for an intrinsic electric dipole moment (EDM, $\vec d$) aligned along the particle spin axis. Statistical sensitivities can approach $10^{-29}$~e$\cdot$cm. The challenge will be to reduce systematic errors to similar levels. The ring will be adjusted to preserve the spin polarization, initially parallel to the particle velocity, for times in excess of 15 minutes. Large radial electric fields, acting through the EDM, will rotate the polarization ($\vec d \times\vec E$). The slow rise in the vertical polarization component, detected through scattering from a target, signals the EDM. The project strategy is outlined. It foresees a step-wise plan, starting with ongoing COSY activities that demonstrate technical feasibility. Achievements to date include reduced polarization measurement errors, long horizontal-plane polarization lifetimes, and control of the polarization direction through feedback from the scattering measurements. The project continues with a proof-of-capability measurement (precursor experiment; first direct deuteron EDM measurement), an intermediate prototype ring (proof-of-principle; demonstrator for key technologies), and finally the high precision electric-field storage ring.

Submitted 18 January, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

arXiv:1801.08819 [pdf, ps, other]

doi 10.1103/PhysRevE.97.062313

Social Influence with Recurrent Mobility with multiple options

Authors: Jérôme Michaud, Attila Szilva

Abstract: In this paper, we discuss the possible generalizations of the Social Influence with Recurrent Mobility (SIRM) model developed in Phys. Rev. Lett. 112, 158701 (2014). Although the SIRM model performed approximately satisfactorily when the US election was modelled, it has its limits: it has been developed only for two-party systems and can lead to unphysical behaviour when one of the parties has an extreme vote share close to 0 or 1. We propose here generalizations of the SIRM model by extending it to multi-party systems that are mathematically well-posed in the case of extreme vote shares, too, by handling the noise term in a different way. In addition, we show that our method opens new applications for the study of elections by using a new calibration procedure, and makes it possible to analyse the influence of the "free will" (creating a new party) and other local effects for different commuting network topologies.

Submitted 26 January, 2018; originally announced January 2018.

Comments: 10 pages, 6 figures

Journal ref: Phys. Rev. E 97, 062313 (2018)

arXiv:1606.08433 [pdf, ps, other]

doi 10.1103/PhysRevE.95.022308

Continuous time limits of the Utterance Selection Model

Authors: Jérôme Michaud

Abstract: In this paper, we derive new continuous time limits of the Utterance Selection Model (USM) for language change (Baxter et al., Phys. Rev. E {\bf 73}, 046118, 2006). This is motivated by the fact that the Fokker-Planck continuous time limit derived in the original version of the USM is only valid for a small range of parameters. We investigate the consequences of relaxing these constraints on parameters. Using the normal approximation of the multinomial approximation, we derive a new continuous time limit of the USM in the form of a weak-noise stochastic differential equation. We argue that this weak noise, not captured by the Kramers-Moyal expansion, can not be neglected. We then propose a coarse-graining procedure, which takes the form of a stochastic version of the \emph{heterogeneous mean field} approximation. This approximation groups the behaviour of nodes of the same degree, reducing the complexity of the problem. With the help of this approximation, we study in detail two simple families of networks: the regular networks and the star-shaped networks. The analysis reveals and quantifies a finite size effect of the dynamics. If we increase the size of the network by keeping all the other parameters constant, we transition from a state where conventions emerge to a state where no convention emerges. Furthermore, we show that the degree of a node acts as a time scale. For heterogeneous networks such as star-shaped networks, the time scale difference can become very large, leading to a noisier behaviour of highly connected nodes.

Submitted 5 January, 2017; v1 submitted 27 June, 2016; originally announced June 2016.

Comments: 21 pages, 10 figures, accepted for publication Physical Review E

arXiv:1606.04020 [pdf, other]

The IDSA and the homogeneous sphere: Issues and possible improvements

Authors: Jérôme Michaud

Abstract: In this paper, we are concerned with the study of the Isotropic Diffusion Source Approximation (IDSA) of radiative transfer. After having recalled well-known limits of the radiative transfer equation, we present the IDSA and adapt it to the case of the homogeneous sphere. We then show that for this example the IDSA suffers from severe numerical difficulties. We argue that these difficulties originate in the min-max switch coupling mechanism used in the IDSA. To overcome this problem we reformulate the IDSA to avoid the problematic coupling. This allows us to assess the modeling error of the IDSA for the homogeneous sphere test case. The IDSA is shown to overestimate the streaming component, hence we propose a new version of the IDSA which is numerically shown to be more accurate than the old one. Analytical results and numerical tests are provided to support the accuracy of the new proposed approximation.

Submitted 13 June, 2016; originally announced June 2016.

Comments: 25 pages, 8 figures, accepted for publication in DCDS-S

MSC Class: 65Z05; 35B40; 35Q85; 85A25; 41A25

arXiv:1212.1623 [pdf, ps, other]

Derivation of the Isotropic Diffusion Source Approximation (IDSA) for Supernova Neutrino Transport by Asymptotic Expansions

Authors: Heiko Berninger, Emmanuel Frenod, Martin Gander, Mathias Liebendorfer, Jerome Michaud

Abstract: We present Chapman--Enskog and Hilbert expansions applied to the $\mathcal{O}(v/c)$ Boltzmann equation for the radiative transfer of neutrinos in core-collapse supernovae. Based on the Legendre expansion of the scattering kernel for the collision integral truncated after the second term, we derive the diffusion limit for the Boltzmann equation by truncation of Chapman--Enskog or Hilbert expansions with reaction and collision scaling. We also give asymptotically sharp results obtained by the use of an additional time scaling. The diffusion limit determines the diffusion source in the \emph{Isotropic Diffusion Source Approximation (IDSA)} of Boltzmann's equation for which the free streaming limit and the reaction limit serve as limiters. Here, we derive the reaction limit as well as the free streaming limit by truncation of Chapman--Enskog or Hilbert expansions using reaction and collision scaling as well as time scaling, respectively. Finally, we motivate why limiters are a good choice for the definition of the source term in the IDSA.

Submitted 6 August, 2013; v1 submitted 7 December, 2012; originally announced December 2012.

Comments: SIAM Journal on Mathematical Analysis (2013) 0000-00000

arXiv:1211.6901 [pdf, other]

A Mathematical Description of the IDSA for Supernova Neutrino transport, its discretization and a comparison with a finite volume scheme for Boltzmann's Equation

Authors: Heiko Berninger, Emmanuel Frenod, Martin Gander, Mathias Liebendörfer, Jérôme Michaud, Nicolas Vasset

Abstract: In this paper we give an introduction to the Boltzmann equation for neutrino transport used in core collapse supernova models as well as a detailed mathematical description of the \emph{Isotropic Diffusion Source Approximation} (IDSA). Furthermore, we present a numerical treatment of a reduced Boltzmann model problem based on time splitting and finite volumes and revise the discretization of the IDSA for this problem. Discretization error studies carried out on the reduced Boltzmann model problem and on the IDSA show that the errors are of order one in both cases. By a numerical example, a detailed comparison of the reduced model and the IDSA is carried out and interpreted. For this example the IDSA modeling error with respect to the reduced Boltzmann model is numerically determined and localized.

Submitted 29 November, 2012; originally announced November 2012.

arXiv:1108.5354 [pdf, other]

doi 10.1088/1475-7516/2012/01/009

The Kolmogorov-Smirnov test for the CMB

Authors: Mona Frommert, Ruth Durrer, Jérôme Michaud

Abstract: We investigate the statistics of the cosmic microwave background using the Kolmogorov-Smirnov test. We show that, when we correctly de-correlate the data, the partition function of the Kolmogorov stochasticity parameter is compatible with the Kolmogorov distribution and, contrary to previous claims, the CMB data are compatible with Gaussian fluctuations with the correlation function given by standard Lambda-CDM. We then use the Kolmogorov-Smirnov test to derive upper bounds on residual point source power in the CMB, and indicate the promise of this statistics for further datasets, especially Planck, to search for deviations from Gaussianity and for detecting point sources and Galactic foregrounds.

Submitted 1 December, 2011; v1 submitted 26 August, 2011; originally announced August 2011.

Comments: Improved significance of the results (which remain unchanged) by using patches instead of ring segments in the analysis. Added sky maps of the Kolmogorov-parameter for original and de-correlated CMB map

Journal ref: JCAP 1201 (2012) 009

arXiv:astro-ph/0610062 [pdf, ps, other]

doi 10.1117/12.672261

DUNE: The Dark Universe Explorer

Authors: A. Refregier, O. Boulade, Y. Mellier, B. Milliard, R. Pain, J. Michaud, F. Safa, A. Amara, P. Astier, E. Barrelet, E. Bertin, S. Boulade, C. Cara, A. Claret, L. Georges, R. Grange, J. Guy, C. Koeck, L. Kroely, C. Magneville, N. Palanque-Delabrouille, N. Regnault, G. Smadja, C. Schimd, Z. Sun

Abstract: Understanding the nature of Dark Matter and Dark Energy is one of the most pressing issues in cosmology and fundamental physics. The purpose of the DUNE (Dark UNiverse Explorer) mission is to study these two cosmological components with high precision, using a space-based weak lensing survey as its primary science driver. Weak lensing provides a measure of the distribution of dark matter in the universe and of the impact of dark energy on the growth of structures. DUNE will also include a complementary supernovae survey to measure the expansion history of the universe, thus giving independent additional constraints on dark energy. The baseline concept consists of a 1.2m telescope with a 0.5 square degree optical CCD camera. It is designed to be fast with reduced risks and costs, and to take advantage of the synergy between ground-based and space observations. Stringent requirements for weak lensing systematics were shown to be achievable with the baseline concept. This will allow DUNE to place strong constraints on cosmological parameters, including the equation of state parameter of the dark energy and its evolution from redshift 0 to 1. DUNE is the subject of an ongoing study led by the French Space Agency (CNES), and is being proposed for ESA's Cosmic Vision programme.

Submitted 3 October, 2006; originally announced October 2006.

Comments: 12 latex pages, including 7 figures and 2 tables. Procs. of SPIE symposium "Astronomical Telescopes and Instrumentation", Orlando, may 2006

Showing 1–27 of 27 results for author: Michaud, J