Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2305.01957 [pdf, other]

NorQuAD: Norwegian Question Answering Dataset

Authors: Sardana Ivanova, Fredrik Aas Andreassen, Matias Jentoft, Sondre Wold, Lilja Øvrelid

Abstract: In this paper we present NorQuAD: the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We here detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available.

Submitted 3 May, 2023; originally announced May 2023.

Comments: Accepted to NoDaLiDa 2023

arXiv:2207.04901 [pdf, other]

Exploring Length Generalization in Large Language Models

Authors: Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Abstract: The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring the length generalization capabilities of transformer-based language models. We first establish that naively finetuning transformers on length generalization tasks shows significant generalization deficiencies independent of model scale. We then show that combining pretrained large language models' in-context learning abilities with scratchpad prompting (asking the model to output solution steps before producing an answer) results in a dramatic improvement in length generalization. We run careful failure analyses on each of the learning modalities and identify common sources of mistakes that highlight opportunities in equipping language models with the ability to generalize to longer problems.

Submitted 14 November, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

arXiv:2206.14858 [pdf, other]

Solving Quantitative Reasoning Problems with Language Models

Authors: Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

Abstract: Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.

Submitted 30 June, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: 12 pages, 5 figures + references and appendices

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2112.00114 [pdf, other]

Show Your Work: Scratchpads for Intermediate Computation with Language Models

Authors: Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena

Abstract: Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models are able to perform complex multi-step computations -- even in the few-shot regime -- when asked to perform the operation "step by step", showing the results of intermediate computations. In particular, we train transformers to perform multi-step computations by asking them to emit intermediate computation steps into a "scratchpad". On a series of increasingly complex tasks ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations.

Submitted 30 November, 2021; originally announced December 2021.

arXiv:2106.15831 [pdf, other]

The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning

Authors: Anders Andreassen, Yasaman Bahri, Behnam Neyshabur, Rebecca Roelofs

Abstract: Although machine learning models typically experience a drop in performance on out-of-distribution data, accuracies on in- versus out-of-distribution data are widely observed to follow a single linear trend when evaluated across a testbed of models. Models that are more accurate on the out-of-distribution data relative to this baseline exhibit "effective robustness" and are exceedingly rare. Identifying such models, and understanding their properties, is key to improving out-of-distribution performance. We conduct a thorough empirical investigation of effective robustness during fine-tuning and surprisingly find that models pre-trained on larger datasets exhibit effective robustness during training that vanishes at convergence. We study how properties of the data influence effective robustness, and we show that it increases with the larger size, more diversity, and higher example difficulty of the dataset. We also find that models that display effective robustness are able to correctly classify 10% of the examples that no other current testbed model gets correct. Finally, we discuss several strategies for scaling effective robustness to the high-accuracy regime to improve the out-of-distribution accuracy of state-of-the-art models.

Submitted 30 June, 2021; originally announced June 2021.

Comments: 27 pages, 25 figures

arXiv:2105.04448 [pdf, other]

Scaffolding Simulations with Deep Learning for High-dimensional Deconvolution

Authors: Anders Andreassen, Patrick T. Komiske, Eric M. Metodiev, Benjamin Nachman, Adi Suresh, Jesse Thaler

Abstract: A common setting for scientific inference is the ability to sample from a high-fidelity forward model (simulation) without having an explicit probability density of the data. We propose a simulation-based maximum likelihood deconvolution approach in this setting called OmniFold. Deep learning enables this approach to be naturally unbinned and (variable-, and) high-dimensional. In contrast to model parameter estimation, the goal of deconvolution is to remove detector distortions in order to enable a variety of down-stream inference tasks. Our approach is the deep learning generalization of the common Richardson-Lucy approach that is also called Iterative Bayesian Unfolding in particle physics. We show how OmniFold can not only remove detector distortions, but it can also account for noise processes and acceptance effects.

Submitted 10 May, 2021; originally announced May 2021.

Comments: 6 pages, 1 figure, 1 table

Journal ref: ICLR simDL workshop 2021 (https://simdl.github.io/files/12.pdf)

arXiv:2101.08320 [pdf, other]

doi 10.1088/1361-6633/ac36b9

The LHC Olympics 2020: A Community Challenge for Anomaly Detection in High Energy Physics

Authors: Gregor Kasieczka, Benjamin Nachman, David Shih, Oz Amram, Anders Andreassen, Kees Benkendorfer, Blaz Bortolato, Gustaaf Brooijmans, Florencia Canelli, Jack H. Collins, Biwei Dai, Felipe F. De Freitas, Barry M. Dillon, Ioan-Mihail Dinu, Zhongtian Dong, Julien Donini, Javier Duarte, D. A. Faroughy, Julia Gonski, Philip Harris, Alan Kahn, Jernej F. Kamenik, Charanjit K. Khosa, Patrick Komiske, Luc Le Pottier, et al. (22 additional authors not shown)

Abstract: A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics have developed their methods using an R&D dataset and then tested them on black boxes: datasets with an unknown anomaly (or not). This paper will review the LHC Olympics 2020 challenge, including an overview of the competition, a description of methods deployed in the competition, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders.

Submitted 20 January, 2021; originally announced January 2021.

Comments: 108 pages, 53 figures, 3 tables

arXiv:2010.15775 [pdf, other]

Understanding the Failure Modes of Out-of-Distribution Generalization

Authors: Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur

Abstract: Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time, resulting in poor accuracy during test-time. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way even in easy-to-learn tasks where one would expect these models to succeed. In particular, through a theoretical study of gradient-descent-trained linear classifiers on some easy-to-learn tasks, we uncover two complementary failure modes. These modes arise from how spurious correlations induce two kinds of skews in the data: one geometric in nature, and another, statistical in nature. Finally, we construct natural modifications of image classification datasets to understand when these failure modes can arise in practice. We also design experiments to isolate the two failure modes when training modern neural networks on these datasets.

Submitted 29 April, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

arXiv:2010.03569 [pdf, other]

doi 10.1103/PhysRevD.103.036001

Parameter Estimation using Neural Networks in the Presence of Detector Effects

Authors: Anders Andreassen, Shih-Chieh Hsu, Benjamin Nachman, Natchanon Suaysom, Adi Suresh

Abstract: Histogram-based template fits are the main technique used for estimating parameters of high energy physics Monte Carlo generators. Parametrized neural network reweighting can be used to extend this fitting procedure to many dimensions and does not require binning. If the fit is to be performed using reconstructed data, then expensive detector simulations must be used for training the neural networks. We introduce a new two-level fitting approach that only requires one dataset with detector simulation and then a set of additional generation-level datasets without detector effects included. This Simulation-level fit based on Reweighting Generator-level events with Neural networks (SRGN) is demonstrated using simulated datasets for a variety of examples including a simple Gaussian random variable, parton shower tuning, and the top quark mass extraction.

Submitted 6 April, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: 16 pages, 13 figures, 4 tables; v3: Updated to journal version

Journal ref: Phys. Rev. D 103, 036001 (2021)

arXiv:2008.08675 [pdf, other]

Asymptotics of Wide Convolutional Neural Networks

Authors: Anders Andreassen, Ethan Dyer

Abstract: Wide neural networks have proven to be a rich class of architectures for both theory and practice. Motivated by the observation that finite width convolutional networks appear to outperform infinite width networks, we study scaling laws for wide CNNs and networks with skip connections. Following the approach of (Dyer & Gur-Ari, 2019), we present a simple diagrammatic recipe to derive the asymptotic width dependence for many quantities of interest. These scaling relationships provide a solvable description for the training dynamics of wide convolutional networks. We test these relations across a broad range of architectures. In particular, we find that the difference in performance between finite and infinite width models vanishes at a definite rate with respect to model width. Nonetheless, this relation is consistent with finite width models generalizing either better or worse than their infinite width counterparts, and we provide examples where the relative performance depends on the optimization details.

Submitted 19 August, 2020; originally announced August 2020.

Comments: 23 pages, 12 figures

arXiv:2005.12055 [pdf, other]

Hierarchical Bayesian Regression for Multi-Site Normative Modeling of Neuroimaging Data

Authors: Seyed Mostafa Kia, Hester Huijsdens, Richard Dinga, Thomas Wolfers, Maarten Mennes, Ole A. Andreassen, Lars T. Westlye, Christian F. Beckmann, Andre F. Marquand

Abstract: Clinical neuroimaging has recently witnessed explosive growth in data availability which brings studying heterogeneity in clinical cohorts to the spotlight. Normative modeling is an emerging statistical tool for achieving this objective. However, its application remains technically challenging due to difficulties in properly dealing with nuisance variation, for example due to variability in image acquisition devices. Here, in a fully probabilistic framework, we propose an application of hierarchical Bayesian regression (HBR) for multi-site normative modeling. Our experimental results confirm the superiority of HBR in deriving more accurate normative ranges on large multi-site neuroimaging data compared to widely used methods. This provides the possibility i) to learn the normative range of structural and functional brain measures on large multi-site data; ii) to recalibrate and reuse the learned model on local small data; therefore, HBR closes the technical loop for applying normative modeling as a medical tool for the diagnosis and prognosis of mental disorders.

Submitted 25 May, 2020; originally announced May 2020.

Comments: To be published in MICCAI 2020 proceedings

arXiv:2001.05001 [pdf, other]

doi 10.1103/PhysRevD.101.095004

Simulation Assisted Likelihood-free Anomaly Detection

Authors: Anders Andreassen, Benjamin Nachman, David Shih

Abstract: Given the lack of evidence for new particle discoveries at the Large Hadron Collider (LHC), it is critical to broaden the search program. A variety of model-independent searches have been proposed, adding sensitivity to unexpected signals. There are generally two types of such searches: those that rely heavily on simulations and those that are entirely based on (unlabeled) data. This paper introduces a hybrid method that makes the best of both approaches. For potential signals that are resonant in one known feature, this new method first learns a parameterized reweighting function to morph a given simulation to match the data in sidebands. This function is then interpolated into the signal region and then the reweighted background-only simulation can be used for supervised learning as well as for background estimation. The background estimation from the reweighted simulation allows for non-trivial correlations between features used for classification and the resonant feature. A dijet search with jet substructure is used to illustrate the new method. Future applications of Simulation Assisted Likelihood-free Anomaly Detection (SALAD) include a variety of final states and potential combinations with other model-independent approaches.

Submitted 14 January, 2020; originally announced January 2020.

Comments: 19 pages, 9 figures

Journal ref: Phys. Rev. D 101, 095004 (2020)

arXiv:1911.09107 [pdf, other]

doi 10.1103/PhysRevLett.124.182001

OmniFold: A Method to Simultaneously Unfold All Observables

Authors: Anders Andreassen, Patrick T. Komiske, Eric M. Metodiev, Benjamin Nachman, Jesse Thaler

Abstract: Collider data must be corrected for detector effects ("unfolded") to be compared with many theoretical calculations and measurements from other experiments. Unfolding is traditionally done for individual, binned observables without including all information relevant for characterizing the detector response. We introduce OmniFold, an unfolding method that iteratively reweights a simulated dataset, using machine learning to capitalize on all available information. Our approach is unbinned, works for arbitrarily high-dimensional data, and naturally incorporates information from the full phase space. We illustrate this technique on a realistic jet substructure example from the Large Hadron Collider and compare it to standard binned unfolding methods. This new paradigm enables the simultaneous measurement of all observables, including those not yet invented at the time of the analysis.

Submitted 16 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: 8 pages, 3 figures, 1 table, 1 poem; v2: updated to approximate PRL version

Report number: MIT-CTP 5155

Journal ref: Phys. Rev. Lett. 124, 182001 (2020)

arXiv:1907.08209 [pdf, other]

doi 10.1103/PhysRevD.101.091901

Neural Networks for Full Phase-space Reweighting and Parameter Tuning

Authors: Anders Andreassen, Benjamin Nachman

Abstract: Precise scientific analysis in collider-based particle physics is possible because of complex simulations that connect fundamental theories to observable quantities. The significant computational cost of these programs limits the scope, precision, and accuracy of Standard Model measurements and searches for new phenomena. We therefore introduce Deep neural networks using Classification for Tuning and Reweighting (DCTR), a neural network-based approach to reweight and fit simulations using all kinematic and flavor information -- the full phase space. DCTR can perform tasks that are currently not possible with existing methods, such as estimating non-perturbative fragmentation uncertainties. The core idea behind the new approach is to exploit powerful high-dimensional classifiers to reweight phase space as well as to identify the best parameters for describing data. Numerical examples from $e^+e^-\rightarrow\text{jets}$ demonstrate the fidelity of these methods for simulation parameters that have a big and broad impact on phase space as well as those that have a minimal and/or localized impact. The high fidelity of the full phase-space reweighting enables a new paradigm for simulations, parameter tuning, and model systematic uncertainties across particle physics and possibly beyond.

Submitted 26 August, 2019; v1 submitted 18 July, 2019; originally announced July 2019.

Comments: 7 pages, 3 figures; v2 has updated citations and clarifications; v3 has a new appendix with an alternative fitting method

Journal ref: Phys. Rev. D 101, 091901 (2020)

arXiv:1906.10137 [pdf, other]

doi 10.1103/PhysRevLett.123.182001

Binary JUNIPR: an interpretable probabilistic model for discrimination

Authors: Anders Andreassen, Ilya Feige, Christopher Frye, Matthew D. Schwartz

Abstract: JUNIPR is an approach to unsupervised learning in particle physics that scaffolds a probabilistic model for jets around their representation as binary trees. Separate JUNIPR models can be learned for different event or jet types, then compared and explored for physical insight. The relative probabilities can also be used for discrimination. In this paper, we show how the training of the separate models can be refined in the context of classification to optimize discrimination power. We refer to this refined approach as Binary JUNIPR. Binary JUNIPR achieves state-of-the-art performance for quark/gluon discrimination and top-tagging. The trained models can then be analyzed to provide physical insight into how the classification is achieved. As examples, we explore differences between quark and gluon jets and between gluon jets generated with two different simulations.

Submitted 24 June, 2019; originally announced June 2019.

Comments: 6 pages, 3 figures

Journal ref: Phys. Rev. Lett. 123, 182001 (2019)

arXiv:1808.10315 [pdf, other]

Deep Learning for Quality Control of Subcortical Brain 3D Shape Models

Authors: Dmitry Petrov, Boris A. Gutman, Egor Kuznetsov, Theo G. M. van Erp, Jessica A. Turner, Lianne Schmaal, Dick Veltman, Lei Wang, Kathryn Alpert, Dmitry Isaev, Artemis Zavaliangos-Petropulu, Christopher R. K. Ching, Vince Calhoun, David Glahn, Theodore D. Satterthwaite, Ole Andreas Andreassen, Stefan Borgwardt, Fleur Howells, Nynke Groenewold, Aristotle Voineskos, Joaquim Radua, Steven G. Potkin, Benedicto Crespo-Facorro, Diana Tordesillas-Gutierrez, Li Shen, Irina Lebedeva, et al. (48 additional authors not shown)

Abstract: We present several deep learning models for assessing the morphometric fidelity of deep grey matter region models extracted from brain MRI. We test three different convolutional neural net architectures (VGGNet, ResNet and Inception) over 2D maps of geometric features. Further, we present a novel geometry feature augmentation technique based on a parametric spherical mapping. Finally, we present an approach for model decision visualization, allowing human raters to see the areas of subcortical shapes most likely to be deemed of failing quality by the machine. Our training data is comprised of 5200 subjects from the ENIGMA Schizophrenia MRI cohorts, and our test dataset contains 1500 subjects from the ENIGMA Major Depressive Disorder cohorts. Our final models reduce human rater time by 46-70%. ResNet outperforms VGGNet and Inception for all of our predictive tasks.

Submitted 4 September, 2018; v1 submitted 30 August, 2018; originally announced August 2018.

Comments: Accepted to Shape in Medical Imaging (ShapeMI) workshop at MICCAI 2018. arXiv admin note: substantial text overlap with arXiv:1707.06353

arXiv:1804.09720 [pdf, other]

doi 10.1140/epjc/s10052-019-6607-9

JUNIPR: a Framework for Unsupervised Machine Learning in Particle Physics

Authors: Anders Andreassen, Ilya Feige, Christopher Frye, Matthew D. Schwartz

Abstract: In applications of machine learning to particle physics, a persistent challenge is how to go beyond discrimination to learn about the underlying physics. To this end, a powerful tool would be a framework for unsupervised learning, where the machine learns the intricate high-dimensional contours of the data upon which it is trained, without reference to pre-established labels. In order to approach such a complex task, an unsupervised network must be structured intelligently, based on a qualitative understanding of the data. In this paper, we scaffold the neural network's architecture around a leading-order model of the physics underlying the data. In addition to making unsupervised learning tractable, this design actually alleviates existing tensions between performance and interpretability. We call the framework JUNIPR: "Jets from UNsupervised Interpretable PRobabilistic models". In this approach, the set of particle momenta composing a jet are clustered into a binary tree that the neural network examines sequentially. Training is unsupervised and unrestricted: the network could decide that the data bears little correspondence to the chosen tree structure. However, when there is a correspondence, the network's output along the tree has a direct physical interpretation. JUNIPR models can perform discrimination tasks, through the statistically optimal likelihood-ratio test, and they permit visualizations of discrimination power at each branching in a jet's tree. Additionally, JUNIPR models provide a probability distribution from which events can be drawn, providing a data-driven Monte Carlo generator. As a third application, JUNIPR models can reweight events from one (e.g. simulated) data set to agree with distributions from another (e.g. experimental) data set.

Submitted 25 April, 2018; originally announced April 2018.

Comments: 37 pages, 24 figures

arXiv:1707.08124 [pdf, other]

doi 10.1103/PhysRevD.97.056006

Scale Invariant Instantons and the Complete Lifetime of the Standard Model

Authors: Anders Andreassen, William Frost, Matthew D. Schwartz

Abstract: In a classically scale-invariant quantum field theory, tunneling rates are infrared divergent due to the existence of instantons of any size. While one expects such divergences to be resolved by quantum effects, it has been unclear how higher-loop corrections can resolve a problem appearing already at one loop. With a careful power counting, we uncover a series of loop contributions that dominate over the one-loop result and sum all the necessary terms. We also clarify previously incomplete treatments of related issues pertaining to global symmetries, gauge fixing and finite mass effects. In addition, we produce exact closed-form solutions for the functional determinants over scalars, fermions and vector bosons around the scale-invariant bounce, demonstrating manifest gauge invariance in the vector case. With these problems solved, we produce the first complete calculation of the lifetime of our universe: $10^{139}$ years. With 95% confidence, we expect our universe to last more than $10^{58}$ years. The uncertainty is part experimental uncertainty on the top quark mass and on $\alpha_s$ and part theory uncertainty from electroweak threshold corrections. Using our complete result, we provide phase diagrams in the $m_t/m_h$ and the $m_t/\alpha_s$ planes, with uncertainty bands. To rule out absolute stability to $3\sigma$ confidence, the uncertainty on the top quark pole mass would have to be pushed below 250 MeV or the uncertainty on $\alpha_s(m_Z)$ pushed below 0.00025.

Submitted 2 May, 2018; v1 submitted 25 July, 2017; originally announced July 2017.

Comments: Typos corrected, numbers updated

Journal ref: Phys. Rev. D 97, 056006 (2018)

arXiv:1705.07135 [pdf, other]

doi 10.1007/JHEP10(2017)151

Reducing the Top Quark Mass Uncertainty with Jet Grooming

Authors: Anders Andreassen, Matthew D. Schwartz

Abstract: The measurement of the top quark mass has large systematic uncertainties coming from the Monte Carlo simulations that are used to match theory and experiment. We explore how much that uncertainty can be reduced by using jet grooming procedures. We estimate the inherent ambiguity in what is meant by Monte Carlo mass to be around 530 MeV without any corrections. This uncertainty can be reduced by 60% to 200 MeV by calibrating to the W mass and a further 33% to 140 MeV by applying soft-drop jet grooming (or by 20% more to 170 MeV with trimming). At $e^+e^-$ colliders, the associated uncertainty is around 110 MeV, reducing to 50 MeV after calibrating to the W mass. By analyzing the tuning parameters, we conclude that the importance of jet grooming after calibrating to the W mass is to reduce sensitivity to the underlying event.

Submitted 19 May, 2017; originally announced May 2017.

Comments: 21 pages, 7 figures

arXiv:1612.04535 [pdf, other]

Is the familywise error rate in genomics controlled by methods based on the effective number of independent tests?

Authors: Kari Krizak Halle, Srdjan Djurovic, Ole Andreas Andreassen, Mette Langaas

Abstract: In genome-wide association (GWA) studies the goal is to detect association between one or more genetic markers and a given phenotype. The number of genetic markers in a GWA study can be on the order of hundreds of thousands, and therefore multiple testing methods are needed. This paper presents a set of popular methods to be used to correct for multiple testing in GWA studies. All are based on the concept of estimating an effective number of independent tests. We compare these methods using simulated data and data from the TOP study, and show that the effective number of independent tests is not additive over blocks of independent genetic markers unless we assume a common value for the local significance level. We also show that the reviewed methods based on estimating the effective number of independent tests in general do not control the familywise error rate.

Submitted 21 December, 2016; v1 submitted 14 December, 2016; originally announced December 2016.

Comments: 20 pages, 3 figures

arXiv:1604.06090 [pdf, other]

doi 10.1103/PhysRevD.95.085011

Precision decay rate calculations in quantum field theory

Authors: Anders Andreassen, David Farhi, William Frost, Matthew D. Schwartz

Abstract: Tunneling in quantum field theory is worth understanding properly, not least because it controls the long-term fate of our universe. There are, however, a number of features of tunneling rate calculations which lack a desirable transparency, such as the necessity of analytic continuation, the appropriateness of using an effective instead of classical potential, and the sensitivity to short-distance physics. This paper attempts to review in pedagogical detail the physical origin of tunneling and its connection to the path integral. Both the traditional potential-deformation method and a recent more direct propagator-based method are discussed. Some new insights from using approximate semi-classical solutions are presented. In addition, we explore the sensitivity of the lifetime of our universe to short-distance physics, such as quantum gravity, emphasizing a number of important subtleties.

Submitted 31 August, 2017; v1 submitted 20 April, 2016; originally announced April 2016.

Comments: 88 pages, 18 figures

Journal ref: Phys. Rev. D 95, 085011 (2017)

arXiv:1603.05938 [pdf, other]

doi 10.1111/sjos.12451

Efficient and powerful familywise error control in genome-wide association studies using generalized linear models

Authors: K. K. Halle, Ø. Bakke, S. Djurovic, A. Bye, E. Ryeng, U. Wisløff, O. A. Andreassen, M. Langaas

Abstract: In genetic association studies, detecting phenotype-genotype association is a primary goal. We assume that the relationship between the data (phenotype, genetic markers and environmental covariates) can be modelled by a generalized linear model (GLM). The inclusion of environmental covariates makes it possible to account for important confounding factors, such as sex and population substructure. A multivariate score statistic, which under the complete null hypothesis of no phenotype-genotype association asymptotically has a multivariate normal distribution with a covariance matrix that can be estimated from the data, is used to test a large number of genetic markers for association with the phenotype. We stress the importance of controlling the familywise error rate (FWER), and use the asymptotic distribution of the multivariate score test statistic to find a local significance level for the individual test. Using real data (from one study on schizophrenia and bipolar disorder and one on maximal oxygen uptake) and constructed correlated structures, we show that our method is a powerful alternative to the popular Bonferroni and Sidak methods. For GLMs without environmental covariates, we show that our method is an efficient alternative to permutation methods for multiple testing. Further, we show that if environmental covariates and genetic markers are uncorrelated, the estimated covariance matrix of the score test statistic can be approximated by the estimated correlation matrix for just the genetic markers. As byproducts of our method, an effective number of independent tests can be defined, and FWER-adjusted $p$-values can be calculated as an alternative to using a local significance level.

Submitted 22 December, 2016; v1 submitted 18 March, 2016; originally announced March 2016.

arXiv:1602.01102 [pdf, other]

doi 10.1103/PhysRevLett.117.231601

A direct approach to quantum tunneling

Authors: Anders Andreassen, David Farhi, William Frost, Matthew D. Schwartz

Abstract: The decay rates of quasistable states in quantum field theories are usually calculated using instanton methods. Standard derivations of these methods rely in a crucial way upon deformations and analytic continuations of the physical potential, and on the saddle point approximation. While the resulting procedure can be checked against other semi-classical approaches in some one-dimensional cases, it is challenging to trace the role of the relevant physical scales, and any intuitive handle on the precision of the approximations involved is at best obscure. In this paper, we use a physical definition of the tunneling probability to derive a formula for the decay rate in both quantum mechanics and quantum field theory directly from the Minkowski path integral, without reference to unphysical deformations of the potential. There are numerous benefits to this approach, from non-perturbative applications to precision calculations and aesthetic simplicity.

Submitted 2 February, 2016; originally announced February 2016.

Comments: 5 pages, 2 figures

Journal ref: Phys. Rev. Lett. 117, 231601 (2016)

arXiv:1509.07258 [pdf, other]

Functional effects of schizophrenia-linked genetic variants on intrinsic single-neuron excitability: A modeling study

Authors: Tuomo Mäki-Marttunen, Geir Halnes, Anna Devor, Aree Witoelar, Francesco Bettella, Srdjan Djurovic, Yunpeng Wang, Gaute T. Einevoll, Ole A. Andreassen, Anders M. Dale

Abstract: Background: Recent genome-wide association studies (GWAS) have identified a large number of genetic risk factors for schizophrenia (SCZ) featuring ion channels and calcium transporters. For some of these risk factors, independent prior investigations have examined the effects of genetic alterations on the cellular electrical excitability and calcium homeostasis. In the present proof-of-concept study, we harnessed these experimental results to model the computational properties of a layer V cortical pyramidal cell and to identify possible common alterations in behavior across SCZ-related genes. Methods: We applied a biophysically detailed multi-compartmental model to study the excitability of a layer V pyramidal cell. We reviewed the literature on functional genomics for variants of genes associated with SCZ, and used changes in neuron model parameters to represent the effects of these variants. Results: We present and apply a framework for examining the effects of subtle single nucleotide polymorphisms in ion channel and Ca2+ transporter-encoding genes on neuron excitability. Our analysis indicates that most of the considered SCZ-related genetic variants affect the spiking behavior and intracellular calcium dynamics resulting from summation of inputs across the dendritic tree. Conclusions: Our results suggest that alteration in the ability of a single neuron to integrate the inputs and scale its excitability may constitute a fundamental mechanistic contributor to mental disease, alongside the previously proposed deficits in synaptic communication and network behavior.

Submitted 24 September, 2015; originally announced September 2015.

arXiv:1408.0292 [pdf, other]

doi 10.1103/PhysRevLett.113.241801

Consistent Use of the Standard Model Effective Potential

Authors: Anders Andreassen, William Frost, Matthew D. Schwartz

Abstract: The stability of the Standard Model is determined by the true minimum of the effective Higgs potential. We show that the potential at its minimum when computed by the traditional method is strongly dependent on the gauge parameter. It moreover depends on the scale where the potential is calculated. We provide a consistent method for determining absolute stability independent of both gauge and calculation scale, order by order in perturbation theory. This leads to revised stability bounds $m_H > (129.4 \pm 2.3)$ GeV and $m_t < (171.2 \pm 0.3)$ GeV. We also show how to evaluate the effect of new physics on the stability bound without resorting to unphysical field values.

Submitted 25 August, 2014; v1 submitted 1 August, 2014; originally announced August 2014.

Comments: 5 pages, 5 figures

Journal ref: Phys. Rev. Lett. 113, 241801 (2014)

arXiv:1408.0287 [pdf, other]

doi 10.1103/PhysRevD.91.016009

Consistent Use of Effective Potentials

Authors: Anders Andreassen, William Frost, Matthew D. Schwartz

Abstract: It is well known that effective potentials can be gauge-dependent while their values at extrema should be gauge-invariant. Unfortunately, establishing this invariance in perturbation theory is not straightforward, since contributions from arbitrarily high-order loops can be of the same size. We show in massless scalar QED that an infinite class of loops can be summed (and must be summed) to give a gauge-invariant value for the potential at its minimum. In addition, we show that the exact potential depends on both the scale at which it is calculated and the normalization of the fields, but the vacuum energy does not. Using these insights, we propose a method to extract some physical quantities from effective potentials which is self-consistent order-by-order in perturbation theory, including improvement with the renormalization group.

Submitted 1 August, 2014; originally announced August 2014.

Comments: 37 pages, 3 figures

Showing 1–29 of 29 results for author: Andreassen, A