
Showing 1–29 of 29 results for author: Andreassen, A

  1. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2305.01957

    cs.CL

    NorQuAD: Norwegian Question Answering Dataset

    Authors: Sardana Ivanova, Fredrik Aas Andreassen, Matias Jentoft, Sondre Wold, Lilja Øvrelid

    Abstract: In this paper we present NorQuAD: the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We here detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human per…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted to NoDaLiDa 2023

  4. arXiv:2207.04901

    cs.CL cs.LG

    Exploring Length Generalization in Large Language Models

    Authors: Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

    Abstract: The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring th…

    Submitted 14 November, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

  5. arXiv:2206.14858

    cs.CL cs.AI cs.LG

    Solving Quantitative Reasoning Problems with Language Models

    Authors: Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

    Abstract: Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained o…

    Submitted 30 June, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: 12 pages, 5 figures + references and appendices

  6. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  7. arXiv:2112.00114

    cs.LG cs.NE

    Show Your Work: Scratchpads for Intermediate Computation with Language Models

    Authors: Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena

    Abstract: Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models are able to perform complex multi-step computations -- even…

    Submitted 30 November, 2021; originally announced December 2021.

  8. arXiv:2106.15831

    cs.LG cs.AI cs.CV

    The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning

    Authors: Anders Andreassen, Yasaman Bahri, Behnam Neyshabur, Rebecca Roelofs

    Abstract: Although machine learning models typically experience a drop in performance on out-of-distribution data, accuracies on in- versus out-of-distribution data are widely observed to follow a single linear trend when evaluated across a testbed of models. Models that are more accurate on the out-of-distribution data relative to this baseline exhibit "effective robustness" and are exceedingly rare. Ident…

    Submitted 30 June, 2021; originally announced June 2021.

    Comments: 27 pages, 25 figures

  9. arXiv:2105.04448

    stat.ML cs.LG hep-ex hep-ph physics.data-an

    Scaffolding Simulations with Deep Learning for High-dimensional Deconvolution

    Authors: Anders Andreassen, Patrick T. Komiske, Eric M. Metodiev, Benjamin Nachman, Adi Suresh, Jesse Thaler

    Abstract: A common setting for scientific inference is the ability to sample from a high-fidelity forward model (simulation) without having an explicit probability density of the data. We propose a simulation-based maximum likelihood deconvolution approach in this setting called OmniFold. Deep learning enables this approach to be naturally unbinned and (variable-, and) high-dimensional. In contrast to model…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: 6 pages, 1 figure, 1 table

    Journal ref: ICLR simDL workshop 2021 (https://simdl.github.io/files/12.pdf)

  10. arXiv:2101.08320

    hep-ph hep-ex physics.data-an

    The LHC Olympics 2020: A Community Challenge for Anomaly Detection in High Energy Physics

    Authors: Gregor Kasieczka, Benjamin Nachman, David Shih, Oz Amram, Anders Andreassen, Kees Benkendorfer, Blaz Bortolato, Gustaaf Brooijmans, Florencia Canelli, Jack H. Collins, Biwei Dai, Felipe F. De Freitas, Barry M. Dillon, Ioan-Mihail Dinu, Zhongtian Dong, Julien Donini, Javier Duarte, D. A. Faroughy, Julia Gonski, Philip Harris, Alan Kahn, Jernej F. Kamenik, Charanjit K. Khosa, Patrick Komiske, Luc Le Pottier, et al. (22 additional authors not shown)

    Abstract: A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a…

    Submitted 20 January, 2021; originally announced January 2021.

    Comments: 108 pages, 53 figures, 3 tables

  11. arXiv:2010.15775

    cs.LG cs.CV stat.ML

    Understanding the Failure Modes of Out-of-Distribution Generalization

    Authors: Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur

    Abstract: Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time, resulting in poor accuracy during test-time. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way even in easy-to-learn tasks where one would expe…

    Submitted 29 April, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

  12. arXiv:2010.03569

    hep-ph hep-ex physics.data-an stat.ML

    Parameter Estimation using Neural Networks in the Presence of Detector Effects

    Authors: Anders Andreassen, Shih-Chieh Hsu, Benjamin Nachman, Natchanon Suaysom, Adi Suresh

    Abstract: Histogram-based template fits are the main technique used for estimating parameters of high energy physics Monte Carlo generators. Parametrized neural network reweighting can be used to extend this fitting procedure to many dimensions and does not require binning. If the fit is to be performed using reconstructed data, then expensive detector simulations must be used for training the neural networ…

    Submitted 6 April, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: 16 pages, 13 figures, 4 tables; v3: Updated to journal version

    Journal ref: Phys. Rev. D 103, 036001 (2021)

  13. arXiv:2008.08675

    cs.LG hep-th stat.ML

    Asymptotics of Wide Convolutional Neural Networks

    Authors: Anders Andreassen, Ethan Dyer

    Abstract: Wide neural networks have proven to be a rich class of architectures for both theory and practice. Motivated by the observation that finite width convolutional networks appear to outperform infinite width networks, we study scaling laws for wide CNNs and networks with skip connections. Following the approach of (Dyer & Gur-Ari, 2019), we present a simple diagrammatic recipe to derive the asymptoti…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: 23 pages, 12 figures

  14. arXiv:2005.12055

    stat.ML cs.LG stat.AP

    Hierarchical Bayesian Regression for Multi-Site Normative Modeling of Neuroimaging Data

    Authors: Seyed Mostafa Kia, Hester Huijsdens, Richard Dinga, Thomas Wolfers, Maarten Mennes, Ole A. Andreassen, Lars T. Westlye, Christian F. Beckmann, Andre F. Marquand

    Abstract: Clinical neuroimaging has recently witnessed explosive growth in data availability which brings studying heterogeneity in clinical cohorts to the spotlight. Normative modeling is an emerging statistical tool for achieving this objective. However, its application remains technically challenging due to difficulties in properly dealing with nuisance variation, for example due to variability in image…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: To be published in MICCAI 2020 proceedings

  15. arXiv:2001.05001

    hep-ph hep-ex physics.data-an stat.ML

    Simulation Assisted Likelihood-free Anomaly Detection

    Authors: Anders Andreassen, Benjamin Nachman, David Shih

    Abstract: Given the lack of evidence for new particle discoveries at the Large Hadron Collider (LHC), it is critical to broaden the search program. A variety of model-independent searches have been proposed, adding sensitivity to unexpected signals. There are generally two types of such searches: those that rely heavily on simulations and those that are entirely based on (unlabeled) data. This paper introdu…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: 19 pages, 9 figures

    Journal ref: Phys. Rev. D 101, 095004 (2020)

  16. arXiv:1911.09107

    hep-ph hep-ex physics.data-an stat.ML

    OmniFold: A Method to Simultaneously Unfold All Observables

    Authors: Anders Andreassen, Patrick T. Komiske, Eric M. Metodiev, Benjamin Nachman, Jesse Thaler

    Abstract: Collider data must be corrected for detector effects ("unfolded") to be compared with many theoretical calculations and measurements from other experiments. Unfolding is traditionally done for individual, binned observables without including all information relevant for characterizing the detector response. We introduce OmniFold, an unfolding method that iteratively reweights a simulated dataset,…

    Submitted 16 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: 8 pages, 3 figures, 1 table, 1 poem; v2: updated to approximate PRL version

    Report number: MIT-CTP 5155

    Journal ref: Phys. Rev. Lett. 124, 182001 (2020)

  17. arXiv:1907.08209

    hep-ph hep-ex stat.ML

    Neural Networks for Full Phase-space Reweighting and Parameter Tuning

    Authors: Anders Andreassen, Benjamin Nachman

    Abstract: Precise scientific analysis in collider-based particle physics is possible because of complex simulations that connect fundamental theories to observable quantities. The significant computational cost of these programs limits the scope, precision, and accuracy of Standard Model measurements and searches for new phenomena. We therefore introduce Deep neural networks using Classification for Tuning…

    Submitted 26 August, 2019; v1 submitted 18 July, 2019; originally announced July 2019.

    Comments: 7 pages, 3 figures; v2 has updated citations and clarifications; v3 has a new appendix with an alternative fitting method

    Journal ref: Phys. Rev. D 101, 091901 (2020)

  18. Binary JUNIPR: an interpretable probabilistic model for discrimination

    Authors: Anders Andreassen, Ilya Feige, Christopher Frye, Matthew D. Schwartz

    Abstract: JUNIPR is an approach to unsupervised learning in particle physics that scaffolds a probabilistic model for jets around their representation as binary trees. Separate JUNIPR models can be learned for different event or jet types, then compared and explored for physical insight. The relative probabilities can also be used for discrimination. In this paper, we show how the training of the separate m…

    Submitted 24 June, 2019; originally announced June 2019.

    Comments: 6 pages, 3 figures

    Journal ref: Phys. Rev. Lett. 123, 182001 (2019)

  19. arXiv:1808.10315

    q-bio.NC

    Deep Learning for Quality Control of Subcortical Brain 3D Shape Models

    Authors: Dmitry Petrov, Boris A. Gutman, Egor Kuznetsov, Theo G. M. van Erp, Jessica A. Turner, Lianne Schmaal, Dick Veltman, Lei Wang, Kathryn Alpert, Dmitry Isaev, Artemis Zavaliangos-Petropulu, Christopher R. K. Ching, Vince Calhoun, David Glahn, Theodore D. Satterthwaite, Ole Andreas Andreassen, Stefan Borgwardt, Fleur Howells, Nynke Groenewold, Aristotle Voineskos, Joaquim Radua, Steven G. Potkin, Benedicto Crespo-Facorro, Diana Tordesillas-Gutierrez, Li Shen, Irina Lebedeva, et al. (48 additional authors not shown)

    Abstract: We present several deep learning models for assessing the morphometric fidelity of deep grey matter region models extracted from brain MRI. We test three different convolutional neural net architectures (VGGNet, ResNet and Inception) over 2D maps of geometric features. Further, we present a novel geometry feature augmentation technique based on a parametric spherical map. Finally, we present a…

    Submitted 4 September, 2018; v1 submitted 30 August, 2018; originally announced August 2018.

    Comments: Accepted to Shape in Medical Imaging (ShapeMI) workshop at MICCAI 2018. arXiv admin note: substantial text overlap with arXiv:1707.06353

  20. JUNIPR: a Framework for Unsupervised Machine Learning in Particle Physics

    Authors: Anders Andreassen, Ilya Feige, Christopher Frye, Matthew D. Schwartz

    Abstract: In applications of machine learning to particle physics, a persistent challenge is how to go beyond discrimination to learn about the underlying physics. To this end, a powerful tool would be a framework for unsupervised learning, where the machine learns the intricate high-dimensional contours of the data upon which it is trained, without reference to pre-established labels. In order to approach…

    Submitted 25 April, 2018; originally announced April 2018.

    Comments: 37 pages, 24 figures

  21. Scale Invariant Instantons and the Complete Lifetime of the Standard Model

    Authors: Anders Andreassen, William Frost, Matthew D. Schwartz

    Abstract: In a classically scale-invariant quantum field theory, tunneling rates are infrared divergent due to the existence of instantons of any size. While one expects such divergences to be resolved by quantum effects, it has been unclear how higher-loop corrections can resolve a problem appearing already at one loop. With a careful power counting, we uncover a series of loop contributions that dominate…

    Submitted 2 May, 2018; v1 submitted 25 July, 2017; originally announced July 2017.

    Comments: Typos corrected, numbers updated

    Journal ref: Phys. Rev. D 97, 056006 (2018)

  22. Reducing the Top Quark Mass Uncertainty with Jet Grooming

    Authors: Anders Andreassen, Matthew D. Schwartz

    Abstract: The measurement of the top quark mass has large systematic uncertainties coming from the Monte Carlo simulations that are used to match theory and experiment. We explore how much that uncertainty can be reduced by using jet grooming procedures. We estimate the inherent ambiguity in what is meant by Monte Carlo mass to be around 530 MeV without any corrections. This uncertainty can be reduced by 60…

    Submitted 19 May, 2017; originally announced May 2017.

    Comments: 21 pages, 7 figures

  23. arXiv:1612.04535

    stat.ME stat.AP

    Is the familywise error rate in genomics controlled by methods based on the effective number of independent tests?

    Authors: Kari Krizak Halle, Srdjan Djurovic, Ole Andreas Andreassen, Mette Langaas

    Abstract: In genome-wide association (GWA) studies the goal is to detect association between one or more genetic markers and a given phenotype. The number of genetic markers in a GWA study can be on the order of hundreds of thousands, and therefore multiple testing methods are needed. This paper presents a set of popular methods to be used to correct for multiple testing in GWA studies. All are based on the con…

    Submitted 21 December, 2016; v1 submitted 14 December, 2016; originally announced December 2016.

    Comments: 20 pages, 3 figures

  24. Precision decay rate calculations in quantum field theory

    Authors: Anders Andreassen, David Farhi, William Frost, Matthew D. Schwartz

    Abstract: Tunneling in quantum field theory is worth understanding properly, not least because it controls the long term fate of our universe. There are, however, a number of features of tunneling rate calculations which lack a desirable transparency, such as the necessity of analytic continuation, the appropriateness of using an effective instead of classical potential, and the sensitivity to short-distance…

    Submitted 31 August, 2017; v1 submitted 20 April, 2016; originally announced April 2016.

    Comments: 88 pages, 18 figures

    Journal ref: Phys. Rev. D 95, 085011 (2017)

  25. arXiv:1603.05938

    stat.ME stat.AP

    Efficient and powerful familywise error control in genome-wide association studies using generalized linear models

    Authors: K. K. Halle, Ø. Bakke, S. Djurovic, A. Bye, E. Ryeng, U. Wisløff, O. A. Andreassen, M. Langaas

    Abstract: In genetic association studies, detecting phenotype-genotype association is a primary goal. We assume that the relationship between the data (phenotype, genetic markers and environmental covariates) can be modelled by a generalized linear model (GLM). The inclusion of environmental covariates makes it possible to account for important confounding factors, such as sex and population substructure.…

    Submitted 22 December, 2016; v1 submitted 18 March, 2016; originally announced March 2016.

  26. A direct approach to quantum tunneling

    Authors: Anders Andreassen, David Farhi, William Frost, Matthew D. Schwartz

    Abstract: The decay rates of quasistable states in quantum field theories are usually calculated using instanton methods. Standard derivations of these methods rely in a crucial way upon deformations and analytic continuations of the physical potential, and on the saddle point approximation. While the resulting procedure can be checked against other semi-classical approaches in some one-dimensional cases, i…

    Submitted 2 February, 2016; originally announced February 2016.

    Comments: 5 pages, 2 figures

    Journal ref: Phys. Rev. Lett. 117, 231601 (2016)

  27. arXiv:1509.07258

    q-bio.NC

    Functional effects of schizophrenia-linked genetic variants on intrinsic single-neuron excitability: A modeling study

    Authors: Tuomo Mäki-Marttunen, Geir Halnes, Anna Devor, Aree Witoelar, Francesco Bettella, Srdjan Djurovic, Yunpeng Wang, Gaute T. Einevoll, Ole A. Andreassen, Anders M. Dale

    Abstract: Background: Recent genome-wide association studies (GWAS) have identified a large number of genetic risk factors for schizophrenia (SCZ) featuring ion channels and calcium transporters. For some of these risk factors, independent prior investigations have examined the effects of genetic alterations on the cellular electrical excitability and calcium homeostasis. In the present proof-of-concept stu…

    Submitted 24 September, 2015; originally announced September 2015.

  28. Consistent Use of the Standard Model Effective Potential

    Authors: Anders Andreassen, William Frost, Matthew D. Schwartz

    Abstract: The stability of the Standard Model is determined by the true minimum of the effective Higgs potential. We show that the potential at its minimum when computed by the traditional method is strongly dependent on the gauge parameter. It moreover depends on the scale where the potential is calculated. We provide a consistent method for determining absolute stability independent of both gauge and calc…

    Submitted 25 August, 2014; v1 submitted 1 August, 2014; originally announced August 2014.

    Comments: 5 pages, 5 figures

    Journal ref: Phys. Rev. Lett. 113, 241801 (2014)

  29. Consistent Use of Effective Potentials

    Authors: Anders Andreassen, William Frost, Matthew D. Schwartz

    Abstract: It is well known that effective potentials can be gauge-dependent while their values at extrema should be gauge-invariant. Unfortunately, establishing this invariance in perturbation theory is not straightforward, since contributions from arbitrarily high-order loops can be of the same size. We show in massless scalar QED that an infinite class of loops can be summed (and must be summed) to give…

    Submitted 1 August, 2014; originally announced August 2014.

    Comments: 37 pages, 3 figures