
Showing 1–50 of 63 results for author: Dyer, E

  1. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2402.11742

    cs.LG stat.ML

    Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

    Authors: Chiraag Kaushik, Ran Liu, Chi-Heng Lin, Amrit Khera, Matthew Y Jin, Wenrui Ma, Vidya Muthukumar, Eva L Dyer

    Abstract: Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class dispariti…

    Submitted 3 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 25 pages, 9 figures

  3. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2312.06585

    cs.LG

    Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

    Authors: Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, et al. (16 additional authors not shown)

    Abstract: Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investig…

    Submitted 17 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to TMLR. Camera-ready version. First three authors contributed equally

  5. arXiv:2310.16046

    cs.LG q-bio.NC

    A Unified, Scalable Framework for Neural Population Decoding

    Authors: Mehdi Azabou, Vinam Arora, Venkataramana Ganesh, Ximeng Mao, Santosh Nachimuthu, Michael J. Mendelson, Blake Richards, Matthew G. Perich, Guillaume Lajoie, Eva L. Dyer

    Abstract: Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and archit…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023

  6. arXiv:2308.14596

    cs.CV cs.LG

    LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration

    Authors: Ran Liu, Sahil Khose, Jingyun Xiao, Lakshmi Sathidevi, Keerthan Ramnath, Zsolt Kira, Eva L. Dyer

    Abstract: Despite significant advances in deep learning, models often struggle to generalize well to new, unseen domains, especially when training data is limited. To address this challenge, we propose a novel approach for distribution-aware latent augmentation that leverages the relationships across samples to guide the augmentation procedure. Our approach first degrades the samples stochastically in the l…

    Submitted 28 August, 2023; originally announced August 2023.

  7. arXiv:2308.09198

    cs.LG cs.SI

    Half-Hop: A graph upsampling approach for slowing down message passing

    Authors: Mehdi Azabou, Venkataramana Ganesh, Shantanu Thakoor, Chi-Heng Lin, Lakshmi Sathidevi, Ran Liu, Michal Valko, Petar Veličković, Eva L. Dyer

    Abstract: Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the origin…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Published as a conference paper at ICML 2023

  8. arXiv:2307.11600

    physics.atom-ph

    Optical pumping enhancement of a free-induction-decay magnetometer

    Authors: Dominic Hunter, Marcin S. Mrozowski, Allan McWilliam, Stuart J. Ingleby, Terry E. Dyer, Paul F. Griffin, Erling Riis

    Abstract: Spin preparation prior to a free-induction-decay (FID) measurement can be adversely affected by transverse bias fields, particularly in the geophysical field range. A strategy that enhances the spin polarization accumulated before readout is demonstrated, by synchronizing optical pumping with a magnetic field pulse that supersedes any transverse fields by over two orders of magnitude. The pulsed ma…

    Submitted 29 September, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: 10 pages, 7 figures

    Journal ref: Journal of the Optical Society of America B, vol 40, issue 10, pp. 2489-2683 (2023)

  9. arXiv:2305.10403

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  10. arXiv:2304.04142

    q-bio.QM cs.CV eess.IV

    Slideflow: Deep Learning for Digital Histopathology with Real-Time Whole-Slide Visualization

    Authors: James M. Dolezal, Sara Kochanny, Emma Dyer, Andrew Srisuwananukorn, Matteo Sacco, Frederick M. Howard, Anran Li, Prajval Mohan, Alexander T. Pearson

    Abstract: Deep learning methods have emerged as powerful tools for analyzing histopathological images, but current methods are often specialized for specific domains and software environments, and few open-source options exist for deploying models in an interactive interface. Experimenting with different deep learning approaches typically requires switching software libraries and reprocessing data, reducing…

    Submitted 8 April, 2023; originally announced April 2023.

  11. arXiv:2303.08811

    cs.LG cs.RO

    Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis

    Authors: Mehdi Azabou, Michael Mendelson, Nauman Ahad, Maks Sorokin, Shantanu Thakoor, Carolina Urzay, Eva L. Dyer

    Abstract: Natural behavior consists of dynamics that are complex and unpredictable, especially when trying to predict many steps into the future. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings where behavior becomes increasingly hard to model. In this wor…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: arXiv admin note: text overlap with arXiv:2206.07041

  12. arXiv:2302.11023

    cs.LG q-bio.NC

    Learning signatures of decision making from many individuals playing the same game

    Authors: Michael J Mendelson, Mehdi Azabou, Suma Jacob, Nicola Grissom, David Darrow, Becket Ebitz, Alexander Herman, Eva L. Dyer

    Abstract: Human behavior is incredibly complex and the factors that drive decision making--from instinct, to strategy, to biases between individuals--often vary over multiple timescales. In this paper, we design a predictive framework that learns representations to encode an individual's 'behavioral style', i.e. long-term behavioral trends, while simultaneously predicting future actions and choices. The mod…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: 4 pages, 2 figures. To be published in IEEE NER

  13. arXiv:2301.00345

    cs.CV cs.LG

    MTNeuro: A Benchmark for Evaluating Representations of Brain Structure Across Multiple Levels of Abstraction

    Authors: Jorge Quesada, Lakshmi Sathidevi, Ran Liu, Nauman Ahad, Joy M. Jackson, Mehdi Azabou, Jingyun Xiao, Christopher Liding, Matthew Jin, Carolina Urzay, William Gray-Roncal, Erik C. Johnson, Eva L. Dyer

    Abstract: There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain reg…

    Submitted 31 December, 2022; originally announced January 2023.

    Comments: 10 pages, 4 figures, Accepted at NeurIPS 2022

  14. arXiv:2210.05021

    cs.LG stat.ML

    The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective

    Authors: Chi-Heng Lin, Chiraag Kaushik, Eva L. Dyer, Vidya Muthukumar

    Abstract: Data augmentation (DA) is a powerful workhorse for bolstering performance in modern machine learning. Specific augmentations like translations and scaling in computer vision are traditionally believed to improve generalization by generating new (artificial) data from the same distribution. However, this traditional viewpoint does not explain the success of prevalent augmentations in modern machine…

    Submitted 27 February, 2024; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: 72 pages, 8 figures

  15. arXiv:2207.04901

    cs.CL cs.LG

    Exploring Length Generalization in Large Language Models

    Authors: Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

    Abstract: The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring th…

    Submitted 14 November, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

  16. arXiv:2206.14858

    cs.CL cs.AI cs.LG

    Solving Quantitative Reasoning Problems with Language Models

    Authors: Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

    Abstract: Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained o…

    Submitted 30 June, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: 12 pages, 5 figures + references and appendices

  17. arXiv:2206.07041

    cs.LG

    Learning Behavior Representations Through Multi-Timescale Bootstrapping

    Authors: Mehdi Azabou, Michael Mendelson, Maks Sorokin, Shantanu Thakoor, Nauman Ahad, Carolina Urzay, Eva L. Dyer

    Abstract: Natural behavior consists of dynamics that are both unpredictable, can switch suddenly, and unfold over many different timescales. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings due to the fact that they assume a single scale of temporal dynamic…

    Submitted 14 June, 2022; originally announced June 2022.

  18. arXiv:2206.06131

    q-bio.NC cs.LG

    Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers

    Authors: Ran Liu, Mehdi Azabou, Max Dabagia, Jingyun Xiao, Eva L. Dyer

    Abstract: Complex time-varying systems are often studied by abstracting away from the dynamics of individual components to build a model of the population-level dynamics from the start. However, when building a population-level description, it can be easy to lose sight of each individual and how they contribute to the larger picture. In this paper, we present a novel transformer architecture for learning fr…

    Submitted 20 October, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: accepted by NeurIPS 2022

  19. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  20. arXiv:2205.08413

    q-bio.NC

    Comparing high-dimensional neural recordings by aligning their low-dimensional latent representations

    Authors: Max Dabagia, Konrad P Kording, Eva L Dyer

    Abstract: Many questions in neuroscience involve understanding of the responses of large populations of neurons. However, when dealing with large-scale neural activity, interpretation becomes difficult, and comparisons between two animals, or across different time points becomes challenging. One major challenge that we face in modern neuroscience is that of correspondence, e.g. we do not record the exact sa…

    Submitted 17 May, 2022; originally announced May 2022.

  21. arXiv:2203.07852

    cs.LG cs.AI cs.NE

    Block-Recurrent Transformers

    Authors: DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

    Abstract: We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence, and has linear complexity with respect to sequence length. Our recurrent cell operates on blocks of tokens rather than single tokens during training, and leverages parallel computation within a block in order to make efficient use of accelerator hardware. The cell itself is stri…

    Submitted 1 November, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

    Comments: Update to NeurIPS camera-ready version

  22. arXiv:2202.04000

    cs.LG

    Learning Sinkhorn divergences for supervised change point detection

    Authors: Nauman Ahad, Eva L. Dyer, Keith B. Hengen, Yao Xie, Mark A. Davenport

    Abstract: Many modern applications require detecting change points in complex sequential data. Most existing methods for change point detection are unsupervised and, as a consequence, lack any information regarding what kind of changes we want to detect or if some kinds of changes are safe to ignore. This often results in poor change detection performance. We present a novel change point detection framework…

    Submitted 10 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: 19 pages, 13 figures. Reorganized figures and text for improved readability

  23. arXiv:2111.02338

    cs.LG

    Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity

    Authors: Ran Liu, Mehdi Azabou, Max Dabagia, Chi-Heng Lin, Mohammad Gheshlaghi Azar, Keith B. Hengen, Michal Valko, Eva L. Dyer

    Abstract: Meaningful and simplified representations of neural activity can yield insights into how and what information is being processed within a neural circuit. However, without labels, finding representations that reveal the link between the brain and behavior can be challenging. Here, we introduce a novel unsupervised approach for learning disentangled representations of neural activity called Swap-VAE…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: To be published in NeurIPS 2021

  24. arXiv:2109.04463

    cs.LG q-bio.NC

    Neural Latents Benchmark '21: Evaluating latent variable models of neural population activity

    Authors: Felix Pei, Joel Ye, David Zoltowski, Anqi Wu, Raeed H. Chowdhury, Hansem Sohn, Joseph E. O'Doherty, Krishna V. Shenoy, Matthew T. Kaufman, Mark Churchland, Mehrdad Jazayeri, Lee E. Miller, Jonathan Pillow, Il Memming Park, Eva L. Dyer, Chethan Pandarinath

    Abstract: Advances in neural recording present increasing opportunities to study neural activity in unprecedented detail. Latent variable models (LVMs) are promising tools for analyzing this rich activity across diverse neural systems and behaviors, as LVMs do not depend on known relationships between the activity and external experimental variables. However, progress with LVMs for neuronal population activ…

    Submitted 17 January, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

  25. arXiv:2102.10106

    cs.LG

    Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction

    Authors: Mehdi Azabou, Mohammad Gheshlaghi Azar, Ran Liu, Chi-Heng Lin, Erik C. Johnson, Kiran Bhaskaran-Nair, Max Dabagia, Bernardo Avila-Pires, Lindsey Kitchell, Keith B. Hengen, William Gray-Roncal, Michal Valko, Eva L. Dyer

    Abstract: State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different transformed "views" of a sample. Without sufficient diversity in the transformations used to create views, however, it can be difficult to overcome nuisance variables in the data and build rich representations. This motivates the use of the dataset itself to find similar…

    Submitted 13 December, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

  26. arXiv:2102.06701

    cs.LG cond-mat.dis-nn stat.ML

    Explaining Neural Scaling Laws

    Authors: Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma

    Abstract: The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scali…

    Submitted 28 April, 2024; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: 11 pages, 3 figures + Supplement (expanded). This version to appear in PNAS

    Journal ref: PNAS 121 (27) e2311878121 (2024)

  27. arXiv:2102.06514

    cs.LG cs.SI stat.ML

    Large-Scale Representation Learning on Graphs via Bootstrapping

    Authors: Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Veličković, Michal Valko

    Abstract: Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often need large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootst…

    Submitted 20 February, 2023; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Published as a conference paper at ICLR 2022

  28. arXiv:2012.11589

    cs.LG

    Making transport more robust and interpretable by moving data through a small number of anchor points

    Authors: Chi-Heng Lin, Mehdi Azabou, Eva L. Dyer

    Abstract: Optimal transport (OT) is a widely used technique for distribution alignment, with applications throughout the machine learning, graphics, and vision communities. Without any additional structural assumptions on transport, however, OT can be fragile to outliers or noise, especially in high dimensions. Here, we introduce a new form of structured OT that simultaneously learns low-dimensional struct…

    Submitted 17 July, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Journal ref: International Conference on Machine Learning (ICML) 2021

  29. arXiv:2012.03107

    cs.LG cs.CV eess.IV stat.ML

    When Do Curricula Work?

    Authors: Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur

    Abstract: Inspired by human learning, researchers have proposed ordering examples during training based on their difficulty. Both curriculum learning, exposing a network to easier examples early in training, and anti-curriculum learning, showing the most difficult examples first, have been suggested as improvements to the standard i.i.d. training. In this work, we set out to investigate the relative benefit…

    Submitted 9 February, 2021; v1 submitted 5 December, 2020; originally announced December 2020.

    Comments: ICLR 2021

  30. arXiv:2008.08675

    cs.LG hep-th stat.ML

    Asymptotics of Wide Convolutional Neural Networks

    Authors: Anders Andreassen, Ethan Dyer

    Abstract: Wide neural networks have proven to be a rich class of architectures for both theory and practice. Motivated by the observation that finite width convolutional networks appear to outperform infinite width networks, we study scaling laws for wide CNNs and networks with skip connections. Following the approach of (Dyer & Gur-Ari, 2019), we present a simple diagrammatic recipe to derive the asymptoti…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: 23 pages, 12 figures

  31. arXiv:2008.07545

    cs.LG stat.ML

    Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization

    Authors: Neha S. Wadia, Daniel Duckworth, Samuel S. Schoenholz, Ethan Dyer, Jascha Sohl-Dickstein

    Abstract: Machine learning is predicated on the concept of generalization: a model achieving low error on a sufficiently large training set should also perform well on novel samples from the same distribution. We show that both data whitening and second order optimization can harm or entirely prevent generalization. In general, model training harnesses information contained in the sample-sample second momen…

    Submitted 19 July, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: 13+10 pages, 10 figures; minor textual changes and some reorganization, one new figure and a new proof of main theorem added

  32. arXiv:2007.07400

    cs.LG cs.CV stat.ML

    Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics

    Authors: Vinay V. Ramasesh, Ethan Dyer, Maithra Raghu

    Abstract: A central challenge in developing versatile machine learning systems is catastrophic forgetting: a model trained on tasks in sequence will suffer significant performance drops on earlier tasks. Despite the ubiquity of catastrophic forgetting, there is limited understanding of the underlying process and its causes. In this paper, we address this important knowledge gap, investigating how forgetting…

    Submitted 14 July, 2020; originally announced July 2020.

  33. arXiv:2006.02624

    cs.LG stat.ML

    Bayesian optimization for modular black-box systems with switching costs

    Authors: Chi-Heng Lin, Joseph D. Miano, Eva L. Dyer

    Abstract: Most existing black-box optimization methods assume that all variables in the system being optimized have equal cost and can change freely at each iteration. However, in many real world systems, inputs are passed through a sequence of different operations or modules, making variables in earlier stages of processing more costly to update. Such structure imposes a cost on switching variables in earl…

    Submitted 11 October, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

  34. arXiv:2003.03267

    physics.atom-ph physics.app-ph physics.ins-det

    Resonant Very Low- and Ultra Low Frequency Digital Signal Reception Using a Portable Atomic Magnetometer

    Authors: Stuart J. Ingleby, Iain C. Chalmers, Terry E. Dyer, Paul F. Griffin, Erling Riis

    Abstract: Radio communication through attenuating media necessitates the use of very-low frequency (VLF) and ultra-low frequency (ULF) carrier bands, which are frequently used in underwater and under-ground communication applications. Quantum sensing techniques can be used to circumvent hard constraints on the size, weight and noise floor of classical signal transducers. In this low-frequency range, an opti…

    Submitted 6 March, 2020; originally announced March 2020.

    Comments: 8 pages, 9 figures

  35. arXiv:2003.02218

    stat.ML cs.LG

    The large learning rate phase of deep learning: the catapult mechanism

    Authors: Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, Guy Gur-Ari

    Abstract: The choice of initial learning rate can have a profound effect on the performance of deep networks. We present a class of neural networks with solvable training dynamics, and confirm their predictions empirically in practical deep learning settings. The networks exhibit sharply distinct behaviors at small and large learning rates. The two regimes are separated by a phase transition. In the small l…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: 25 pages, 19 figures

  36. arXiv:2002.08973

    cs.LG cs.CV stat.ML

    Affinity and Diversity: Quantifying Mechanisms of Data Augmentation

    Authors: Raphael Gontijo-Lopes, Sylvia J. Smullin, Ekin D. Cubuk, Ethan Dyer

    Abstract: Though data augmentation has become a standard component of deep neural network training, the underlying mechanism behind the effectiveness of these techniques remains poorly understood. In practice, augmentation policies are often chosen using heuristics of either distribution shift or augmentation diversity. Inspired by these, we seek to quantify how data augmentation improves model generalizati…

    Submitted 4 June, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: 10 pages, 7 figures

  37. arXiv:1909.11304

    cs.LG hep-th stat.ML

    Asymptotics of Wide Networks from Feynman Diagrams

    Authors: Ethan Dyer, Guy Gur-Ari

    Abstract: Understanding the asymptotic behavior of wide networks is of considerable interest. In this work, we present a general method for analyzing this large width behavior. The method is an adaptation of Feynman diagrams, a standard tool for computing multivariate Gaussian integrals. We apply our method to study training dynamics, improving existing bounds and deriving new results on wide network evolut…

    Submitted 25 September, 2019; originally announced September 2019.

    Comments: 10 pages, 3 figures, 1 Table + Appendices

  38. arXiv:1906.11768

    stat.ML cs.LG

    Hierarchical Optimal Transport for Multimodal Distribution Alignment

    Authors: John Lee, Max Dabagia, Eva L. Dyer, Christopher J. Rozell

    Abstract: In many machine learning applications, it is necessary to meaningfully aggregate, through alignment, different but related datasets. Optimal transport (OT)-based approaches pose alignment as a divergence minimization problem: the aim is to transform a source dataset to match a target dataset using the Wasserstein distance as a divergence measure. We introduce a hierarchical formulation of OT which…

    Submitted 3 November, 2019; v1 submitted 27 June, 2019; originally announced June 2019.

  39. The Most Irrational Rational Theories

    Authors: Nathan Benjamin, Ethan Dyer, A. Liam Fitzpatrick, Yuan Xin

    Abstract: We propose a two-parameter family of modular invariant partition functions of two-dimensional conformal field theories (CFTs) holographically dual to pure three-dimensional gravity in anti-de Sitter space. Our two parameters control the central charge and the representation of $SL(2,\mathbb{Z})$. At large central charge, the partition function has a gap to the first nontrivial primary state of…

    Submitted 18 December, 2018; originally announced December 2018.

    Comments: 25 pages plus appendices, 11 figures

  40. arXiv:1812.04754

    cs.LG cs.AI stat.ML

    Gradient Descent Happens in a Tiny Subspace

    Authors: Guy Gur-Ari, Daniel A. Roberts, Ethan Dyer

    Abstract: We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training. The subspace is spanned by a few top eigenvectors of the Hessian (as many as there are classes in the dataset), and is mostly preserved over long periods of training. A simple argument then suggests that gradient descent may happen mostly…

    Submitted 11 December, 2018; originally announced December 2018.

    Comments: 9 pages + appendices, 12 figures

  41. Constraints on Flavored 2d CFT Partition Functions

    Authors: Ethan Dyer, A. Liam Fitzpatrick, Yuan Xin

    Abstract: We study the implications of modular invariance on 2d CFT partition functions with abelian or non-abelian currents when chemical potentials for the charges are turned on, i.e. when the partition functions are "flavored". We begin with a new proof of the transformation law for the modular transformation of such partition functions. Then we proceed to apply modular bootstrap techniques to constrain…

    Submitted 4 May, 2018; v1 submitted 5 September, 2017; originally announced September 2017.

    Comments: 45 pages, 16 figures; v3: typos corrected, expanded appendix on numeric implementation

  42. arXiv:1707.02467

    cs.DM

    Random Walks on Small World Networks

    Authors: Martin E. Dyer, Andreas Galanis, Leslie Ann Goldberg, Mark Jerrum, Eric Vigoda

    Abstract: We study the mixing time of random walks on small-world networks modelled as follows: starting with the 2-dimensional periodic grid, each pair of vertices $\{u,v\}$ with distance $d>1$ is added as a "long-range" edge with probability proportional to $d^{-r}$, where $r\geq 0$ is a parameter of the model. Kleinberg studied a close variant of this network model and proved that the (decentralised) rou…

    Submitted 26 February, 2020; v1 submitted 8 July, 2017; originally announced July 2017.

    Comments: To appear in ACM Transactions on Algorithms (TALG)

  43. Spinning Geodesic Witten Diagrams

    Authors: Ethan Dyer, Daniel Z. Freedman, James Sully

    Abstract: We present an expression for the four-point conformal blocks of symmetric traceless operators of arbitrary spin as an integral over a pair of geodesics in anti-de Sitter space, generalizing the geodesic Witten diagram formalism of Hijano et al. [arXiv:1508.00501] to arbitrary spin. As an intermediate step in the derivation, we identify a convenient basis of bulk three-point interaction vertices whi…

    Submitted 20 February, 2017; originally announced February 2017.

    Comments: 28+6 pages, 8 figures

  44. 2D CFT Partition Functions at Late Times

    Authors: Ethan Dyer, Guy Gur-Ari

    Abstract: We consider the late-time behavior of the analytically continued partition function $Z(\beta + it) Z(\beta - it)$ in holographic $2d$ CFTs. This is a probe of information loss in such theories and in their holographic duals. We show that each Virasoro character decays in time, and so information is not restored at the level of individual characters. We identify a universal decaying contribution at late time…

    Submitted 14 November, 2016; originally announced November 2016.

    Comments: 36 pages, 7 figures

  45. arXiv:1604.03629

    q-bio.QM cs.CV

    Quantifying mesoscale neuroanatomy using X-ray microtomography

    Authors: Eva L. Dyer, William Gray Roncal, Hugo L. Fernandes, Doga Gürsoy, Vincent De Andrade, Rafael Vescovi, Kamel Fezzaa, Xianghui Xiao, Joshua T. Vogelstein, Chris Jacobsen, Konrad P. Körding, Narayanan Kasthuri

    Abstract: Methods for resolving the 3D microstructure of the brain typically start by thinly slicing and staining the brain, and then imaging each individual section with visible light photons or electrons. In contrast, X-rays can be used to image thick samples, providing a rapid approach for producing large 3D brain maps without sectioning. Here we demonstrate the use of synchrotron X-ray microtomography (…

    Submitted 26 July, 2016; v1 submitted 12 April, 2016; originally announced April 2016.

    Comments: 28 pages, 9 figures

  46. arXiv:1604.03199

    q-bio.QM eess.SY q-bio.NC

    From sample to knowledge: Towards an integrated approach for neuroscience discovery

    Authors: William Gray Roncal, Eva L Dyer, Doga Gürsoy, Konrad Kording, Narayanan Kasthuri

    Abstract: Imaging methods used in modern neuroscience experiments are quickly producing large amounts of data capable of providing increasing amounts of knowledge about neuroanatomy and function. A great deal of information in these datasets is relatively unexplored and untapped. One of the bottlenecks in knowledge extraction is that often there is no feedback loop between the knowledge produced (e.g., grap…

    Submitted 23 January, 2017; v1 submitted 11 April, 2016; originally announced April 2016.

    Comments: First two authors contributed equally. 8 pages, 2 figures. v2: added acknowledgments

  47. Universal Bounds on Charged States in 2d CFT and 3d Gravity

    Authors: Nathan Benjamin, Ethan Dyer, A. Liam Fitzpatrick, Shamit Kachru

    Abstract: We derive an explicit bound on the dimension of the lightest charged state in two-dimensional conformal field theories with a global abelian symmetry. We find that the bound scales with $c$ and provide examples that parametrically saturate this bound. We also prove that any such theory must contain a state with charge-to-mass ratio above a minimal lower bound. We comment on the implications for ch…

    Submitted 18 July, 2016; v1 submitted 31 March, 2016; originally announced March 2016.

    Comments: 33 pages, 1 figure; v2: additional refs and comments added

  48. Small Black Holes and Near-Extremal CFTs

    Authors: Nathan Benjamin, Ethan Dyer, A. Liam Fitzpatrick, Alexander Maloney, Eric Perlmutter

    Abstract: Pure theories of AdS$_3$ quantum gravity are conjectured to be dual to CFTs with sparse spectra of light primary operators. The sparsest possible spectrum consistent with modular invariance includes only black hole states above the vacuum. Witten conjectured the existence of a family of extremal CFTs, which realize this spectrum for all admissible values of the central charge. We consider the quan…

    Submitted 28 March, 2016; originally announced March 2016.

    Comments: 41 pages + appendices, 6 figures

  49. arXiv:1602.02191

    stat.ML cs.LG

    Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes

    Authors: Mohammad Gheshlaghi Azar, Eva Dyer, Konrad Kording

    Abstract: Finding efficient and provable methods to solve non-convex optimization problems is an outstanding challenge in machine learning and optimization theory. A popular approach used to tackle non-convex problems is to use convex relaxation techniques to find a convex surrogate for the problem. Unfortunately, convex relaxations typically must be found on a problem-by-problem basis. Thus, providing a ge…

    Submitted 3 March, 2016; v1 submitted 5 February, 2016; originally announced February 2016.

    Journal ref: Proc. of the Conference on Uncertainty in Artificial Intelligence, pg. 22-31, 2016

  50. An Extremal N=2 Superconformal Field Theory

    Authors: Nathan Benjamin, Ethan Dyer, A. Liam Fitzpatrick, Shamit Kachru

    Abstract: We provide an example of an extremal chiral ${\cal N}=2$ superconformal field theory at $c=24$. The construction is based on a ${\mathbb Z}_2$ orbifold of the theory associated to the $A_{1}^{24}$ Niemeier lattice. The state space is governed by representations of the sporadic group $M_{23}$.

    Submitted 30 June, 2015; originally announced July 2015.

    Comments: 20 pages