-
Just How Flexible are Neural Networks in Practice?
Authors:
Ravid Shwartz-Ziv,
Micah Goldblum,
Arpit Bansal,
C. Bayan Bruss,
Yann LeCun,
Andrew Gordon Wilson
Abstract:
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function class, built into an architecture, shapes its loss surface and impacts the minima we find. In this work, we examine the ability of neural networks to fit data in practice. Our findings indicate that: (1) standard optimizers find minima where the model can only fit training sets with significantly fewer samples than it has parameters; (2) convolutional networks are more parameter-efficient than MLPs and ViTs, even on randomly labeled data; (3) while stochastic training is thought to have a regularizing effect, SGD actually finds minima that fit more training data than full-batch gradient descent; (4) the difference in capacity to fit correctly labeled and incorrectly labeled samples can be predictive of generalization; (5) ReLU activation functions result in finding minima that fit more data despite being designed to avoid vanishing and exploding gradients in deep architectures.
Submitted 17 June, 2024;
originally announced June 2024.
-
Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency
Authors:
Alan Nawzad Amin,
Andrew Gordon Wilson
Abstract:
To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large scale data. Unfortunately the space of all possible causal graphs is enormous so scalably and accurately searching for the best fit to the data is a challenge. In principle we could substantially decrease the search space, or learn the graph entirely, by testing the conditional independence of variables. However, deciding if two variables are adjacent in a causal graph may require an exponential number of tests. Here we build a scalable and flexible method to evaluate if two variables are adjacent in a causal graph, the Differentiable Adjacency Test (DAT). DAT replaces an exponential number of tests with a provably equivalent relaxed problem. It then solves this problem by training two neural networks. We build a graph learning method based on DAT, DAT-Graph, that can also learn from data with interventions. DAT-Graph can learn graphs of 1000 variables with state of the art accuracy. Using the graph learned by DAT-Graph, we also build models that make much more accurate predictions of the effects of interventions on large scale RNA sequencing data.
Submitted 18 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Large Language Models Must Be Taught to Know What They Don't Know
Authors:
Sanyam Kapoor,
Nate Gruver,
Manley Roberts,
Katherine Collins,
Arka Pal,
Umang Bhatt,
Adrian Weller,
Samuel Dooley,
Micah Goldblum,
Andrew Gordon Wilson
Abstract:
When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study.
Submitted 12 June, 2024;
originally announced June 2024.
-
Transferring Knowledge from Large Foundation Models to Small Downstream Models
Authors:
Shikai Qiu,
Boran Han,
Danielle C. Maddix,
Shuai Zhang,
Yuyang Wang,
Andrew Gordon Wilson
Abstract:
How do we transfer the relevant knowledge from ever larger foundation models into small, task-specific downstream models that can run at much lower costs? Standard transfer learning using pre-trained weights as the initialization transfers limited information and commits us to often massive pre-trained architectures. This procedure also precludes combining multiple pre-trained models that learn complementary information. To address these shortcomings, we introduce Adaptive Feature Transfer (AFT). Instead of transferring weights, AFT operates purely on features, thereby decoupling the choice of the pre-trained model from the smaller downstream model. Rather than indiscriminately compressing all pre-trained features, AFT adaptively transfers pre-trained features that are most useful for performing the downstream task, using a simple regularization that adds minimal overhead. Across multiple vision, language, and multi-modal datasets, AFT achieves significantly better downstream performance compared to alternatives with a similar computational cost. Furthermore, AFT reliably translates improvement in pre-trained models into improvement in downstream performance, even if the downstream model is over $50\times$ smaller, and can effectively transfer complementary information learned by multiple pre-trained models.
Submitted 11 June, 2024;
originally announced June 2024.
-
All-sky three-dimensional dust density and extinction Maps of the Milky Way out to 2.8 kpc
Authors:
T. E. Dharmawardena,
C. A. L. Bailer-Jones,
M. Fouesneau,
D. Foreman-Mackey,
P. Coronica,
T. Colnaghi,
T. Müller,
A. G. Wilson
Abstract:
Three-dimensional dust density maps are crucial for understanding the structure of the interstellar medium of the Milky Way and the processes that shape it. However, constructing these maps requires large datasets and the methods used to analyse them are computationally expensive and difficult to scale up. As a result it has only recently become possible to map kiloparsec-scale regions of our Galaxy at parsec-scale grid sampling. We present all-sky three-dimensional dust density and extinction maps of the Milky Way out to 2.8~kpc in distance from the Sun using the fast and scalable Gaussian Process algorithm \DustT. The sampling of the three-dimensional map is $l,b,d = 1^{\circ} \times1^{\circ} \times 1.7$~pc. The input extinction and distance catalogue contains 120 million stars with photometry and astrometry from Gaia DR2, 2MASS and AllWISE. This combines the strengths of optical and infrared data to probe deeper into the dusty regions of the Milky Way. We compare our maps with other published 3D dust maps. All maps quantitatively agree at the $0.001$~mag~pc$^{-1}$ scale with many qualitatively similar features, although each map also has its own features. We recover Galactic features previously identified in the literature. Moreover, we also see a large under-density that may correspond to an inter-arm or -spur gap towards the Galactic Centre.
Submitted 10 June, 2024;
originally announced June 2024.
-
Compute Better Spent: Replacing Dense Layers with Structured Matrices
Authors:
Shikai Qiu,
Andres Potapczynski,
Marc Finzi,
Micah Goldblum,
Andrew Gordon Wilson
Abstract:
Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional networks in the image domain. In this work, we systematically explore structured matrices as replacements for dense matrices. We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance, especially as models scale. Using insights from the Maximal Update Parameterization, we determine the optimal scaling for initialization and learning rates of these unconventional layers. Finally, we measure the scaling laws of different structures to compare how quickly their performance improves with compute. We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show performs better than dense matrices for the same compute on multiple tasks. On CIFAR-10/100 with augmentation, BTT achieves exponentially lower training loss than dense when training MLPs and ViTs. BTT matches dense ViT-S/32 performance on ImageNet-1k with 3.8 times less compute and is more efficient than dense for training small GPT-2 language models.
Submitted 10 June, 2024;
originally announced June 2024.
-
Lichen-Mediated Self-Growing Construction Materials for Habitat Outfitting on Mars
Authors:
Nisha Rokaya,
Erin C. Carr,
Richard A. Wilson,
Congrui Jin
Abstract:
As its next step in space exploration, the National Aeronautics and Space Administration (NASA) revealed plans to establish a permanent human presence on Mars. Habitat outfitting, i.e., the technology to provide the crew with the necessary equipment to perform mission tasks as well as a comfortable, safe, and livable habitable volume, has not been fully explored yet. This study proposes that, rather than shipping prefabricated outfitting elements to Mars, habitat outfitting can be realized by in-situ construction using cyanobacteria and fungi as building agents. A synthetic lichen system, composed of diazotrophic cyanobacteria and filamentous fungi, can be created to produce abundant biominerals (CaCO3) and biopolymers, which will glue Martian regolith into consolidated building blocks. These self-growing building blocks can be assembled into various structures, such as floors, walls, partitions, and furniture.
Submitted 13 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making
Authors:
Shomik Jain,
D Calacci,
Ashia Wilson
Abstract:
We investigate the phenomenon of norm inconsistency: where LLMs apply different norms in similar situations. Specifically, we focus on the high-risk application of deciding whether to call the police in Amazon Ring home surveillance videos. We evaluate the decisions of three state-of-the-art LLMs -- GPT-4, Gemini 1.0, and Claude 3 Sonnet -- in relation to the activities portrayed in the videos, the subjects' skin-tone and gender, and the characteristics of the neighborhoods where the videos were recorded. Our analysis reveals significant norm inconsistencies: (1) a discordance between the recommendation to call the police and the actual presence of criminal activity, and (2) biases influenced by the racial demographics of the neighborhoods. These results highlight the arbitrariness of model decisions in the surveillance context and the limitations of current bias detection and mitigation strategies in normative decision-making.
Submitted 23 May, 2024;
originally announced May 2024.
-
Explosively driven Richtmyer--Meshkov instability jet suppression and enhancement via coupling machine learning and additive manufacturing
Authors:
Dane M. Sterbentz,
Dylan J. Kline,
Daniel A. White,
Charles F. Jekel,
Michael P. Hennessey,
David K. Amondson,
Abigail J. Wilson,
Max J. Sevcik,
Matthew F. L. Villena,
Steve S. Lin,
Michael D. Grapes,
Kyle T. Sullivan,
Jonathan L. Belof
Abstract:
The ability to control the behavior of fluid instabilities at material interfaces, such as the shock-driven Richtmyer--Meshkov instability, is a grand technological challenge with a broad number of applications ranging from inertial confinement fusion experiments to explosively driven shaped charges. In this work, we use a linear-geometry shaped charge as a means of studying methods for controlling material jetting that results from the Richtmyer--Meshkov instability. A shaped charge produces a high-velocity jet by focusing the energy from the detonation of high explosives. The interaction of the resulting detonation wave with a hollowed cavity lined with a thin metal layer produces the unstable jetting effect. By modifying characteristics of the detonation wave prior to striking the lined cavity, the kinetic energy of the jet can be enhanced or reduced. Modifying the geometry of the liner material can also be used to alter jetting properties. We apply optimization methods to investigate several design parameterizations for either enhancing or suppressing the shaped-charge jet. This is accomplished using 2D and 3D hydrodynamic simulations to investigate the design space that we consider. We also apply new additive manufacturing methods for producing the shaped-charge assemblies, which allow for experimental testing of complicated design geometries obtained through computational optimization. We present a direct comparison of our optimized designs with experimental results carried out at the High Explosives Application Facility at Lawrence Livermore National Laboratory.
Submitted 1 May, 2024;
originally announced May 2024.
-
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Authors:
Samuel Lavoie,
Polina Kirichenko,
Mark Ibrahim,
Mahmoud Assran,
Andrew Gordon Wilson,
Aaron Courville,
Nicolas Ballas
Abstract:
There are a thousand ways to caption an image. Contrastive Language-Image Pretraining (CLIP), on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% on zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet, outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.
Submitted 14 May, 2024; v1 submitted 29 April, 2024;
originally announced May 2024.
-
On possible embeddings of the standard model of particle physics and gravity in $E_8$
Authors:
Robert A. Wilson
Abstract:
I investigate the structure of $E_8$ under the action of the subalgebra/subgroup $A_1+G_2+C_3$, as a potential route to unification of the fundamental forces of nature into a single algebraic structure. The particular real form $E_{8(-24)}$ supports a decomposition into compact $G_2$ plus split $A_1+C_3$, which allows a restriction from $G_2$ to $SU(3)$ for the strong force, together with split $SL_2(\mathbb R)$ to break the symmetry of the weak interaction and give mass to the intermediate vector bosons. The factor $C_3$ contains various copies of the Lorentz group $SL_2(\mathbb C)$ and extends the `spacetime' symmetries to the full group of symplectic symmetries of real $3+3$-dimensional phase space.
Restricting $G_2$ to the Standard Model $SU(3)$ extends $C_3$ to $A_5$, in the real form $SU(3,3)$, acting on a complex phase space that includes both momentum and current. There is then a natural restriction from $SU(3,3)$ to $SO(3,3)$, describing the action of $SL_4(\mathbb R)$ on phase space. The resulting action of $SL_4(\mathbb R)$ on $E_8$ includes tensors that are equivalent to the stress-energy tensor, the Ricci tensor and the Riemann tensor, and therefore permits the formalism of general relativity to be developed inside $E_{8(-24)}$. The model then suggests unexpected and perhaps subtle ways in which general relativity and particle physics may be forced to modify each other, in order to produce a unified theory.
Submitted 6 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Leveraging Speech for Gesture Detection in Multimodal Communication
Authors:
Esam Ghaleb,
Ilya Burenko,
Marlou Rasenberg,
Wim Pouw,
Ivan Toni,
Peter Uhrig,
Anna Wilson,
Judith Holler,
Aslı Özyürek,
Raquel Fernández
Abstract:
Gestures are inherent to human interaction and often complement speech in face-to-face communication, forming a multimodal communication system. An important task in gesture analysis is detecting a gesture's beginning and end. Research on automatic gesture detection has primarily focused on visual and kinematic information to detect a limited set of isolated or silent gestures with low variability, neglecting the integration of speech and vision signals to detect gestures that co-occur with speech. This work addresses this gap by focusing on co-speech gesture detection, emphasising the synchrony between speech and co-speech hand gestures. We address three main challenges: the variability of gesture forms, the temporal misalignment between gesture and speech onsets, and differences in sampling rate between modalities. We investigate extended speech time windows and employ separate backbone models for each modality to address the temporal misalignment and sampling rate differences. We utilize Transformer encoders in cross-modal and early fusion techniques to effectively align and integrate speech and skeletal sequences. The study results show that combining visual and speech information significantly enhances gesture detection performance. Our findings indicate that expanding the speech buffer beyond visual time segments improves performance and that multimodal integration using cross-modal and early fusion techniques outperforms baseline methods using unimodal and late fusion methods. Additionally, we find a correlation between the models' gesture prediction confidence and low-level speech frequency features potentially associated with gestures. Overall, the study provides a better understanding and detection methods for co-speech gestures, facilitating the analysis of multimodal communication.
Submitted 23 April, 2024;
originally announced April 2024.
-
Simulating Chemistry on Bosonic Quantum Devices
Authors:
Rishab Dutta,
Delmar G. A. Cabral,
Ningyi Lyu,
Nam P. Vu,
Yuchen Wang,
Brandon Allen,
Xiaohan Dan,
Rodrigo G. Cortiñas,
Pouya Khazaei,
Max Schäfer,
Alejandro C. C. d. Albornoz,
Scott E. Smart,
Scott Nie,
Michel H. Devoret,
David A. Mazziotti,
Prineha Narang,
Chen Wang,
James D. Whitfield,
Angela K. Wilson,
Heidi P. Hendrickson,
Daniel A. Lidar,
Francisco Pérez-Bernal,
Lea F. Santos,
Sabre Kais,
Eitan Geva,
et al. (1 additional author not shown)
Abstract:
Bosonic quantum devices offer a novel approach to realize quantum computations, where the quantum two-level system (qubit) is replaced with the quantum (an)harmonic oscillator (qumode) as the fundamental building block of the quantum simulator. The simulation of chemical structure and dynamics can then be achieved by representing or mapping the system Hamiltonians in terms of bosonic operators. In this perspective, we review recent progress and future potential of using bosonic quantum devices for addressing a wide range of challenging chemical problems, including the calculation of molecular vibronic spectra, the simulation of gas-phase and solution-phase adiabatic and nonadiabatic chemical dynamics, the efficient solution of molecular graph theory problems, and the calculations of electronic structure.
Submitted 12 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized
Authors:
Shomik Jain,
Kathleen Creel,
Ashia Wilson
Abstract:
Contrary to traditional deterministic notions of algorithmic fairness, this paper argues that fairly allocating scarce resources using machine learning often requires randomness. We address why, when, and how to randomize by proposing stochastic procedures that more adequately account for all of the claims that individuals have to allocations of social goods or opportunities.
Submitted 19 June, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
A Clifford algebra model in phase space
Authors:
Robert A. Wilson
Abstract:
I show how the isomorphism between the Lie groups of types $B_2$ and $C_2$ leads to a faithful action of the Clifford algebra $\mathcal C\ell(3,2)$ on the phase space of 2-dimensional dynamics, and hence to a mapping from Dirac spinors modulo scalars into this same phase space. Extending to the phase space of 3-dimensional dynamics allows one to embed all the gauge groups of the Standard Model as well, and hence unify the electro-weak and strong forces into a single algebraic structure, identified as the gauge group of Hamiltonian dynamics. The gauge group transforms between phase space coordinates appropriate for arbitrary observers, and therefore shows how the apparently arbitrary parameters of the Standard Model transform between mutually accelerating observers. In particular, it is possible to calculate the transformation between an inertial frame and the laboratory frame, in order to explain how macroscopic laboratory mechanics emerges from quantum mechanics, and to show how to write down a quantum theory of gravity that is consistent with quantum mechanics, but is not consistent with General Relativity.
Submitted 1 April, 2024;
originally announced April 2024.
-
Chain event graphs for assessing activity-level propositions in forensic science in relation to drug traces on banknotes
Authors:
Gail Robertson,
Amy L Wilson,
Jim Q Smith
Abstract:
Graphical models and likelihood ratios can be used by forensic scientists to compare support given by evidence to propositions put forward by competing parties during court proceedings. Such models can also be used to evaluate support for activity-level propositions, i.e. propositions that refer to the nature of activities associated with evidence and how this evidence came to be at a crime scene. Graphical methods can be used to show explicitly different scenarios that might explain the evidence in a case and to distinguish between evidence requiring evaluation by a jury and quantifiable evidence from the crime scene. Such visual representations can be helpful for forensic practitioners, the police and lawyers who may need to assess the value that different pieces of evidence make to their arguments in a case. In this paper we demonstrate for the first time how chain event graphs can be applied to a criminal case involving drug trafficking. We show how different types of evidence (i.e. expert judgement and data collected from a crime scene) can be combined using a chain event graph and show how the hierarchical model deriving from the graph can be used to evaluate the degree of support for different activity-level propositions in the case. We also develop a modification of the standard chain event graph to simplify their use in forensic applications.
Submitted 3 April, 2024;
originally announced April 2024.
-
Tutte polynomials in superspace
Authors:
Brendon Rhoades,
Vasu Tewari,
Andy Wilson
Abstract:
We associate a quotient of superspace to any hyperplane arrangement by considering the differential closure of an ideal generated by powers of certain homogeneous linear forms. This quotient is a superspace analogue of the external zonotopal algebra, and it further contains the central zonotopal algebra in the appropriate grading. We show that an evaluation of the bivariate Tutte polynomial is the bigraded Hilbert series of this quotient. We then use this fact to construct an explicit basis for the Macaulay inverse. These results generalize those of Ardila-Postnikov and Holtz-Ron. We also discuss enumerative consequences of our results in the setting of hyperplane arrangements.
Submitted 1 April, 2024;
originally announced April 2024.
-
A comparison of graphical methods in the case of the murder of Meredith Kercher
Authors:
A. Philip Dawid,
Francesco Dotto,
Maxine Graves,
Jay B. Kadane,
Julia Mortera,
Gail Robertson,
Jim Q. Smith,
Amy L. Wilson
Abstract:
We compare three graphical methods for displaying evidence in a legal case: Wigmore Charts, Bayesian Networks and Chain Event Graphs. We find that these methods are aimed at three distinct audiences, respectively lawyers, forensic scientists and the police. The methods are illustrated using part of the evidence in the case of the murder of Meredith Kercher. More specifically, we focus on representing the list of propositions, evidence, testimony and facts given in the first trial against Raffaele Sollecito and Amanda Knox with these graphical methodologies.
Submitted 25 March, 2024;
originally announced March 2024.
-
Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion
Authors:
Hossein Souri,
Arpit Bansal,
Hamid Kazemi,
Liam Fowl,
Aniruddha Saha,
Jonas Geiping,
Andrew Gordon Wilson,
Rama Chellappa,
Tom Goldstein,
Micah Goldblum
Abstract:
Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at: https://github.com/hsouri/GDP .
Submitted 24 March, 2024;
originally announced March 2024.
-
Quadcopter Team Configurable Motion Guided by a Quadruped
Authors:
Mohammad Ghufran,
Sourish Tetakayala,
Jack Hughes,
Aron Wilson,
Hossein Rastgoftar
Abstract:
The paper focuses on modeling and experimental evaluation of the configurable coordination of a quadcopter team guided by a single quadruped robot. We consider the quadcopter team as particles of a two-dimensional deformable body and propose a two-dimensional affine transformation model for safe and collision-free configurable coordination of this heterogeneous robotic system. The proposed affine transformation is decomposed into a translation, which is specified by the quadruped's global position, and a configurable motion of the quadcopters, which is determined by a nonsingular Jacobian matrix so that the quadcopter team can safely navigate a constrained environment while avoiding collision. We propose two methods to experimentally evaluate the proposed heterogeneous robot coordination model. The first method measures the real positions of the quadcopters, the quadruped, and environmental objects, all with respect to the global coordinate system. The second method measures positions with respect to the local coordinate system fixed on the dog robot, which in turn enables safe planning of the Jacobian matrix of the quadcopter team while the world virtually approaches the robotic system.
Submitted 20 March, 2024;
originally announced March 2024.
-
BlendScape: Enabling Unified and Personalized Video-Conferencing Environments through Generative AI
Authors:
Shwetha Rajaram,
Nels Numan,
Balasaravanan Thoravi Kumaravel,
Nicolai Marquardt,
Andrew D. Wilson
Abstract:
Today's video-conferencing tools support a rich range of professional and social activities, but their generic, grid-based environments cannot be easily adapted to meet the varying needs of distributed collaborators. To enable end-user customization, we developed BlendScape, a system for meeting participants to compose video-conferencing environments tailored to their collaboration context by leveraging AI image generation techniques. BlendScape supports flexible representations of task spaces by blending users' physical or virtual backgrounds into unified environments and implements multimodal interaction techniques to steer the generation. Through an evaluation with 15 end-users, we investigated their customization preferences for work and social scenarios. Participants could rapidly express their design intentions with BlendScape and envisioned using the system to structure collaboration in future meetings, but experienced challenges with preventing distracting elements. We implement scenarios to demonstrate BlendScape's expressiveness in supporting distributed collaboration techniques from prior work and propose composition techniques to improve the quality of environments.
Submitted 20 March, 2024;
originally announced March 2024.
-
Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors
Authors:
Tim G. J. Rudner,
Ya Shi Zhang,
Andrew Gordon Wilson,
Julia Kempe
Abstract:
Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well under subpopulation shifts. We design a simple group-aware prior that only requires access to a small set of data with group information and demonstrate that training with this prior yields state-of-the-art performance -- even when only retraining the final layer of a previously trained non-robust model. Group-aware priors are conceptually simple, complementary to existing approaches, such as attribute pseudo labeling and data reweighting, and open up promising new avenues for harnessing Bayesian inference to enable robustness to subpopulation shifts.
Submitted 14 March, 2024;
originally announced March 2024.
-
Chronos: Learning the Language of Time Series
Authors:
Abdul Fatir Ansari,
Lorenzo Stella,
Caner Turkmen,
Xiyuan Zhang,
Pedro Mercado,
Huibin Shen,
Oleksandr Shchur,
Syama Sundar Rangapuram,
Sebastian Pineda Arango,
Shubham Kapoor,
Jasper Zschiegner,
Danielle C. Maddix,
Hao Wang,
Michael W. Mahoney,
Kari Torkkola,
Andrew Gordon Wilson,
Michael Bohlke-Schneider,
Yuyang Wang
Abstract:
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
Submitted 2 May, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
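The tokenization step described in the Chronos abstract (scaling, then quantization into a fixed vocabulary) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the mean-absolute scaling, the uniform bins on a fixed range, and the hypothetical function names `tokenize` and `detokenize`.

```python
from bisect import bisect_right


def tokenize(series, num_bins=10, low=-5.0, high=5.0):
    """Map real values to integer token ids (assumed scheme, not the paper's)."""
    # Assumed scaling: divide by the mean absolute value; use 1.0 for an all-zero series.
    scale = sum(abs(x) for x in series) / len(series) or 1.0
    scaled = [x / scale for x in series]
    # Interior edges of num_bins equal-width bins covering [low, high].
    width = (high - low) / num_bins
    edges = [low + width * i for i in range(1, num_bins)]
    # bisect_right returns the bin index; out-of-range values land in the end bins.
    tokens = [bisect_right(edges, v) for v in scaled]
    return tokens, scale


def detokenize(tokens, scale, num_bins=10, low=-5.0, high=5.0):
    """Map token ids back to bin centres and undo the scaling."""
    width = (high - low) / num_bins
    return [(low + width * (t + 0.5)) * scale for t in tokens]


if __name__ == "__main__":
    tokens, scale = tokenize([1.0, 2.0, 3.0])
    print(tokens, scale)
    print(detokenize(tokens, scale))
```

Once values are integer ids from a fixed vocabulary, a sequence model can be trained on them with a cross-entropy loss, as the abstract describes; the round trip through `detokenize` is lossy because each value is replaced by its bin centre.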
-
Controllable Prompt Tuning For Balancing Group Distributional Robustness
Authors:
Hoang Phan,
Andrew Gordon Wilson,
Qi Lei
Abstract:
Models trained on data composed of different groups or domains can suffer from severe performance degradation under distribution shifts. While recent methods have largely focused on optimizing the worst-group objective, this often comes at the expense of good performance on other groups. To address this problem, we introduce an optimization scheme to achieve good performance across groups and find a good solution for all without severely sacrificing performance on any of them. However, directly applying such optimization involves updating the parameters of the entire network, making it both computationally expensive and challenging. Thus, we introduce Controllable Prompt Tuning (CPT), which couples our approach with prompt-tuning techniques. On spurious correlation benchmarks, our procedures achieve state-of-the-art results across both transformer and non-transformer architectures, as well as unimodal and multimodal data, while requiring only 0.4% tunable parameters.
Submitted 4 June, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Do Large Code Models Understand Programming Concepts? A Black-box Approach
Authors:
Ashish Hooda,
Mihai Christodorescu,
Miltiadis Allamanis,
Aaron Wilson,
Kassem Fawaz,
Somesh Jha
Abstract:
Large Language Models' success on text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear as to why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.
Submitted 23 February, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Individual addressing and state readout of trapped ions utilizing rf micromotion
Authors:
Nathan K Lysne,
Justin F Niedermeyer,
Andrew C Wilson,
Daniel H Slichter,
Dietrich Leibfried
Abstract:
Excess "micromotion" of trapped ions due to the residual radio frequency (rf) trapping field at their location is often undesirable and is usually carefully minimized. Here, we induce precise amounts of excess micromotion on individual ions by adjusting the local static electric field they experience. Micromotion modulates the coupling of an ion to laser fields, ideally tuning it from its maximum value to zero as the ion is moved away from the trap's rf null. We use tunable micromotion to vary the Rabi frequency of stimulated Raman transitions over two orders of magnitude, and to individually control the rates of resonant fluorescence from three ions under global laser illumination without any changes to the driving light fields. The technique is amenable to situations where addressing individual ions with focused laser beams is challenging, such as tightly packed linear ion strings or two-dimensional ion arrays illuminated from the side.
Submitted 8 February, 2024;
originally announced February 2024.
-
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
Authors:
Nate Gruver,
Anuroop Sriram,
Andrea Madotto,
Andrew Gordon Wilson,
C. Lawrence Zitnick,
Zachary Ulissi
Abstract:
We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material, infilling of partial structures and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.
Submitted 6 February, 2024;
originally announced February 2024.
-
Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI
Authors:
Theodore Papamarkou,
Maria Skoularidou,
Konstantina Palla,
Laurence Aitchison,
Julyan Arbel,
David Dunson,
Maurizio Filippone,
Vincent Fortuin,
Philipp Hennig,
José Miguel Hernández-Lobato,
Aliaksandr Hubin,
Alexander Immer,
Theofanis Karaletsos,
Mohammad Emtiyaz Khan,
Agustinus Kristiadi,
Yingzhen Li,
Stephan Mandt,
Christopher Nemeth,
Michael A. Osborne,
Tim G. J. Rudner,
David Rügamer,
Yee Whye Teh,
Max Welling,
Andrew Gordon Wilson,
Ruqi Zhang
Abstract:
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.
Submitted 2 June, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
A discrete model for Gell-Mann matrices
Authors:
Robert A. Wilson
Abstract:
I propose a discrete model for the Gell-Mann matrices, which allows them to participate in discrete symmetries of three generations of four types of elementary fermions, in addition to their usual role in describing a continuous group $SU(3)$ of colour symmetries. This model sheds new light on the mathematical (rather than physical) necessity for `mixing' between the various gauge groups $SU(3)$, $SU(2)$ and $U(1)$ of the Standard Model. In particular it shows how the anti-Hermitian version of Pauli matrices can act non-trivially on a unitary version of the Gell-Mann matrices, which leads to a non-trivial mixing between the weak and strong nuclear forces.
The unitary version of the Gell-Mann matrices can in turn act non-trivially on a quaternionic version of Dirac matrices, which leads to a non-trivial mixing between the strong force and the shape of spacetime defined by the Dirac matrices. Hence this model implies a mixing between the electro-weak-strong forces on the one hand and gravity, as described by General Relativity, on the other. This mixing in turn implies the necessity for both general relativistic corrections to the Standard Model of Particle Physics, and quantum corrections to General Relativity. Contrary to general expectation, both types of corrections seem to be large enough to be tested experimentally.
Submitted 19 February, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
A New Division Algebra Representation of $E_7$
Authors:
Tevian Dray,
Corinne A. Manogue,
Robert A. Wilson
Abstract:
We decompose the Lie algebra $\mathfrak{e}_{8(-24)}$ into representations of $\mathfrak{e}_{7(-25)}\oplus\mathfrak{sl}(2,\mathbb{R})$ using our recent description of $\mathfrak{e}_8$ in terms of (generalized) $3\times3$ matrices over pairs of division algebras. Freudenthal's description of both $\mathfrak{e}_7$ and its minimal representation are therefore realized explicitly within $\mathfrak{e}_8$, with the action given by the (generalized) matrix commutator in $\mathfrak{e}_8$, and with a natural parameterization using division algebras. Along the way, we show how to implement standard operations on the Albert algebra such as trace of the Jordan product, the Freudenthal product, and the determinant, all using commutators in $\mathfrak{e}_8$.
Submitted 19 January, 2024;
originally announced January 2024.
-
Multi-Agent Reinforcement Learning for Maritime Operational Technology Cyber Security
Authors:
Alec Wilson,
Ryan Menzies,
Neela Morarji,
David Foster,
Marco Casassa Mont,
Esin Turkbeyler,
Lisa Gralewski
Abstract:
This paper demonstrates the potential for autonomous cyber defence to be applied on industrial control systems and provides a baseline environment to further explore Multi-Agent Reinforcement Learning's (MARL) application to this problem domain. It introduces a simulation environment, IPMSRL, of a generic Integrated Platform Management System (IPMS) and explores the use of MARL for autonomous cyber defence decision-making on generic maritime based IPMS Operational Technology (OT). OT cyber defensive actions are less mature than they are for Enterprise IT. This is due to the relatively brittle nature of OT infrastructure originating from the use of legacy systems, design-time engineering assumptions, and lack of full-scale modern security controls. There are many obstacles to be tackled across the cyber landscape due to continually increasing cyber-attack sophistication and the limitations of traditional IT-centric cyber defence solutions. Traditional IT controls are rarely deployed on OT infrastructure, and where they are, some threats aren't fully addressed. In our experiments, a shared critic implementation of Multi Agent Proximal Policy Optimisation (MAPPO) outperformed Independent Proximal Policy Optimisation (IPPO). MAPPO reached an optimal policy (episode outcome mean of 1) after 800K timesteps, whereas IPPO was only able to reach an episode outcome mean of 0.966 after one million timesteps. Hyperparameter tuning greatly improved training performance. Across one million timesteps the tuned hyperparameters reached an optimal policy whereas the default hyperparameters only managed to win sporadically, with most simulations resulting in a draw. We tested a real-world constraint, attack detection alert success, and found that when alert success probability is reduced to 0.75 or 0.9, the MARL defenders were still able to win in over 97.5% or 99.5% of episodes, respectively.
Submitted 18 January, 2024;
originally announced January 2024.
-
WebGPU-SPY: Finding Fingerprints in the Sandbox through GPU Cache Attacks
Authors:
Ethan Ferguson,
Adam Wilson,
Hoda Naghibijouybari
Abstract:
Microarchitectural attacks on CPU structures have been studied in native applications, as well as in web browsers. These attacks continue to be a substantial threat to computing systems at all scales.
With the proliferation of heterogeneous systems and integration of hardware accelerators in every computing system, modern web browsers provide the support of GPU-based acceleration for the graphics and rendering processes. Emerging web standards also support the GPU acceleration of general-purpose computation within web browsers.
In this paper, we present a new attack vector for microarchitectural attacks in web browsers. We use emerging GPU accelerating APIs in modern browsers (specifically WebGPU) to launch a GPU-based cache side channel attack on the compute stack of the GPU that spies on victim activities on the graphics (rendering) stack of the GPU. Unlike prior works that rely on JavaScript APIs or software interfaces to build timing primitives, we build the timer using GPU hardware resources and develop a cache side channel attack on Intel's integrated GPUs. We leverage the GPU's inherent parallelism at different levels to develop high-resolution parallel attacks. We demonstrate that GPU-based cache attacks can achieve a precision of 90 for website fingerprinting of 100 top websites. We also discuss potential countermeasures against the proposed attack to secure the systems at a critical time when these web standards are being developed and before they are widely deployed.
Submitted 8 January, 2024;
originally announced January 2024.
-
Penalized Distributed Lag Interaction Model: Air Pollution, Birth Weight and Neighborhood Vulnerability
Authors:
Danielle Demateis,
Kayleigh P. Keller,
David Rojas-Rueda,
Marianthi-Anna Kioumourtzoglou,
Ander Wilson
Abstract:
Maternal exposure to air pollution during pregnancy has a substantial public health impact. Epidemiological evidence supports an association between maternal exposure to air pollution and low birth weight. A popular method to estimate this association while identifying windows of susceptibility is a distributed lag model (DLM), which regresses an outcome onto exposure history observed at multiple time points. However, the standard DLM framework does not allow for modification of the association between repeated measures of exposure and the outcome. We propose a distributed lag interaction model that allows modification of the exposure-time-response associations across individuals by including an interaction between a continuous modifying variable and the exposure history. Our model framework is an extension of a standard DLM that uses a cross-basis, or bi-dimensional function space, to simultaneously describe both the modification of the exposure-response relationship and the temporal structure of the exposure data. Through simulations, we showed that our model with penalization out-performs a standard DLM when the true exposure-time-response associations vary by a continuous variable. Using a Colorado, USA birth cohort, we estimated the association between birth weight and ambient fine particulate matter air pollution modified by an area-level metric of health and social adversities from Colorado EnviroScreen.
Submitted 21 February, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Understanding the Detrimental Class-level Effects of Data Augmentation
Authors:
Polina Kirichenko,
Mark Ibrahim,
Randall Balestriero,
Diane Bouchacourt,
Ramakrishna Vedantam,
Hamed Firooz,
Andrew Gordon Wilson
Abstract:
Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. There has been little progress in resolving class-level accuracy drops due to a limited understanding of these effects. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, co-occur, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely related classes. While many of the previously reported performance drops are explained by multi-label annotations, our analysis of class confusions reveals other sources of accuracy degradation. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes.
Submitted 7 December, 2023;
originally announced January 2024.
-
On subgroups of the Monster isomorphic to $PSL_2(8)$
Authors:
Robert A. Wilson
Abstract:
We describe computer calculations that were used in 2016 to classify subgroups of the Monster isomorphic to $PSL_2(8)$, containing $7B$-elements. It turns out that there is no such $PSL_2(8)$ in the Monster. These calculations confirm earlier unpublished calculations by P. E. Holmes that obtained the same result. The result has also been confirmed in independent calculations by H. Dietrich, M. Lee and T. Popiel, using different software by M. Seysen. Thus this experimental result is shown to be reproducible.
Submitted 4 December, 2023;
originally announced December 2023.
-
Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution
Authors:
Ying Wang,
Tim G. J. Rudner,
Andrew Gordon Wilson
Abstract:
Vision-language pretrained models have seen remarkable success, but their application to safety-critical settings is limited by their lack of interpretability. To improve the interpretability of vision-language models such as CLIP, we propose a multi-modal information bottleneck (M2IB) approach that learns latent representations that compress irrelevant information while preserving relevant visual and textual features. We demonstrate how M2IB can be applied to attribution analysis of vision-language pretrained models, increasing attribution accuracy and improving the interpretability of such models when applied to safety-critical domains such as healthcare. Crucially, unlike commonly used unimodal attribution methods, M2IB does not require ground truth labels, making it possible to audit representations of vision-language pretrained models when multiple modalities but no ground-truth data is available. Using CLIP as an example, we demonstrate the effectiveness of M2IB attribution and show that it outperforms gradient-based, perturbation-based, and attention-based attribution methods both qualitatively and quantitatively.
Submitted 22 June, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Non-Vacuous Generalization Bounds for Large Language Models
Authors:
Sanae Lotfi,
Marc Finzi,
Yilun Kuang,
Tim G. J. Rudner,
Micah Goldblum,
Andrew Gordon Wilson
Abstract:
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply regurgitate their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we derive a compression bound that is valid for the unbounded log-likelihood loss using prediction smoothing, and we extend the bound to handle subsampling, accelerating bound computation on massive datasets. To achieve the extreme level of compression required for non-vacuous generalization bounds, we devise SubLoRA, a low-dimensional non-linear parameterization. Using this approach, we find that larger models have better generalization bounds and are more compressible than smaller models.
Submitted 12 February, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Function-Space Regularization in Neural Networks: A Probabilistic Perspective
Authors:
Tim G. J. Rudner,
Sanyam Kapoor,
Shikai Qiu,
Andrew Gordon Wilson
Abstract:
Parameter-space regularization in neural network optimization is a fundamental tool for improving generalization. However, standard parameter-space regularization methods make it challenging to encode explicit preferences about desired predictive functions into neural network training. In this work, we approach regularization in neural networks from a probabilistic perspective and show that by viewing parameter-space regularization as specifying an empirical prior distribution over the model parameters, we can derive a probabilistically well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training. This method -- which we refer to as function-space empirical Bayes (FSEB) -- includes both parameter- and function-space regularization, is mathematically simple, easy to implement, and incurs only minimal computational overhead compared to standard regularization techniques. We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection, highly-calibrated predictive uncertainty estimates, successful task adaption from pre-trained models, and improved generalization under covariate shift.
Submitted 28 December, 2023;
originally announced December 2023.
-
Mean-field underdamped Langevin dynamics and its spacetime discretization
Authors:
Qiang Fu,
Ashia Wilson
Abstract:
We propose a new method called the N-particle underdamped Langevin algorithm for optimizing a special class of non-linear functionals defined over the space of probability measures. Examples of problems with this formulation include training mean-field neural networks, maximum mean discrepancy minimization and kernel Stein discrepancy minimization. Our algorithm is based on a novel spacetime discretization of the mean-field underdamped Langevin dynamics, for which we provide a new, fast mixing guarantee. In addition, we demonstrate that our algorithm converges globally in total variation distance, bridging the theoretical gap between the dynamics and its practical implementation.
Submitted 6 February, 2024; v1 submitted 26 December, 2023;
originally announced December 2023.
-
Perspectives on the State and Future of Deep Learning - 2023
Authors:
Micah Goldblum,
Anima Anandkumar,
Richard Baraniuk,
Tom Goldstein,
Kyunghyun Cho,
Zachary C Lipton,
Melanie Mitchell,
Preetum Nakkiran,
Max Welling,
Andrew Gordon Wilson
Abstract:
The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time. The plan is to host this survey periodically until the AI singularity paperclip-frenzy-driven doomsday, keeping an updated list of topical questions and interviewing new community members for each edition. In this issue, we probed people's opinions on interpretable AI, the value of benchmarking in modern NLP, the state of progress towards understanding deep learning, and the future of academia.
Submitted 18 December, 2023; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin algorithm
Authors:
Vishwak Srinivasan,
Andre Wibisono,
Ashia Wilson
Abstract:
We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of this filter, our method is unbiased relative to the target, while known discretisations of the Mirror Langevin dynamics including the Mirror Langevin algorithm have an asymptotic bias. For this algorithm, we also give upper bounds for the number of iterations taken to mix to a constrained distribution whose potential is relatively smooth, convex, and Lipschitz continuous with respect to a self-concordant mirror function. As a consequence of the reversibility of the Markov chain induced by the inclusion of the Metropolis-Hastings filter, we obtain an exponentially better dependence on the error tolerance for approximate constrained sampling. We also present numerical experiments that corroborate our theoretical findings.
Submitted 21 June, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Wild Motion Unleashed: Markerless 3D Kinematics and Force Estimation in Cheetahs
Authors:
Zico da Silva,
Stacy Shield,
Penny E. Hudson,
Alan M. Wilson,
Fred Nicolls,
Amir Patel
Abstract:
The complex dynamics of animal manoeuvrability in the wild is extremely challenging to study. The cheetah ($\textit{Acinonyx jubatus}$) is a perfect example: despite great interest in its unmatched speed and manoeuvrability, obtaining complete whole-body motion data from these animals remains an unsolved problem. This is especially difficult in wild cheetahs, where it is essential that the methods used are remote and do not constrain the animal's motion. In this work, we use data obtained from cheetahs in the wild to present a trajectory optimisation approach for estimating the 3D kinematics and joint torques of subjects remotely. We call this approach kinetic full trajectory estimation (K-FTE). We validate the method on a dataset comprising synchronised video and force plate data. We are able to reconstruct the 3D kinematics with an average reprojection error of 17.69 pixels (62.94 $\%$ PCK using the nose-to-eye(s) length segment as a threshold), while the estimates produce an average root-mean-square error of 171.3 N ($\approx$ 17.16 $\%$ of peak force during stride) for the estimated ground reaction force when compared against the force plate data. While the joint torques cannot be directly validated against ground truth data, as no such data is available for cheetahs, the estimated torques agree with previous studies of quadrupeds in controlled settings. These results will enable deeper insight into the study of animal locomotion in a more natural environment for both biologists and roboticists.
Submitted 10 December, 2023;
originally announced December 2023.
-
Coloring Groups
Authors:
Ben Adenbaum,
Alexander Wilson
Abstract:
We introduce coloring groups, which are permutation groups obtained from a proper edge coloring of a graph. These groups generalize the generalized toggle groups of Striker (which themselves generalize the toggle groups introduced by Cameron and Fon-der-Flaass). We present some general results connecting the structure of a coloring group to the structure of its graph coloring, providing graph-theoretic characterizations of the centralizer and primitivity of a coloring group. We apply these results particularly to generalized toggle groups arising from trees as well as coloring groups arising from the independence posets introduced by Thomas and Williams.
Submitted 3 July, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Materials Expert-Artificial Intelligence for Materials Discovery
Authors:
Yanjun Liu,
Milena Jovanovic,
Krishnanand Mallayya,
Wesley J. Maddox,
Andrew Gordon Wilson,
Sebastian Klemenz,
Leslie M. Schoop,
Eun-Ah Kim
Abstract:
The advent of material databases provides an unprecedented opportunity to uncover predictive descriptors for emergent material properties from vast data space. However, common reliance on high-throughput ab initio data necessarily inherits limitations of such data: mismatch with experiments. On the other hand, experimental decisions are often guided by an expert's intuition honed from experiences that are rarely articulated. We propose using machine learning to "bottle" such operational intuition into quantifiable descriptors using expertly curated measurement-based data. We introduce "Materials Expert-Artificial Intelligence" (ME-AI) to encapsulate and articulate this human intuition. As a first step towards such a program, we focus on the topological semimetal (TSM) among square-net materials as the property inspired by the expert-identified descriptor based on structural information: the tolerance factor. We start by curating a dataset encompassing 12 primary features of 879 square-net materials, using experimental data whenever possible. We then use Dirichlet-based Gaussian process regression using a specialized kernel to reveal composite descriptors for square-net topological semimetals. The ME-AI learned descriptors independently reproduce expert intuition and expand upon it. Specifically, new descriptors point to hypervalency as a critical chemical feature predicting TSM within square-net compounds. Our success with a carefully defined problem points to the "machine bottling human insight" approach as promising for machine learning-aided material discovery.
Submitted 5 December, 2023;
originally announced December 2023.
-
Simplifying Neural Network Training Under Class Imbalance
Authors:
Ravid Shwartz-Ziv,
Micah Goldblum,
Yucen Lily Li,
C. Bayan Bruss,
Andrew Gordon Wilson
Abstract:
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models. The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures. Notably, we demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, optimizer, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods. We also provide key prescriptions and considerations for training under class imbalance, and an understanding of why imbalance methods succeed or fail.
Submitted 5 December, 2023;
originally announced December 2023.
-
Should We Learn Most Likely Functions or Parameters?
Authors:
Shikai Qiu,
Tim G. J. Rudner,
Sanyam Kapoor,
Andrew Gordon Wilson
Abstract:
Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insomuch as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior. In fact, we can re-parametrize a model such that any setting of parameters can maximize the parameter posterior. As an alternative, we investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data. We show that this procedure leads to pathological solutions when using neural networks and prove conditions under which the procedure is well-behaved, as well as a scalable approximation. Under these conditions, we find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
Submitted 27 November, 2023;
originally announced November 2023.
-
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning
Authors:
Valeriia Cherepanova,
Roman Levin,
Gowthami Somepalli,
Jonas Geiping,
C. Bayan Bruss,
Andrew Gordon Wilson,
Tom Goldstein,
Micah Goldblum
Abstract:
Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. Motivated by the increasing popularity of tabular deep learning, we construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.
Submitted 10 November, 2023;
originally announced November 2023.
-
Estimating Primary Substation Boundaries and the Value of Mapping the Electrical Network Infrastructure of Great Britain
Authors:
Joseph Day,
I. A. Grant Wilson,
Daniel L. Donaldson,
Edward Barbour,
Bruno Cárdenas,
Christopher R. Jones,
Andrew J. Urquhart,
Seamus D. Garvey
Abstract:
Localised data aggregation in many countries including Great Britain (GB) is typically done to a geographical level with polygon boundaries that have a robust and trusted governance system in place. At a minimum this will mean there is confidence in a process to create a set of polygons that have unique identifiers coupled to geographical areas, and the ability to have these updated through a defined code of practice. Examples found across many countries are in the delivery of post, such as postcodes and zip codes, and of the definition of census areas and municipal boundaries. The confidence in these boundaries allows different data to be aggregated by third parties, which itself provides greater levels of data over comparable geographical areas to enhance wider analysis and decision making. Here we combine publicly available datasets published from the six regional electricity Distribution Network Operators of GB to produce a new geospatial dataset with 4436 unique polygons defining the areas served by electrical primary substations. An example is also presented of the use of these polygons to link postcode level open government datasets on domestic energy consumption (2015-2020) from the Department of Energy Security and Net Zero (DESNZ). This results in another dataset with energy statistics aggregated to the geographical areas served by each primary substation across Great Britain. Therefore, we believe there is a compelling argument for countries to set up processes to create and update polygons that have a meaningful relationship to energy systems. This would allow more accurate energy systems analysis to be performed, ultimately leading to an accelerated or potentially lower cost transition to a net-zero world.
Submitted 6 November, 2023;
originally announced November 2023.
-
On the likely magnesium-iron silicate dusty tails of catastrophically evaporating rocky planets
Authors:
Beatriz Campos Estrada,
James E. Owen,
Marija R. Jankovic,
Anna Wilson,
Christiane Helling
Abstract:
Catastrophically evaporating rocky planets provide a unique opportunity to study the composition of small planets. The surface composition of these planets can be constrained via modelling their comet-like tails of dust. In this work, we present a new self-consistent model of the dusty tails: we physically model the trajectory of the dust grains after they have left the gaseous outflow, including an on-the-fly calculation of the dust cloud's optical depth. We model two catastrophically evaporating planets: KIC 1255b and K2-22b. For both planets, we find the dust is likely composed of magnesium-iron silicates (olivine and pyroxene), consistent with an Earth-like composition. We constrain the initial dust grain sizes to be $\sim$ 1.25-1.75 $\mu$m and the average (dusty) planetary mass-loss rate to be $\sim$ 3 $M_\oplus\,\mathrm{Gyr^{-1}}$. Our model shows the origin of the leading tail of dust of K2-22b is likely a combination of the geometry of the outflow and a low radiation pressure force to stellar gravitational force ratio. We find the optical depth of the dust cloud to be a factor of a few in the vicinity of the planet. Our composition constraint supports the recently suggested idea that the dusty outflows of these planets go through a greenhouse effect-nuclear winter cycle, which gives origin to the observed transit depth time variability. Magnesium-iron silicates have the necessary visible-to-infrared opacity ratio to give origin to this cycle in the high mass-loss state.
Submitted 9 January, 2024; v1 submitted 4 November, 2023;
originally announced November 2023.
-
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
Authors:
Micah Goldblum,
Hossein Souri,
Renkun Ni,
Manli Shu,
Viraj Prabhu,
Gowthami Somepalli,
Prithvijit Chattopadhyay,
Mark Ibrahim,
Adrien Bardes,
Judy Hoffman,
Rama Chellappa,
Andrew Gordon Wilson,
Tom Goldstein
Abstract:
Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB sheds light on promising directions for the research community to advance computer vision by illuminating strengths and weaknesses of existing approaches through a comprehensive analysis conducted on more than 1500 training runs. While vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular, we find that convolutional neural networks pretrained in a supervised fashion on large training sets still perform best on most tasks among the models we consider. Moreover, in apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, we find that SSL backbones are highly competitive, indicating that future works should perform SSL pretraining with advanced architectures and larger pretraining datasets. We release the raw results of our experiments along with code that allows researchers to put their own backbones through the gauntlet here: https://github.com/hsouri/Battle-of-the-Backbones
Submitted 19 November, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.