Search | arXiv e-print repository

Leveraging Foundation Models to Improve Lightweight Clients in Federated Learning

Authors: Xidong Wu, Wan-Yi Lin, Devin Willmott, Filipe Condessa, Yufei Huang, Zhenzhen Li, Madan Ravi Ganesh

Abstract: Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data. However, FL faces a significant challenge in the form of heterogeneous data distributions among clients, which leads to a reduction in performance and robustness. A recent approach to mitigating the impact of heterogen… ▽ More Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data. However, FL faces a significant challenge in the form of heterogeneous data distributions among clients, which leads to a reduction in performance and robustness. A recent approach to mitigating the impact of heterogeneous data distributions is through the use of foundation models, which offer better performance at the cost of larger computational overheads and slower inference speeds. We introduce foundation model distillation to assist in the federated training of lightweight client models and increase their performance under heterogeneous data settings while kee** inference costs low. Our results show improvement in the global model performance on a balanced testing set, which contains rarely observed samples, even under extreme non-IID client data distributions. We conduct a thorough evaluation of our framework with different foundation model backbones on CIFAR10, with varying degrees of heterogeneous data distributions ranging from class-specific data partitions across clients to dirichlet data sampling, parameterized by values between 0.01 and 1.0. △ Less

Submitted 14 November, 2023; originally announced November 2023.

Comments: 6 Pages + Appendices

arXiv:2210.03651 [pdf, other]

Understanding the Covariance Structure of Convolutional Filters

Authors: Asher Trockman, Devin Willmott, J. Zico Kolter

Abstract: Neural network weights are typically initialized at random from univariate distributions, controlling just the variance of individual weights even in highly-structured operations like convolutions. Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions whose learned filters have notable structure; this presents an opportunity to study thei… ▽ More Neural network weights are typically initialized at random from univariate distributions, controlling just the variance of individual weights even in highly-structured operations like convolutions. Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions whose learned filters have notable structure; this presents an opportunity to study their empirical covariances. In this work, we first observe that such learned filters have highly-structured covariance matrices, and moreover, we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks of different depths, widths, patch sizes, and kernel sizes, indicating a degree of model-independence to the covariance structure. Motivated by these findings, we then propose a learning-free multivariate initialization scheme for convolutional filters using a simple, closed-form construction of their covariance. Models using our initialization outperform those using traditional univariate initializations, and typically meet or exceed the performance of those initialized from the covariances of learned filters; in some cases, this improvement can be achieved without training the depthwise convolutional filters at all. △ Less

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2205.06133 [pdf, other]

doi 10.1021/acs.jctc.2c00555

Orbital Mixer: Using Atomic Orbital Features for Basis Dependent Prediction of Molecular Wavefunctions

Authors: Kirill Shmilovich, Devin Willmott, Ivan Batalov, Mordechai Kornbluth, Jonathan Mailoa, J. Zico Kolter

Abstract: Leveraging ab initio data at scale has enabled the development of machine learning models capable of extremely accurate and fast molecular property prediction. A central paradigm of many previous works focuses on generating predictions for only a fixed set of properties. Recent lines of research instead aim to explicitly learn the electronic structure via molecular wavefunctions from which other q… ▽ More Leveraging ab initio data at scale has enabled the development of machine learning models capable of extremely accurate and fast molecular property prediction. A central paradigm of many previous works focuses on generating predictions for only a fixed set of properties. Recent lines of research instead aim to explicitly learn the electronic structure via molecular wavefunctions from which other quantum chemical properties can directly be derived. While previous methods generate predictions as a function of only the atomic configuration, in this work we present an alternate approach that directly purposes basis dependent information to predict molecular electronic structure. The backbone of our model, Orbital Mixer, uses MLP Mixer layers within a simple, intuitive, and scalable architecture and achieves competitive Hamiltonian and molecular orbital energy and coefficient prediction accuracies compared to the state-of-the-art. △ Less

Submitted 12 May, 2022; originally announced May 2022.

arXiv:2102.00029 [pdf, other]

You Only Query Once: Effective Black Box Adversarial Attacks with Minimal Repeated Queries

Authors: Devin Willmott, Anit Kumar Sahu, Fatemeh Sheikholeslami, Filipe Condessa, Zico Kolter

Abstract: Researchers have repeatedly shown that it is possible to craft adversarial attacks on deep classifiers (small perturbations that significantly change the class label), even in the "black-box" setting where one only has query access to the classifier. However, all prior work in the black-box setting attacks the classifier by repeatedly querying the same image with minor modifications, usually thous… ▽ More Researchers have repeatedly shown that it is possible to craft adversarial attacks on deep classifiers (small perturbations that significantly change the class label), even in the "black-box" setting where one only has query access to the classifier. However, all prior work in the black-box setting attacks the classifier by repeatedly querying the same image with minor modifications, usually thousands of times or more, making it easy for defenders to detect an ensuing attack. In this work, we instead show that it is possible to craft (universal) adversarial perturbations in the black-box setting by querying a sequence of different images only once. This attack prevents detection from high number of similar queries and produces a perturbation that causes misclassification when applied to any input to the classifier. In experiments, we show that attacks that adhere to this restriction can produce untargeted adversarial perturbations that fool the vast majority of MNIST and CIFAR-10 classifier inputs, as well as in excess of $60-70\%$ of inputs on ImageNet classifiers. In the targeted setting, we exhibit targeted black-box universal attacks on ImageNet classifiers with success rates above $20\%$ when only allowed one query per image, and $66\%$ when allowed two queries per image. △ Less

Submitted 29 January, 2021; originally announced February 2021.

arXiv:2008.05994 [pdf]

doi 10.1371/journal.pone.0253612

A community-powered search of machine learning strategy space to find NMR property prediction models

Authors: Lars A. Bratholm, Will Gerrard, Brandon Anderson, Shaojie Bai, Sunghwan Choi, Lam Dang, Pavel Hanchar, Addison Howard, Guillaume Huard, Sanghoon Kim, Zico Kolter, Risi Kondor, Mordechai Kornbluth, Youhan Lee, Youngsoo Lee, Jonathan P. Mailoa, Thanh Tu Nguyen, Milos Popovic, Goran Rakocevic, Walter Reade, Wonho Song, Luka Stojanovic, Erik H. Thiede, Nebojsa Tijanic, Andres Torrubia , et al. (4 additional authors not shown)

Abstract: The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the… ▽ More The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published "in-house" efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties. △ Less

Submitted 13 August, 2020; originally announced August 2020.

arXiv:2007.07210 [pdf, other]

Simple and Efficient Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes

Authors: Satya Narayan Shukla, Anit Kumar Sahu, Devin Willmott, J. Zico Kolter

Abstract: We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models solely based on information limited to output label~(hard label) to a queried data input. We propose a simple and efficient Bayesian Optimization~(BO) based approach for develo** black-box adversarial attacks. Issues with BO's performance in high dimensions are avo… ▽ More We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models solely based on information limited to output label~(hard label) to a queried data input. We propose a simple and efficient Bayesian Optimization~(BO) based approach for develo** black-box adversarial attacks. Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace. We demonstrate the efficacy of our proposed attack method by evaluating both $\ell_\infty$ and $\ell_2$ norm constrained untargeted and targeted hard label black-box attacks on three standard datasets - MNIST, CIFAR-10 and ImageNet. Our proposed approach consistently achieves 2x to 10x higher attack success rate while requiring 10x to 20x fewer queries compared to the current state-of-the-art black-box adversarial attacks. △ Less

Submitted 11 June, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: Accepted at KDD 2021. arXiv admin note: substantial text overlap with arXiv:1909.13857

arXiv:1909.13857 [pdf, other]

Black-box Adversarial Attacks with Bayesian Optimization

Authors: Satya Narayan Shukla, Anit Kumar Sahu, Devin Willmott, J. Zico Kolter

Abstract: We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization~(BO) to specifically cater to scenarios involving low query budgets to develop query efficient adversarial attacks. We alleviate the issues surrounding BO in regards to optimizing high… ▽ More We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization~(BO) to specifically cater to scenarios involving low query budgets to develop query efficient adversarial attacks. We alleviate the issues surrounding BO in regards to optimizing high dimensional deep learning models by effective dimension upsampling techniques. Our proposed approach achieves performance comparable to the state of the art black-box adversarial attacks albeit with a much lower average query count. In particular, in low query budget regimes, our proposed method reduces the query count up to $80\%$ with respect to the state of the art methods. △ Less

Submitted 30 September, 2019; originally announced September 2019.

arXiv:1906.10819 [pdf, other]

doi 10.1515/cmb-2020-0002

Improving RNA secondary structure prediction via state inference with deep recurrent neural networks

Authors: Devin Willmott, David Murrugarra, Qiang Ye

Abstract: The problem of determining which nucleotides of an RNA sequence are paired or unpaired in the secondary structure of an RNA, which we call RNA state inference, can be studied by different machine learning techniques. Successful state inference of RNA sequences can be used to generate auxiliary information for data-directed RNA secondary structure prediction. Bidirectional long short-term memory (L… ▽ More The problem of determining which nucleotides of an RNA sequence are paired or unpaired in the secondary structure of an RNA, which we call RNA state inference, can be studied by different machine learning techniques. Successful state inference of RNA sequences can be used to generate auxiliary information for data-directed RNA secondary structure prediction. Bidirectional long short-term memory (LSTM) neural networks have emerged as a powerful tool that can model global nonlinear sequence dependencies and have achieved state-of-the-art performances on many different classification problems. This paper presents a practical approach to RNA secondary structure inference centered around a deep learning method for state inference. State predictions from a deep bidirectional LSTM are used to generate synthetic SHAPE data that can be incorporated into RNA secondary structure prediction via the Nearest Neighbor Thermodynamic Model (NNTM). This method produces predicted secondary structures for a diverse test set of 16S ribosomal RNA that are, on average, 25 percentage points more accurate than undirected MFE structures. These improvements range from several percentage points for some sequences to nearly 50 percentage points for others. Accuracy is highly dependent on the success of our state inference method, and investigating the global features of our state predictions reveals that accuracy of both our state inference and structure inference methods are highly dependent on the similarity of the sequence to the dataset. This paper presents a deep learning state inference tool, trained and tested on 16S ribosomal RNA. Converting these state predictions into synthetic SHAPE data with which to direct NNTM can result in large improvements in secondary structure prediction accuracy, as shown on a test set of 16S rRNA. △ Less

Submitted 23 February, 2020; v1 submitted 25 June, 2019; originally announced June 2019.

Comments: 15 pages, 3 figures, and 5 tables

MSC Class: 92

Journal ref: Computational and Mathematical Biophysics, 8(1), 36-50, 2020

arXiv:1707.09520 [pdf, other]

Orthogonal Recurrent Neural Networks with Scaled Cayley Transform

Authors: Kyle Helfrich, Devin Willmott, Qiang Ye

Abstract: Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) have been used to address this issue and in some cases, exceed the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices wit… ▽ More Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) have been used to address this issue and in some cases, exceed the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices without using complex valued matrices. This is done by parametrizing with a skew-symmetric matrix using the Cayley transform. Such a parametrization is unable to represent matrices with negative one eigenvalues, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs. △ Less

Submitted 19 June, 2018; v1 submitted 29 July, 2017; originally announced July 2017.

Comments: 12 pages

Showing 1–9 of 9 results for author: Willmott, D