-
DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation
Authors:
Yinjun Wu,
Mayank Keoliya,
Kan Chen,
Neelay Velingker,
Ziyang Li,
Emily J Getzen,
Qi Long,
Mayur Naik,
Ravi B Parikh,
Eric Wong
Abstract:
Designing faithful yet accurate AI models is challenging, particularly in the field of individual treatment effect estimation (ITE). ITE prediction models deployed in critical settings such as healthcare should ideally be (i) accurate, and (ii) provide faithful explanations. However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. A key insight behind DISCRET is that explanations can serve dually as database queries to identify similar subgroups of samples. We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space. We evaluate DISCRET on diverse tasks involving tabular, image, and text data. DISCRET outperforms the best self-interpretable models and has accuracy comparable to the best black-box models while providing faithful explanations. DISCRET is available at https://github.com/wuyinjun-1993/DISCRET-ICML2024.
Submitted 2 June, 2024;
originally announced June 2024.
-
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Authors:
Alexander Robey,
Eric Wong,
Hamed Hassani,
George J. Pappas
Abstract:
Despite efforts to align large language models (LLMs) with human intentions, widely-used LLMs such as GPT, Llama, and Claude are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. Across a range of popular LLMs, SmoothLLM sets the state-of-the-art for robustness against the GCG, PAIR, RandomSearch, and AmpleGCG jailbreaks. SmoothLLM is also resistant against adaptive GCG attacks, exhibits a small, though non-negligible trade-off between robustness and nominal performance, and is compatible with any LLM. Our code is publicly available at https://github.com/arobey1/smooth-llm.
Submitted 11 June, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
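The perturb-then-aggregate idea in the SmoothLLM abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the character-swap perturbation, the majority-vote aggregation, the parameter names `q` and `n_copies`, and the `is_jailbroken` callable (which stands in for "query the LLM and judge its reply") are all assumptions made here for the example.

```python
import random
import string
from typing import Callable


def perturb(prompt: str, q: float, rng: random.Random) -> str:
    """Replace a fraction q of the characters with random printable characters."""
    chars = list(prompt)
    n_swap = int(len(chars) * q)
    # Pick n_swap distinct positions and overwrite each one.
    for i in rng.sample(range(len(chars)), n_swap):
        chars[i] = rng.choice(string.printable)
    return "".join(chars)


def smoothed_is_jailbroken(
    prompt: str,
    is_jailbroken: Callable[[str], bool],  # hypothetical judge: runs the LLM on a prompt
    n_copies: int = 10,
    q: float = 0.1,
    seed: int = 0,
) -> bool:
    """Judge several perturbed copies of the prompt and take a majority vote."""
    rng = random.Random(seed)
    votes = [is_jailbroken(perturb(prompt, q, rng)) for _ in range(n_copies)]
    return sum(votes) > n_copies / 2
```

If adversarial suffixes are as brittle to character-level changes as the abstract reports, most perturbed copies would fail to trigger the jailbreak, so the vote would come out negative even when the unperturbed prompt succeeds.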
-
Improving Models for Student Retention and Graduation using Markov Chains
Authors:
Mason N Tedeschi,
Tiana M Hose,
Emily K Mehlman,
Scott Franklin,
Tony E Wong
Abstract:
Graduation rates are a key measure of the long-term efficacy of academic interventions. However, challenges to using traditional estimates of graduation rates for underrepresented students include inherently small sample sizes and high data requirements. Here, we show that a Markov model increases confidence and reduces biases in estimated graduation rates for underrepresented minority and first-generation students. We use a Learning Assistant program to demonstrate the Markov model's strength for assessing program efficacy. We find that Learning Assistants in gateway science courses are associated with a 9% increase in the six-year graduation rate. These gains are larger for underrepresented minority (21%) and first-generation students (18%). Our results indicate that Learning Assistants can improve overall graduation rates and address inequalities in graduation rates for underrepresented students.
Submitted 1 February, 2023;
originally announced February 2023.
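As a rough illustration of how a Markov model yields a graduation-rate estimate, the sketch below propagates a cohort through a chain with absorbing "graduated" and "left" states. The state names, the transition probabilities, and the function name `graduation_probability` are hypothetical; the abstract does not specify the paper's actual state space or estimation procedure.

```python
def graduation_probability(transitions, start, graduated, n_steps):
    """Probability mass in the `graduated` state after n_steps transitions.

    `transitions[state]` maps each next state to its probability. Absorbing
    states must map to themselves with probability 1.0.
    """
    dist = {start: 1.0}
    for _ in range(n_steps):
        nxt = {}
        for state, p in dist.items():
            for target, t in transitions[state].items():
                nxt[target] = nxt.get(target, 0.0) + p * t
        dist = nxt
    return dist.get(graduated, 0.0)


# Illustrative numbers only, not taken from the paper.
toy_chain = {
    "enrolled": {"enrolled": 0.5, "graduated": 0.4, "left": 0.1},
    "graduated": {"graduated": 1.0},
    "left": {"left": 1.0},
}
```

Because each transition probability can be estimated from every student observed in that state, such a model can draw on more data than a direct count of one cohort's graduates, which is one plausible reading of why it helps with small samples.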
-
Sea Level and Socioeconomic Uncertainty Drives High-End Coastal Adaptation Costs
Authors:
Tony E. Wong,
Catherine Ledna,
Lisa Rennels,
Hannah Sheets,
Frank C. Errickson,
Delavane Diaz,
David Anthoff
Abstract:
Sea-level rise and associated flood hazards pose severe risks to the millions of people globally living in coastal zones. Models representing coastal adaptation and impacts are important tools to inform the design of strategies to manage these risks. Representing the often deep uncertainties influencing these risks poses nontrivial challenges. A common uncertainty characterization approach is to use a few benchmark cases to represent the range and relative probabilities of the set of possible outcomes. This has been done in coastal adaptation studies, for example, by using low, moderate, and high percentiles of an input of interest, like sea-level changes. A key consideration is how this simplified characterization of uncertainty influences the distributions of estimated coastal impacts. Here, we show that using only a few benchmark percentiles to represent uncertainty in future sea-level change can lead to overconfident projections and underestimate high-end risks as compared to using full ensembles for sea-level change and socioeconomic parametric uncertainties. When uncertainty in future sea level is characterized by low, moderate, and high percentiles of global mean sea-level rise, estimates of high-end (95th percentile) damages are underestimated by between 18% (SSP1-2.6) and 46% (SSP5-8.5). Additionally, using the 5th and 95th percentiles of sea-level scenarios underestimates the 5-95% width of the distribution of adaptation costs by a factor ranging from about two to four, depending on SSP-RCP pathway. The resulting underestimation of the uncertainty range in adaptation costs can bias adaptation and mitigation decision-making.
Submitted 29 November, 2022;
originally announced November 2022.
-
Computed Decision Weights and a New Learning Algorithm for Neural Classifiers
Authors:
Eugene Wong
Abstract:
In this paper we consider the possibility of computing rather than training the decision layer weights of a neural classifier. Such a possibility arises in two ways: from making an appropriate choice of loss function and from solving a problem of constrained optimization. The latter formulation leads to a promising new learning process for pre-decision weights with both simplicity and efficacy.
Submitted 17 September, 2022;
originally announced September 2022.
-
Leveraging Sparse Linear Layers for Debuggable Deep Networks
Authors:
Eric Wong,
Shibani Santurkar,
Aleksander Mądry
Abstract:
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks. These networks remain highly accurate while also being more amenable to human interpretation, as we demonstrate quantitatively via numerical and human experiments. We further illustrate how the resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks. The code for our toolkit can be found at https://github.com/madrylab/debuggabledeepnetworks.
Submitted 11 May, 2021;
originally announced May 2021.
-
Learning perturbation sets for robust machine learning
Authors:
Eric Wong,
J. Zico Kolter
Abstract:
Although much progress has been made towards robust deep learning, a significant gap in robustness remains between real-world perturbations and more narrowly defined sets typically studied in adversarial defenses. In this paper, we aim to bridge this gap by learning perturbation sets from data, in order to characterize real-world effects for robust training and evaluation. Specifically, we use a conditional generator that defines the perturbation set over a constrained region of the latent space. We formulate desirable properties that measure the quality of a learned perturbation set, and theoretically prove that a conditional variational autoencoder naturally satisfies these criteria. Using this framework, our approach can generate a variety of perturbations at different complexities and scales, ranging from baseline spatial transformations, through common image corruptions, to lighting variations. We measure the quality of our learned perturbation sets both quantitatively and qualitatively, finding that our models are capable of producing a diverse set of meaningful perturbations beyond the limited data seen during training. Finally, we leverage our learned perturbation sets to train models which are empirically and certifiably robust to adversarial image corruptions and adversarial lighting variations, while improving generalization on non-adversarial data. All code and configuration files for reproducing the experiments as well as pretrained model weights can be found at https://github.com/locuslab/perturbation_learning.
Submitted 8 October, 2020; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Neural Network Virtual Sensors for Fuel Injection Quantities with Provable Performance Specifications
Authors:
Eric Wong,
Tim Schneider,
Joerg Schmitt,
Frank R. Schmidt,
J. Zico Kolter
Abstract:
Recent work has shown that it is possible to learn neural networks with provable guarantees on the output of the model when subject to input perturbations; however, these works have focused primarily on defending against adversarial examples for image classifiers. In this paper, we study how these provable guarantees can be naturally applied to other real-world settings, namely getting performance specifications for robust virtual sensors measuring fuel injection quantities within an engine. We first demonstrate that, in this setting, even simple neural network models are highly susceptible to reasonable levels of adversarial sensor noise, which are capable of increasing the mean relative error of a standard neural network from 6.6% to 43.8%. We then leverage methods for learning provably robust networks and verifying robustness properties, resulting in a robust model which we can provably guarantee has at most 16.5% mean relative error under any sensor noise. Additionally, we show how specific intervals of fuel injection quantities can be targeted to maximize robustness for certain ranges, allowing us to train a virtual sensor for fuel injection which is provably guaranteed to have at most 10.69% relative error under noise while maintaining 3% relative error on non-adversarial data within normalized fuel injection ranges of 0.6 to 1.0.
Submitted 30 June, 2020;
originally announced July 2020.
-
Overfitting in adversarially robust deep learning
Authors:
Leslie Rice,
Eric Wong,
J. Zico Kolter
Abstract:
It is common practice in deep learning to use overparameterized networks and train for as long as possible; there are numerous studies that show, both theoretically and empirically, that such practices surprisingly do not unduly harm the generalization performance of the classifier. In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. We find that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training across multiple datasets (SVHN, CIFAR-10, CIFAR-100, and ImageNet) and perturbation models ($\ell_\infty$ and $\ell_2$). Based upon this observed effect, we show that the performance gains of virtually all recent algorithmic improvements upon adversarial training can be matched by simply using early stopping. We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting. Finally, we study several classical and modern deep learning remedies for overfitting, including regularization and data augmentation, and find that no approach in isolation improves significantly upon the gains achieved by early stopping. All code for reproducing the experiments as well as pretrained model weights and training logs can be found at https://github.com/locuslab/robust_overfitting.
Submitted 4 March, 2020; v1 submitted 26 February, 2020;
originally announced February 2020.
-
Fast is better than free: Revisiting adversarial training
Authors:
Eric Wong,
Leslie Rice,
J. Zico Kolter
Abstract:
Adversarial training, a method for learning robust deep networks, is typically assumed to be more expensive than traditional training due to the necessity of constructing adversarial examples via a first-order method like projected gradient descent (PGD). In this paper, we make the surprising discovery that it is possible to train empirically robust models using a much weaker and cheaper adversary, an approach that was previously believed to be ineffective, rendering the method no more costly than standard training in practice. Specifically, we show that adversarial training with the fast gradient sign method (FGSM), when combined with random initialization, is as effective as PGD-based training but has significantly lower cost. Furthermore, we show that FGSM adversarial training can be further accelerated by using standard techniques for efficient training of deep networks, allowing us to learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with $ε=8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust accuracy at $ε=2/255$ in 12 hours, in comparison to past work based on "free" adversarial training which took 10 and 50 hours to reach the same respective thresholds. Finally, we identify a failure mode referred to as "catastrophic overfitting" which may have caused previous attempts to use FGSM adversarial training to fail. All code for reproducing the experiments in this paper as well as pretrained model weights are at https://github.com/locuslab/fast_adversarial.
Submitted 12 January, 2020;
originally announced January 2020.
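The FGSM-with-random-initialization step described in this abstract can be sketched on a toy model. The example below uses a linear logistic classifier with an analytic input gradient rather than a deep network, and the function name, the step size `alpha`, and the parameter layout are assumptions made for illustration; it is not the authors' code.

```python
import math
import random


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_random_init(x, y, w, b, eps, alpha, rng):
    """One FGSM step from a random start inside the l_inf ball of radius eps.

    Toy setting (an assumption of this sketch): p = sigmoid(w.x + b) with
    cross-entropy loss, whose gradient with respect to the input is (p - y) * w.
    """
    # Random initialization: start from a uniform point in [-eps, eps]^d.
    delta = [rng.uniform(-eps, eps) for _ in x]
    z = sum(wi * (xi + di) for wi, xi, di in zip(w, x, delta)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    grad = [(p - y) * wi for wi in w]
    # Single signed-gradient ascent step, then project back onto the ball.
    delta = [max(-eps, min(eps, di + alpha * sign(g))) for di, g in zip(delta, grad)]
    return [xi + di for xi, di in zip(x, delta)]
```

In adversarial training, the returned point would replace the clean input in the loss for that minibatch; the cost is one extra gradient computation per step instead of the several that a multi-step PGD adversary needs.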
-
A tighter constraint on Earth-system sensitivity from long-term temperature and carbon-cycle observations
Authors:
Tony E. Wong,
Ying Cui,
Dana L. Royer,
Klaus Keller
Abstract:
The long-term temperature response to a given change in CO2 forcing, or Earth-system sensitivity (ESS), is a key parameter quantifying our understanding about the relationship between changes in Earth's radiative forcing and the resulting long-term Earth-system response. Current ESS estimates are subject to sizable uncertainties. Long-term carbon cycle models can provide a useful avenue to constrain ESS, but previous efforts either use rather informal statistical approaches or focus on discrete paleoevents. Here, we improve on previous ESS estimates by using a Bayesian approach to fuse deep-time CO2 and temperature data over the last 420 Myrs with a long-term carbon cycle model. Our median ESS estimate of 3.4 deg C (2.6-4.7 deg C; 5-95% range) shows a narrower range than previous assessments. We show that weaker chemical weathering relative to the a priori model configuration via reduced weatherable land area yields better agreement with temperature records during the Cretaceous. Research into improving the understanding about these weathering mechanisms hence provides potentially powerful avenues to further constrain this fundamental Earth-system property.
Submitted 1 March, 2021; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Class Mean Vectors, Self Monitoring and Self Learning for Neural Classifiers
Authors:
Eugene Wong
Abstract:
In this paper we explore the role of sample mean in building a neural network for classification. This role is surprisingly extensive and includes: direct computation of weights without training, performance monitoring for samples without known classification, and self-training for unlabeled data. Experimental computation on a CIFAR-10 data set provides promising empirical evidence on the efficacy of a simple and widely applicable approach to some difficult problems.
Submitted 22 October, 2019;
originally announced October 2019.
-
Adversarial Robustness Against the Union of Multiple Perturbation Models
Authors:
Pratyush Maini,
Eric Wong,
J. Zico Kolter
Abstract:
Owing to the susceptibility of deep learning systems to adversarial attacks, there has been a great deal of work in developing (both empirically and certifiably) robust classifiers. While most work has defended against a single type of attack, recent work has looked at defending against multiple perturbation models using simple aggregations of multiple attacks. However, these methods can be difficult to tune, and can easily result in imbalanced degrees of robustness to individual perturbation models, resulting in a sub-optimal worst-case loss over the union. In this work, we develop a natural generalization of the standard PGD-based procedure to incorporate multiple perturbation models into a single attack, by taking the worst-case over all steepest descent directions. This approach has the advantage of directly converging upon a trade-off between different perturbation models which minimizes the worst-case performance over the union. With this approach, we are able to train standard architectures which are simultaneously robust against $\ell_\infty$, $\ell_2$, and $\ell_1$ attacks, outperforming past approaches on the MNIST and CIFAR10 datasets and achieving adversarial accuracy of 47.0% against the union of ($\ell_\infty$, $\ell_2$, $\ell_1$) perturbations with radius = (0.03, 0.5, 12) on the latter, improving upon previous approaches which achieve 40.6% accuracy.
Submitted 28 July, 2020; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Categorical Co-Frequency Analysis: Clustering Diagnosis Codes to Predict Hospital Readmissions
Authors:
Hallee E. Wong,
Brianna C. Heggeseth,
Steven J. Miller
Abstract:
Accurately predicting patients' risk of 30-day hospital readmission would enable hospitals to efficiently allocate resource-intensive interventions. We develop a new method, Categorical Co-Frequency Analysis (CoFA), for clustering diagnosis codes from the International Classification of Diseases (ICD) according to the similarity in relationships between covariates and readmission risk. CoFA measures the similarity between diagnoses by the frequency with which two diagnoses are split in the same direction versus split apart in random forests to predict readmission risk. Applying CoFA to de-identified data from Berkshire Medical Center, we identified three groups of diagnoses that vary in readmission risk. To evaluate CoFA, we compared readmission risk models using ICD majors and CoFA groups to a baseline model without diagnosis variables. We found substituting ICD majors for the CoFA-identified clusters simplified the model without compromising the accuracy of predictions. Fitting separate models for each ICD major and CoFA group did not improve predictions, suggesting that readmission risk may be more homogeneous than heterogeneous across diagnosis groups.
Submitted 31 August, 2019;
originally announced September 2019.
-
Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
Authors:
Eric Wong,
Frank R. Schmidt,
J. Zico Kolter
Abstract:
A rapidly growing area of work has studied the existence of adversarial examples, datapoints which have been perturbed to fool a classifier, but the vast majority of these works have focused primarily on threat models defined by $\ell_p$ norm-bounded perturbations. In this paper, we propose a new threat model for adversarial attacks based on the Wasserstein distance. In the image classification setting, such distances measure the cost of moving pixel mass, which naturally cover "standard" image manipulations such as scaling, rotation, translation, and distortion (and can potentially be applied to other settings as well). To generate Wasserstein adversarial examples, we develop a procedure for projecting onto the Wasserstein ball, based upon a modified version of the Sinkhorn iteration. The resulting algorithm can successfully attack image classification models, bringing traditional CIFAR10 models down to 3% accuracy within a Wasserstein ball with radius 0.1 (i.e., moving 10% of the image mass 1 pixel), and we demonstrate that PGD-based adversarial training can improve this adversarial accuracy to 76%. In total, this work opens up a new direction of study in adversarial robustness, more formally considering convex metrics that accurately capture the invariances that we typically believe should exist in classifiers. Code for all experiments in the paper is available at https://github.com/locuslab/projected_sinkhorn.
Submitted 18 January, 2020; v1 submitted 21 February, 2019;
originally announced February 2019.
-
Self Configuration in Machine Learning
Authors:
Eugene Wong
Abstract:
In this paper we first present a class of algorithms for training multi-level neural networks with a quadratic cost function one layer at a time starting from the input layer. The algorithm is based on the fact that for any layer to be trained, the effect of a direct connection to an optimized linear output layer can be computed without the connection being made. Thus, starting from the input layer, we can train each layer in succession in isolation from the other layers. Once trained, the weights are kept fixed and the outputs of the trained layer then serve as the inputs to the next layer to be trained. The result is a very fast algorithm. The simplicity of this training arrangement allows the activation function and step size in weight adjustment to be adaptive and self-adjusting. Furthermore, the stability of the training process allows relatively large steps to be taken, thereby achieving even greater speeds. Finally, in our context configuring the network means determining the number of outputs for each layer. By decomposing the overall cost function into separate components related to approximation and estimation, we obtain an optimization formula for determining the number of outputs for each layer. With the ability to self-configure and set parameters, we now have more than a fast training algorithm: we have the ability to automatically build a fully trained deep neural network starting with nothing more than data.
Submitted 17 September, 2018;
originally announced September 2018.
-
An Integration and Assessment of Covariates of Nonstationary Storm Surge Statistical Behavior by Bayesian Model Averaging
Authors:
Tony E. Wong
Abstract:
Projections of storm surge return levels are a basic requirement for effective management of coastal risks. A common approach to estimate hazards posed by extreme sea levels is to use a statistical model, which may use a time series of a climate variable as a covariate to modulate the statistical model and account for potentially nonstationary storm surge behavior. Previous work using nonstationary statistical approaches, however, has demonstrated the importance of accounting for the many inherent modeling uncertainties. Additionally, previous assessments of coastal flood hazard using statistical modeling have typically relied on a single climate covariate, which likely leaves out important processes and leads to potential biases. Here, I employ a recently developed approach to integrate stationary and nonstationary statistical models, and examine the effects of choice of covariate time series on projected flood hazard. Furthermore, I expand upon this approach by developing a nonstationary storm surge statistical model that makes use of multiple covariate time series: global mean temperature, sea level, North Atlantic Oscillation index and time. I show that a storm surge model that accounts for additional processes raises the projected 100-year storm surge return level by up to 23 centimeters relative to a stationary model or one that employs a single covariate time series. I find that the total marginal model likelihood associated with each set of nonstationary models given by the candidate covariates, as well as a stationary model, is about 20%. These results shed light on how best to account for potential nonstationary coastal surge behavior, and incorporate more processes into surge projections. By including a wider range of physical process information and considering nonstationary behavior, these methods will better enable modeling efforts to inform coastal risk management.
Submitted 25 August, 2018; v1 submitted 20 August, 2018;
originally announced August 2018.
-
Scaling provable adversarial defenses
Authors:
Eric Wong,
Frank R. Schmidt,
Jan Hendrik Metzen,
J. Zico Kolter
Abstract:
Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks. In this paper, in an effort to scale these approaches to substantially larger models, we extend previous work in three main directions. First, we present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular, and can be implemented automatically (analogous to automatic differentiation). Second, in the specific case of $\ell_\infty$ adversarial perturbations and networks with ReLU nonlinearities, we adopt a nonlinear random projection for training, which scales linearly in the number of hidden units (previous approaches scaled quadratically). Third, we show how to further improve robust error through cascade models. On both MNIST and CIFAR data sets, we train classifiers that improve substantially on the state of the art in provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with $\ell_\infty$ perturbations of $ε=0.1$), and from 80% to 36.4% on CIFAR (with $\ell_\infty$ perturbations of $ε=2/255$). Code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial/.
Submitted 21 November, 2018; v1 submitted 31 May, 2018;
originally announced May 2018.
-
Neglecting Model Structural Uncertainty Underestimates Upper Tails of Flood Hazard
Authors:
Tony E. Wong,
Alexandra Klufas,
Vivek Srikrishnan,
Klaus Keller
Abstract:
Coastal flooding drives considerable risks to many communities, but projections of future flood risks are deeply uncertain. The paucity of observations of extreme events often motivates the use of statistical approaches to model the distribution of extreme storm surge events. A key deep uncertainty that is often overlooked is model structural uncertainty. There is currently no strong consensus among experts regarding which class of statistical model to use as a best practice. Robust management of coastal flooding risks requires coastal managers to consider the distinct possibility of non-stationarity in storm surges. This increases the complexity of the potential models to use, which tends to increase the data required to constrain the model. Here, we use a Bayesian model averaging approach to analyze the balance between model complexity sufficient to capture decision-relevant risks and data availability to constrain complex model structures. We characterize deep model structural uncertainty through a set of calibration experiments. Specifically, we calibrate a set of models ranging in complexity using long-term tide gauge observations from the Netherlands and the United States. We find that in both cases, roughly half the model weight is associated with non-stationary models. Our approach provides a formal framework to integrate information across model structures, in light of the potentially sizable modeling uncertainties. By combining information from multiple models, our inference sharpens for the projected storm surge 100-year return levels, and estimated return levels increase by several centimeters. We assess the impacts of data availability through a set of experiments with temporal subsets and model comparison metrics. Our analysis suggests about 70 years of data are required to stabilize estimates of the 100-year return level, for the locations and methods considered here.
Submitted 3 June, 2018; v1 submitted 25 September, 2017;
originally announced September 2017.