Search | arXiv e-print repository

Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs

Authors: Aakash Lahoti, Stefani Karp, Ezra Winston, Aarti Singh, Yuanzhi Li

Abstract: Vision tasks are characterized by the properties of locality and translation invariance. The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive bias of locality and weight sharing baked into their architecture. Existing attempts to quantify the statistical benefits of these biases in CNNs over locally connected convolutional neural networks (LCNs) and fully connected neural networks (FCNs) fall into one of the following categories: either they disregard the optimizer and only provide uniform convergence upper bounds with no separating lower bounds, or they consider simplistic tasks that do not truly mirror the locality and translation invariance as found in real-world vision tasks. To address these deficiencies, we introduce the Dynamic Signal Distribution (DSD) classification task that models an image as consisting of $k$ patches, each of dimension $d$, and the label is determined by a $d$-sparse signal vector that can freely appear in any one of the $k$ patches. On this task, for any orthogonally equivariant algorithm like gradient descent, we prove that CNNs require $\tilde{O}(k+d)$ samples, whereas LCNs require $\Omega(kd)$ samples, establishing the statistical advantages of weight sharing in translation invariant tasks. Furthermore, LCNs need $\tilde{O}(k(k+d))$ samples, compared to $\Omega(k^2d)$ samples for FCNs, showcasing the benefits of locality in local tasks.
Additionally, we develop information-theoretic tools for analyzing randomized algorithms, which may be of interest for statistical research.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 40 pages, 4 figures, Accepted to ICLR 2024, Spotlight
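To make the DSD task and the role of weight sharing concrete, here is a toy sketch. The sampling details are our assumptions, not the paper's exact distribution: the label is uniform on {-1, +1}, the label times a fixed signal vector lands in a uniformly random patch, and every other patch is Gaussian noise. The names `sample_dsd`, `cnn_score`, and `lcn_score` are hypothetical. A single shared filter scores every patch, so it classifies correctly wherever the signal appears, while a locally connected model that has learned the filter for only one position is close to chance elsewhere. This only illustrates the intuition; the sample complexity bounds in the paper come from its analysis of orthogonally equivariant algorithms, not from a simulation like this.

```python
import random


def sample_dsd(k, d, signal, noise_std=0.1, rng=random):
    """Draw one (patches, label) pair from a toy DSD-style distribution.

    Assumptions (not taken from the paper): y is uniform on {-1, +1}, the patch
    holding y * signal is chosen uniformly among the k patches, and every other
    patch is i.i.d. Gaussian noise with standard deviation noise_std.
    """
    y = rng.choice([-1, 1])
    position = rng.randrange(k)
    patches = [[rng.gauss(0.0, noise_std) for _ in range(d)] for _ in range(k)]
    patches[position] = [y * s for s in signal]
    return patches, y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cnn_score(patches, w):
    """Weight sharing: one d-dimensional filter w scores every patch (sum pooling)."""
    return sum(dot(w, p) for p in patches)


def lcn_score(patches, filters):
    """Locality without sharing: patch i has its own filter (k * d parameters)."""
    return sum(dot(f, p) for f, p in zip(filters, patches))


rng = random.Random(0)
k, d = 8, 5
signal = [1.0] + [0.0] * (d - 1)  # hypothetical unit-norm signal vector
data = [sample_dsd(k, d, signal, rng=rng) for _ in range(400)]


def accuracy(score_fn):
    """Fraction of samples where sign(score) matches the label (score 0 -> -1)."""
    return sum((1 if score_fn(x) > 0 else -1) == y for x, y in data) / len(data)


cnn_acc = accuracy(lambda x: cnn_score(x, signal))
# An LCN that has learned the filter for patch 0 only; other filters are still zero.
lcn_filters = [signal] + [[0.0] * d for _ in range(k - 1)]
lcn_acc = accuracy(lambda x: lcn_score(x, lcn_filters))
print(f"CNN (shared filter): {cnn_acc:.3f}, LCN (one position learned): {lcn_acc:.3f}")
```

With a unit-norm signal and small noise the shared filter is almost always right, whereas the partially trained LCN should land near 1/k + (1 - 1/k)/2, because every position it has not learned contributes only noise. Learning all k positions separately is what costs the LCN extra samples in this picture.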

arXiv:2006.08591

Monotone operator equilibrium networks

Authors: Ezra Winston, J. Zico Kolter

Abstract: Implicit-depth models such as Deep Equilibrium Networks have recently been shown to match or exceed the performance of traditional deep networks while being much more memory efficient. However, these models suffer from unstable convergence to a solution and lack guarantees that a solution exists. On the other hand, Neural ODEs, another class of implicit-depth models, do guarantee existence of a unique solution but perform poorly compared with traditional networks. In this paper, we develop a new class of implicit-depth model based on the theory of monotone operators, the Monotone Operator Equilibrium Network (monDEQ). We show the close connection between finding the equilibrium point of an implicit network and solving a form of monotone operator splitting problem, which admits efficient solvers with guaranteed, stable convergence. We then develop a parameterization of the network which ensures that all operators remain monotone, which guarantees the existence of a unique equilibrium point. Finally, we show how to instantiate several versions of these models, and implement the resulting iterative solvers, for structured linear operators such as multi-scale convolutions. The resulting models vastly outperform the Neural ODE-based models while also being more computationally efficient. Code is available at http://github.com/locuslab/monotone_op_net.

Submitted 3 May, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: NeurIPS 2020
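The recipe in the monDEQ abstract (parameterize the layer so the operator stays monotone, then find the equilibrium with an operator-splitting solver) can be sketched in pure Python for a tiny dense layer. We assume the parameterization W = (1 - m)I - AᵀA + B - Bᵀ, which makes I - W m-strongly monotone because the skew part B - Bᵀ contributes nothing to ⟨(I - W)v, v⟩, and the forward-backward iteration z ← ReLU((1 - α)z + α(Wz + Ux + b)). We believe this matches the paper's construction, but treat it as a sketch: the function names, the step size α = 0.25, and the random test matrices are our own choices, and the linked repository uses a tensor library with structured (e.g., convolutional) operators instead.

```python
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(a * b for a, b in zip(row, col)) for col in Qt] for row in P]


def monotone_weight(A, B, m):
    """W = (1 - m) I - A^T A + B - B^T, so I - W = m I + A^T A - (B - B^T)."""
    n = len(B)
    AtA = matmul(transpose(A), A)
    return [[(1.0 - m) * (i == j) - AtA[i][j] + B[i][j] - B[j][i]
             for j in range(n)] for i in range(n)]


def forward_backward(W, U, b, x, alpha=0.25, iters=1000, z0=None):
    """Forward-backward splitting: z <- ReLU((1 - alpha) z + alpha (W z + U x + b))."""
    n = len(W)
    z = list(z0) if z0 is not None else [0.0] * n
    injection = [ux + bi for ux, bi in zip(matvec(U, x), b)]
    for _ in range(iters):
        Wz = matvec(W, z)
        z = [max(0.0, (1.0 - alpha) * zi + alpha * (wz + inj))
             for zi, wz, inj in zip(z, Wz, injection)]
    return z


rng = random.Random(0)
n, p, m = 4, 3, 0.5  # hidden size, input size, monotonicity parameter


def rand_matrix(rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


A, B = rand_matrix(n, n), rand_matrix(n, n)
U, b = rand_matrix(n, p, scale=1.0), [0.1] * n
x = [1.0, -2.0, 0.5]
W = monotone_weight(A, B, m)

z_a = forward_backward(W, U, b, x)
z_b = forward_backward(W, U, b, x, z0=[5.0] * n)  # different starting point
# Equilibrium condition z = ReLU(W z + U x + b); the residual should be ~0.
residual = max(abs(zi - max(0.0, wz + ux + bi))
               for zi, wz, ux, bi in zip(z_a, matvec(W, z_a), matvec(U, x), b))
print(z_a, residual)
```

Both starting points should reach the same z, illustrating the uniqueness that monotonicity guarantees. The step size matters: the forward step contracts only when α < 2m / L², where L bounds the norm of I - W, which is why the random entries of A and B are kept small here.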

arXiv:2002.03018

Certified Robustness to Label-Flipping Attacks via Randomized Smoothing

Authors: Elan Rosenfeld, Ezra Winston, Pradeep Ravikumar, J. Zico Kolter

Abstract: Machine learning algorithms are known to be susceptible to data poisoning attacks, where an adversary manipulates the training data to degrade performance of the resulting classifier. In this work, we present a unifying view of randomized smoothing over arbitrary functions, and we leverage this novel characterization to propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks. As a specific instantiation, we utilize our framework to build linear classifiers that are robust to a strong variant of label flipping, where each test example is targeted independently. In other words, for each test point, our classifier includes a certification that its prediction would be the same had some number of training labels been changed adversarially. Randomized smoothing has previously been used to guarantee, with high probability, test-time robustness to adversarial manipulation of the input to a classifier; we derive a variant which provides a deterministic, analytical bound, sidestepping the probabilistic certificates that traditionally result from the sampling subprocedure. Further, we obtain these certified bounds with minimal additional runtime complexity over standard classification and no assumptions on the train or test distributions. We generalize our results to the multi-class case, providing the first multi-class classification algorithm that is certifiably robust to label-flipping attacks.

Submitted 11 August, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

Comments: ICML 2020
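The certificate described in this abstract can be illustrated by brute force on a tiny example. This is not the paper's analytical bound: it enumerates every label-noise pattern and every small attack, so it is exponential in the number of training points and only meant to show what is being certified. We assume a hypothetical base classifier whose test-time score is a fixed weighted sum of training labels (least-squares predictions are linear in the labels in this way), and smoothing noise that flips each binary label independently with probability q. The names `vote_score`, `prob_positive`, `smoothed_predict`, and `certified_by_enumeration` are illustrative.

```python
from itertools import combinations, product


def vote_score(weights, labels):
    """Hypothetical base classifier: test score is a fixed weighted sum of training labels."""
    return sum(w * y for w, y in zip(weights, labels))


def prob_positive(weights, labels, q):
    """Exact P[score > 0] when each label is flipped independently with probability q."""
    total = 0.0
    for flips in product((False, True), repeat=len(labels)):
        prob = 1.0
        noisy = []
        for y, flipped in zip(labels, flips):
            prob *= q if flipped else 1.0 - q
            noisy.append(-y if flipped else y)
        if vote_score(weights, noisy) > 0:
            total += prob
    return total


def smoothed_predict(weights, labels, q):
    """Smoothed classifier: the sign that wins under random label noise."""
    return 1 if prob_positive(weights, labels, q) > 0.5 else -1


def certified_by_enumeration(weights, labels, q, r):
    """True if no adversarial flip of at most r training labels changes the smoothed prediction."""
    base = smoothed_predict(weights, labels, q)
    for count in range(1, r + 1):
        for idx in combinations(range(len(labels)), count):
            attacked = [-y if i in idx else y for i, y in enumerate(labels)]
            if smoothed_predict(weights, attacked, q) != base:
                return False
    return True


weights = [1.0] * 5          # e.g., an unweighted vote of five training points
labels = [1, 1, 1, 1, -1]
print(smoothed_predict(weights, labels, q=0.1))           # 1
print(certified_by_enumeration(weights, labels, 0.1, 1))  # True
print(certified_by_enumeration(weights, labels, 0.1, 2))  # False
```

Flipping any single label leaves a positive majority, so the prediction is certified at radius 1; flipping two of the positive labels produces a negative majority, so it is not certified at radius 2. The paper's contribution is obtaining such certificates analytically, without this enumeration or Monte Carlo sampling.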

arXiv:1903.01689

Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment

Authors: Yifan Wu, Ezra Winston, Divyansh Kaushik, Zachary Lipton

Abstract: Domain adaptation addresses the common problem when the target distribution generating our test data drifts from the source (training) distribution. While domain adaptation is impossible absent assumptions, strict conditions, e.g., covariate or label shift, enable principled algorithms. Recently proposed domain-adversarial approaches consist of aligning source and target encodings, often motivating this approach as minimizing two (of three) terms in a theoretical bound on target error. Unfortunately, this minimization can cause arbitrary increases in the third term; e.g., these approaches can break down under shifting label distributions. We propose asymmetrically-relaxed distribution alignment, a new approach that overcomes some limitations of standard domain-adversarial algorithms. Moreover, we characterize precise assumptions under which our algorithm is theoretically principled and demonstrate empirical benefits on both synthetic and real datasets.

Submitted 11 March, 2019; v1 submitted 5 March, 2019; originally announced March 2019.
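Why exact alignment breaks under label shift, and what an asymmetric relaxation buys, can be seen on discrete histograms over encodings. The sketch below is our own illustration, not one of the paper's divergences: `relaxed_gap` is a hypothetical measure that is zero exactly when p_target(z) ≤ (1 + beta) · p_source(z) for every bin z, and at beta = 0 it reduces to total variation. We assume the encodings fall into two class-specific clusters whose proportions shift from 50/50 in the source to 80/20 in the target.

```python
def total_variation(p_target, p_source):
    """Symmetric alignment gap: zero only if the two histograms match exactly."""
    return 0.5 * sum(abs(t - s) for t, s in zip(p_target, p_source))


def relaxed_gap(p_target, p_source, beta):
    """Hypothetical asymmetric relaxation: zero iff p_target <= (1 + beta) * p_source binwise."""
    return sum(max(0.0, t - (1.0 + beta) * s) for t, s in zip(p_target, p_source))


# Hypothetical encodings clustered by class; label shift changes the cluster weights.
p_source = [0.5, 0.5]
p_target = [0.8, 0.2]

print(total_variation(p_target, p_source))   # about 0.3: exact alignment is violated
print(relaxed_gap(p_target, p_source, 0.0))  # equals the total variation at beta = 0
print(relaxed_gap(p_target, p_source, 1.0))  # 0.0: the shifted target is admissible
```

Forcing the total variation to zero here would require distorting the encodings to undo a genuine change in class proportions; the relaxed measure tolerates that change as long as the target never puts much more mass on a region than the source does.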

arXiv:1705.00634

Counterfactual-based Incrementality Measurement in a Digital Ad-Buying Platform

Authors: Prasad Chalasani, Ari Buchalter, Jaynth Thiagarajan, Ezra Winston

Abstract: The problem of measuring the true incremental effectiveness of a digital advertising campaign is of increasing importance to marketers. With a large and increasing percentage of digital advertising delivered via Demand-Side-Platforms (DSPs) executing campaigns via Real-Time-Bidding (RTB) auctions and programmatic approaches, a measurement solution that satisfies both advertiser concerns and the constraints of a DSP is of particular interest. MediaMath (a DSP) has developed the first practical, statistically sound randomization-based methodology for causal ad effectiveness (or Ad Lift) measurement by a DSP (or similar digital advertising execution system that may not have full control over the advertising transaction mechanisms). We describe our solution and establish its soundness within the causal framework of counterfactuals and potential outcomes, and present a Gibbs-sampling procedure for estimating confidence intervals around the estimated Ad Lift. We also address practical complications (unique to the digital advertising setting) that stem from the fact that digital advertising is targeted and measured via identifiers (e.g., cookies, mobile advertising IDs) that may not be stable over time. One such complication is the repeated occurrence of identifiers, leading to interference among observations. Another is due to the possibility of multiple identifiers being associated with the same consumer, leading to "contamination" with some of their identifiers being assigned to the Treatment group and others to the Control group.
Complications such as these have severely impaired previous efforts to derive accurate measurements of lift in practice. In contrast to a few other papers on the subject, this paper has an expository aim as well, and provides a rigorous, self-contained, and readily-implementable treatment of all relevant concepts.

Submitted 3 May, 2017; v1 submitted 1 May, 2017; originally announced May 2017.

Comments: 44 pages, 6 figures, 1 table
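A randomization-based lift measurement needs every occurrence of an identifier to land in the same group. One common way to do this, and our assumption rather than MediaMath's documented method, is to hash each identifier with a campaign-specific salt and threshold the result. The sketch also computes lift as the relative difference in conversion rates between Treatment and Control, which is likewise an assumed definition. It does not model cross-identifier contamination or the paper's Gibbs-sampling confidence intervals. The function names, the salt, and the counts are hypothetical.

```python
import hashlib


def assign_group(identifier: str, control_fraction: float = 0.1,
                 salt: str = "example-campaign") -> str:
    """Deterministically map an identifier to 'control' or 'treatment'.

    The same identifier always hashes to the same bucket, so repeated
    occurrences (e.g., a cookie seen in many auctions) stay in one group.
    """
    digest = hashlib.sha256(f"{salt}:{identifier}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # pseudo-uniform in [0, 1)
    return "control" if u < control_fraction else "treatment"


def ad_lift(conv_t: int, n_t: int, conv_c: int, n_c: int) -> float:
    """Relative lift: (treatment rate - control rate) / control rate."""
    rate_t = conv_t / n_t
    rate_c = conv_c / n_c
    return (rate_t - rate_c) / rate_c


ids = [f"cookie-{i}" for i in range(10_000)]  # hypothetical identifiers
groups = [assign_group(i) for i in ids]
control_share = groups.count("control") / len(groups)
lift = ad_lift(conv_t=120, n_t=10_000, conv_c=100, n_c=10_000)  # hypothetical counts
print(f"control share {control_share:.3f}, lift {lift:.2f}")
```

Hashing avoids storing an assignment table and keeps group membership stable across repeated impressions, but it cannot tell that two different identifiers belong to the same consumer, which is exactly the contamination problem the paper treats separately.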

Showing 1–5 of 5 results for author: Winston, E