Showing 1–6 of 6 results for author: Barron, A R

Searching in archive stat.

  1. arXiv:1902.00800  [pdf, ps, other]

    stat.ML cs.LG

    Complexity, Statistical Risk, and Metric Entropy of Deep Nets Using Total Path Variation

    Authors: Andrew R. Barron, Jason M. Klusowski

    Abstract: For any ReLU network there is a representation in which the sum of the absolute values of the weights into each node is exactly $1$, and the input layer variables are multiplied by a value $V$ coinciding with the total variation of the path weights. Implications are given for Gaussian complexity, Rademacher complexity, statistical risk, and metric entropy, all of which are shown to be proportional…

    Submitted 6 February, 2019; v1 submitted 2 February, 2019; originally announced February 2019.

  2. arXiv:1809.03090  [pdf, ps, other]

    stat.ML cs.LG

    Approximation and Estimation for High-Dimensional Deep Learning Networks

    Authors: Andrew R. Barron, Jason M. Klusowski

    Abstract: It has been experimentally observed in recent years that multi-layer artificial neural networks have a surprising ability to generalize, even when trained with far more parameters than observations. Is there a theoretical basis for this? The best available bounds on their metric entropy and associated complexity measures are essentially linear in the number of parameters, which is inadequate to ex…

    Submitted 18 September, 2018; v1 submitted 9 September, 2018; originally announced September 2018.

  3. arXiv:1702.02828  [pdf, ps, other]

    stat.ML cs.LG

    Minimax Lower Bounds for Ridge Combinations Including Neural Nets

    Authors: Jason M. Klusowski, Andrew R. Barron

    Abstract: Estimation of functions of $ d $ variables is considered using ridge combinations of the form $ \textstyle\sum_{k=1}^m c_{1,k} \phi(\textstyle\sum_{j=1}^d c_{0,j,k}x_j-b_k) $ where the activation function $ \phi $ is a function with bounded value and derivative. These include single-hidden layer neural networks, polynomials, and sinusoidal models. From a sample of size $ n $ of possibly noisy values at r…

    Submitted 9 February, 2017; originally announced February 2017.

    MSC Class: 62J02; 62G08; 68T05

  4. arXiv:1607.07819  [pdf, ps, other]

    stat.ML math.ST

    Approximation by Combinations of ReLU and Squared ReLU Ridge Functions with $ \ell^1 $ and $ \ell^0 $ Controls

    Authors: Jason M. Klusowski, Andrew R. Barron

    Abstract: We establish $ L^{\infty} $ and $ L^2 $ error bounds for functions of many variables that are approximated by linear combinations of ReLU (rectified linear unit) and squared ReLU ridge functions with $ \ell^1 $ and $ \ell^0 $ controls on their inner and outer parameters. With the squared ReLU ridge function, we show that the $ L^2 $ approximation error is inversely proportional to the inner layer…

    Submitted 23 May, 2018; v1 submitted 26 July, 2016; originally announced July 2016.

    MSC Class: 62M45; 41A15

  5. arXiv:1607.01434  [pdf, ps, other]

    math.ST stat.ML

    Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks

    Authors: Jason M. Klusowski, Andrew R. Barron

    Abstract: Let $ f^{\star} $ be a function on $ \mathbb{R}^d $ with an assumption of a spectral norm $ v_{f^{\star}} $. For various noise settings, we show that $ \mathbb{E}\|\hat{f} - f^{\star} \|^2 \leq \left(v^4_{f^{\star}}\frac{\log d}{n}\right)^{1/3} $, where $ n $ is the sample size and $ \hat{f} $ is either a penalized least squares estimator or a greedily obtained version of such using linear combina…

    Submitted 29 October, 2018; v1 submitted 5 July, 2016; originally announced July 2016.

    Comments: Submitted to Annals of Statistics

    MSC Class: 62J02; 62G08; 68T05

  6. arXiv:1401.3760  [pdf, other]

    cs.IT stat.ME

    Large Alphabet Compression and Predictive Distributions through Poissonization and Tilting

    Authors: Xiao Yang, Andrew R. Barron

    Abstract: This paper introduces a convenient strategy for coding and predicting sequences of independent, identically distributed random variables generated from a large alphabet of size $m$. In particular, the size of the sample is allowed to be variable. The employment of a Poisson model and tilting method simplifies the implementation and analysis through independence. The resulting strategy is optimal w…

    Submitted 15 January, 2014; originally announced January 2014.