
Hebbian learning inspired estimation of the linear regression parameters from queries

Authors: Johannes Schmidt-Hieber, Wouter M Koolen

Abstract: Local learning rules in biological neural networks (BNNs) are commonly referred to as Hebbian learning. [26] links a biologically motivated Hebbian learning rule to a specific zeroth-order optimization method. In this work, we study a variation of this Hebbian learning rule to recover the regression vector in the linear regression model. Zeroth-order optimization methods are known to converge at a suboptimal rate for large parameter dimension compared to first-order methods like gradient descent, and are therefore thought to be in general inferior. By establishing upper and lower bounds, we show, however, that such methods achieve near-optimal rates if only queries of the linear regression loss are available. Moreover, we prove that this Hebbian learning rule can achieve considerably faster rates than any non-adaptive method that selects the queries independently of the data.

Submitted 26 September, 2023; originally announced November 2023.

Comments: 34 pages

MSC Class: Primary: 62L20; secondary: 62J05
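
To make "only queries of the linear regression loss are available" concrete, here is a minimal sketch of a generic two-point zeroth-order scheme (Gaussian smoothing). It is an assumption for illustration, not the Hebbian rule analysed in the paper, which is not reproduced here. The function name `zeroth_order_ls` and the values of the step size `alpha`, the perturbation scale `sigma` and the iteration count are hypothetical choices.

```python
import numpy as np

def zeroth_order_ls(X, Y, n_iter=3000, alpha=0.05, sigma=1e-3, seed=0):
    """Estimate the regression vector using only evaluations of the loss.

    Generic two-point zeroth-order scheme (illustrative assumption, not the
    paper's Hebbian rule): each step queries the empirical least-squares loss
    at beta and at beta + sigma * u with u ~ N(0, I_d).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    def loss(beta):
        # Empirical linear regression loss: the only access to the data we allow.
        r = Y - X @ beta
        return 0.5 * (r @ r) / n

    beta = np.zeros(d)
    for _ in range(n_iter):
        u = rng.standard_normal(d)
        # Finite-difference directional estimate; for this quadratic loss its
        # expectation equals the gradient, because odd Gaussian moments vanish.
        g = (loss(beta + sigma * u) - loss(beta)) / sigma * u
        beta -= alpha * g
    return beta
```

Since every step only uses two loss values, the iterates approach the least squares estimator without ever forming a gradient; the paper's results concern how fast such query-based methods can do this.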

arXiv:2309.15001

doi 10.1016/j.jspi.2024.106174

Convergence guarantees for forward gradient descent in the linear regression model

Authors: Thijs Bos, Johannes Schmidt-Hieber

Abstract: Renewed interest in the relationship between artificial and biological neural networks motivates the study of gradient-free methods. Considering the linear regression model with random design, we theoretically analyze in this work the biologically motivated (weight-perturbed) forward gradient scheme that is based on a random linear combination of the gradient. If d denotes the number of parameters and k the number of samples, we prove that the mean squared error of this method converges for $k\gtrsim d^2\log(d)$ with rate $d^2\log(d)/k.$ Compared to the dimension dependence d for stochastic gradient descent, an additional factor $d\log(d)$ occurs.

Submitted 20 June, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: 17 pages

MSC Class: Primary: 62L20; secondary: 62J05

Journal ref: Journal of Statistical Planning and Inference, Volume 233, 106174, 2024
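
A minimal sketch of the weight-perturbed forward gradient update for linear regression with random design, assuming the form $(\nabla L^\top \xi)\,\xi$ with $\xi \sim N(0, I_d)$, which is an unbiased random linear combination of the gradient because $E[\xi\xi^\top] = I_d$. Each of the k samples is used once. The constant step size `alpha` is an illustrative assumption; the paper's step-size schedule may differ.

```python
import numpy as np

def forward_gradient_sgd(X, Y, alpha=0.005, seed=0):
    """One pass of (weight-perturbed) forward gradient descent.

    For each sample (x, y), the gradient of 0.5 * (y - x @ theta)**2 is replaced
    by (grad @ xi) * xi with xi ~ N(0, I_d), so the update only needs the
    directional derivative along a random direction.
    """
    rng = np.random.default_rng(seed)
    k, d = X.shape
    theta = np.zeros(d)
    for x, y in zip(X, Y):
        grad = -(y - x @ theta) * x          # full gradient (used only via grad @ xi)
        xi = rng.standard_normal(d)          # random perturbation direction
        theta = theta - alpha * (grad @ xi) * xi
    return theta
```

Compared with plain SGD, the extra randomness from `xi` inflates the variance of each step by a factor of order d, which is the intuition behind the additional dimension factor in the rate.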

arXiv:2306.10529

Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model

Authors: Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber

Abstract: We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for the convergence of expectations and covariance matrices of the iterates are derived. The results shed more light on the widely cited connection between dropout and $\ell_2$-regularization in the linear model. We indicate a more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. Further, we study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator.

Submitted 25 April, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: 52 pages, 2 figures
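
The "widely cited connection" rests on an identity for the dropout loss averaged over masks. Assuming $D = \mathrm{diag}(\delta)$ with independent $\delta_j \sim \mathrm{Bernoulli}(p)$ ($p$ the retain probability, a convention chosen here), $E_D\|Y - XD\beta\|^2 = \|Y - pX\beta\|^2 + p(1-p)\,\beta^\top \mathrm{diag}(X^\top X)\,\beta$. The sketch below checks this by enumerating all masks for small d; it covers only the averaged objective, not the gradient descent dynamics that the paper shows make the relationship more subtle.

```python
import itertools
import numpy as np

def dropout_loss_exact(X, Y, beta, p):
    """E_D ||Y - X D beta||^2 by enumerating all 2**d dropout masks.

    D = diag(delta), delta_j ~ Bernoulli(p) independently (p = retain probability).
    Only feasible for small d.
    """
    d = X.shape[1]
    total = 0.0
    for mask in itertools.product([0, 1], repeat=d):
        delta = np.array(mask)
        prob = np.prod(np.where(delta == 1, p, 1 - p))
        r = Y - X @ (delta * beta)
        total += prob * (r @ r)
    return total

def dropout_loss_closed_form(X, Y, beta, p):
    """||Y - p X beta||^2 + p (1 - p) beta^T diag(X^T X) beta."""
    r = Y - p * (X @ beta)
    col_norms_sq = np.sum(X**2, axis=0)  # diagonal entries of X^T X
    return r @ r + p * (1 - p) * np.sum(col_norms_sq * beta**2)
```

The penalty weights each coefficient by the squared norm of its column, so the averaged objective is a weighted ridge-type criterion with the fit shrunk by the factor p.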

arXiv:2306.10471

A supervised deep learning method for nonparametric density estimation

Authors: Thijs Bos, Johannes Schmidt-Hieber

Abstract: Nonparametric density estimation is an unsupervised learning problem. In this work we propose a two-step procedure that casts the density estimation problem in the first step into a supervised regression problem. The advantage is that we can afterwards apply supervised learning methods. Compared to the standard nonparametric regression setting, the proposed procedure creates, however, dependence among the training samples. To derive statistical risk bounds, one therefore cannot rely on the well-developed theory for i.i.d. data. To overcome this, we prove an oracle inequality for this specific form of data dependence. As an application, it is shown that under a compositional structure assumption on the underlying density, the proposed two-step method achieves convergence rates that are faster than the standard nonparametric rates. A simulation study illustrates the finite sample performance.

Submitted 3 June, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: Keywords: Neural networks, nonparametric density estimation, statistical estimation rates, (un)supervised learning

MSC Class: Primary: 62G07; secondary 68T07

arXiv:2303.11706

doi 10.1016/j.spl.2024.110182

Lower bounds for the trade-off between bias and mean absolute deviation

Authors: Alexis Derumigny, Johannes Schmidt-Hieber

Abstract: In nonparametric statistics, rate-optimal estimators typically balance bias and stochastic error. Recent work on overparametrization raises the question of whether rate-optimal estimators exist that do not obey this trade-off. In this work we consider pointwise estimation in the Gaussian white noise model with regression function $f$ in a class of $β$-Hölder smooth functions. Let 'worst-case' refer to the supremum over all functions $f$ in the Hölder class. It is shown that any estimator with worst-case bias $\lesssim n^{-β/(2β+1)}=: ψ_n$ must necessarily also have a worst-case mean absolute deviation that is lower bounded by $\gtrsim ψ_n.$ To derive the result, we establish abstract inequalities relating the change of expectation for two probability measures to the mean absolute deviation.

Submitted 20 June, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: This is an extended version of Section 7 of arXiv:2006.00278v3. The material has been removed from later versions of arXiv:2006.00278

MSC Class: 62C20; 62G05; 62C05

Journal ref: Statistics and Probability Letters, Volume 213, 110182, 2024

arXiv:2303.08122

Codivergences and information matrices

Authors: Alexis Derumigny, Johannes Schmidt-Hieber

Abstract: We propose a new concept of codivergence, which quantifies the similarity between two probability measures $P_1, P_2$ relative to a reference probability measure $P_0$. In the neighborhood of the reference measure $P_0$, a codivergence behaves like an inner product between the measures $P_1 - P_0$ and $P_2 - P_0$. Codivergences of covariance-type and correlation-type are introduced and studied with a focus on two specific correlation-type codivergences, the $χ^2$-codivergence and the Hellinger codivergence. We derive explicit expressions for several common parametric families of probability distributions. For a codivergence, we moreover introduce the divergence matrix as an analogue of the Gram matrix. It is shown that the $χ^2$-divergence matrix satisfies a data-processing inequality.

Submitted 9 May, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: 30 pages, 1 figure, 1 table. This is an extended version of Section 2.2 of arXiv:2006.00278v3 (most of this content has been removed in the next version, arXiv:2006.00278v4, which links to this separate paper instead)

MSC Class: 62B11; 46E27; 15A63
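
A small numerical sketch of the $χ^2$-divergence matrix for distributions on a finite set. The entry formula $M_{jk} = \sum_x P_j(x)P_k(x)/P_0(x) - 1$ is our assumed reading of the $χ^2$-codivergence, used for illustration; consult the paper for the exact definitions. With it, the data-processing inequality takes the form of a Loewner-order statement: pushing all measures through the same Markov kernel K (a row-stochastic matrix) can only shrink the matrix.

```python
import numpy as np

def chi2_divergence_matrix(P0, Ps):
    """Chi^2-divergence matrix on a finite outcome space.

    Assumed form: M[j, k] = sum_x Ps[j, x] * Ps[k, x] / P0[x] - 1.
    Requires P0[x] > 0 for every outcome x.
    """
    P0 = np.asarray(P0, dtype=float)
    Ps = np.asarray(Ps, dtype=float)  # shape (m, number of outcomes)
    return (Ps / P0) @ Ps.T - 1.0

def push_forward(P, K):
    """Apply a Markov kernel given as a row-stochastic matrix K."""
    return np.asarray(P, dtype=float) @ K
```

The diagonal entries reduce to the usual $χ^2$-divergences $\chi^2(P_j, P_0)$, which mirrors how a Gram matrix has squared norms on its diagonal.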

arXiv:2301.11777

Interpreting learning in biological neural networks as zero-order optimization method

Authors: Johannes Schmidt-Hieber

Abstract: Recently, significant progress has been made regarding the statistical understanding of artificial neural networks (ANNs). ANNs are motivated by the functioning of the brain, but differ in several crucial aspects. In particular, the locality in the updating rule of the connection parameters in biological neural networks (BNNs) makes it biologically implausible that the learning of the brain is based on gradient descent. In this work, we look at the brain as a statistical method for supervised learning. The main contribution is to relate the local updating rule of the connection parameters in BNNs to a zero-order optimization method. It is shown that the expected values of the iterates implement a modification of gradient descent.

Submitted 23 March, 2023; v1 submitted 27 January, 2023; originally announced January 2023.
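
A short worked derivation shows, in the simplest case, why expected iterates of a zero-order method can track gradient descent. Assumptions (ours, not necessarily the paper's setting): a quadratic loss $L$ with Hessian $H$, perturbations $\xi \sim N(0, \sigma^2 I_d)$, and the generic update $\theta_{k+1} = \theta_k - \alpha\,(L(\theta_k + \xi_k) - L(\theta_k))\,\xi_k$.

```latex
\begin{aligned}
L(\theta+\xi) - L(\theta) &= \nabla L(\theta)^\top \xi + \tfrac{1}{2}\,\xi^\top H \xi, \\
\mathbb{E}\big[(L(\theta+\xi) - L(\theta))\,\xi\big]
  &= \mathbb{E}[\xi\xi^\top]\,\nabla L(\theta) + \tfrac{1}{2}\,\mathbb{E}\big[(\xi^\top H \xi)\,\xi\big]
   = \sigma^2\,\nabla L(\theta), \\
\mathbb{E}[\theta_{k+1} \mid \theta_k] &= \theta_k - \alpha\sigma^2\,\nabla L(\theta_k).
\end{aligned}
```

The cubic term $(\xi^\top H\xi)\,\xi$ has zero mean because odd Gaussian moments vanish. For non-quadratic losses, the third-order Taylor term multiplied by $\xi$ has even degree and no longer vanishes in expectation, so the mean update deviates from plain gradient descent.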

arXiv:2206.02151

A statistical analysis of an image classification problem

Authors: Sophie Langer, Johannes Schmidt-Hieber

Abstract: The availability of massive image databases resulted in the development of scalable machine learning methods such as convolutional neural networks (CNNs) filtering and processing these data. While the very recent theoretical work on CNNs focuses on standard nonparametric denoising problems, the variability in image classification datasets does not, however, originate from additive noise but from variation of the shape and other characteristics of the same object across different images. To address this problem, we consider a simple supervised classification problem for object detection on grayscale images. While from the function estimation point of view, every pixel is a variable and large images lead to high-dimensional function recovery tasks suffering from the curse of dimensionality, increasing the number of pixels in our image deformation model enhances the image resolution and makes the object classification problem easier. We propose and theoretically analyze two different procedures. The first method estimates the image deformation by support alignment. Under a minimal separation condition, it is shown that perfect classification is possible. The second method fits a CNN to the data. We derive a rate for the misclassification error depending on the sample size and the number of pixels. Both classifiers are empirically compared on images generated from the MNIST handwritten digit database. The obtained results corroborate the theoretical findings.

Submitted 5 June, 2022; originally announced June 2022.

arXiv:2205.07764

On the inability of Gaussian process regression to optimally learn compositional functions

Authors: Matteo Giordano, Kolyan Ray, Johannes Schmidt-Hieber

Abstract: We rigorously prove that deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure. To this end, we study information-theoretic lower bounds for posterior contraction rates for Gaussian process regression in a continuous regression model. We show that if the true function is a generalized additive function, then the posterior based on any mean-zero Gaussian process can only recover the truth at a rate that is strictly slower than the minimax rate by a factor that is polynomially suboptimal in the sample size $n$.

Submitted 27 September, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: 20 pages, to appear in Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

arXiv:2204.05003

Local convergence rates of the nonparametric least squares estimator with applications to transfer learning

Authors: Johannes Schmidt-Hieber, Petr Zamolodtchikov

Abstract: Convergence properties of empirical risk minimizers can be conveniently expressed in terms of the associated population risk. To derive bounds for the performance of the estimator under covariate shift, however, pointwise convergence rates are required. Under weak assumptions on the design distribution, it is shown that least squares estimators (LSE) over 1-Lipschitz functions are also minimax rate optimal with respect to a weighted uniform norm, where the weighting accounts in a natural way for the non-uniformity of the design distribution. This implies that although least squares is a global criterion, the LSE adapts locally to the size of the design density. We develop a new indirect proof technique that establishes the local convergence behavior based on a carefully chosen local perturbation of the LSE. The obtained local rates are then applied to analyze the LSE for transfer learning under covariate shift.

Submitted 29 December, 2023; v1 submitted 11 April, 2022; originally announced April 2022.

arXiv:2201.04545

doi 10.1109/TIT.2022.3215088

On generalization bounds for deep networks based on loss surface implicit regularization

Authors: Masaaki Imaizumi, Johannes Schmidt-Hieber

Abstract: The classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem towards explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that under reasonable assumptions, the local geometry forces SGD to stay close to a low dimensional subspace and that this induces another form of implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural networks, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property of the population risk. Under these conditions, lower bounds for SGD to remain in these stagnation sets are derived. If stagnation occurs, we derive a bound on the generalization error of deep neural networks involving the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values in the SGD iterates and local uniform convergence of the empirical loss functions based on the entropy of suitable neighborhoods around local minima.

Submitted 16 October, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: To appear in IEEE Transactions on Information Theory

arXiv:2108.00969

Convergence rates of deep ReLU networks for multiclass classification

Authors: Thijs Bos, Johannes Schmidt-Hieber

Abstract: For classification problems, trained deep neural networks return probabilities of class memberships. In this work we study convergence of the learned probabilities to the true conditional class probabilities. More specifically we consider sparse deep ReLU network reconstructions minimizing cross-entropy loss in the multiclass classification setup. Interesting phenomena occur when the class membership probabilities are close to zero. Convergence rates are derived that depend on the near-zero behaviour via a margin-type condition.

Submitted 2 August, 2021; originally announced August 2021.

Comments: convergence rates, ReLU networks, multiclass classification, conditional class probabilities, margin condition

MSC Class: Primary: 62G05; secondary: 62H30; 68T07

arXiv:2105.07410

Posterior contraction for deep Gaussian process priors

Authors: Gianluca Finocchio, Johannes Schmidt-Hieber

Abstract: We study posterior contraction rates for a class of deep Gaussian process priors applied to the nonparametric regression problem under a general composition assumption on the regression function. It is shown that the contraction rates can achieve the minimax convergence rate (up to $\log n$ factors), while being adaptive to the underlying structure and smoothness of the target function. The proposed framework extends the Bayesian nonparametric theory for Gaussian process priors.

Submitted 13 August, 2022; v1 submitted 16 May, 2021; originally announced May 2021.

Comments: 56 pages, 3 figures

MSC Class: 62G08; 62G20 (Primary) 62C20; 62R07 (Secondary)

arXiv:2007.15884

The Kolmogorov-Arnold representation theorem revisited

Authors: Johannes Schmidt-Hieber

Abstract: There is a longstanding debate whether the Kolmogorov-Arnold representation theorem can explain the use of more than one hidden layer in neural networks. The Kolmogorov-Arnold representation decomposes a multivariate function into an interior and an outer function and therefore has indeed a similar structure as a neural network with two hidden layers. But there are distinctive differences. One of the main obstacles is that the outer function depends on the represented function and can be wildly varying even if the represented function is smooth. We derive modifications of the Kolmogorov-Arnold representation that transfer smoothness properties of the represented function to the outer function and can be well approximated by ReLU networks. It appears that instead of two hidden layers, a more natural interpretation of the Kolmogorov-Arnold representation is that of a deep neural network where most of the layers are required to approximate the interior function.

Submitted 2 January, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

Comments: 21 pages

MSC Class: 41A30
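
For reference, the classical Kolmogorov-Arnold representation (not the modified versions derived in the paper) states that every continuous $f:[0,1]^d \to \mathbb{R}$ can be written as

```latex
f(x_1,\dots,x_d) \;=\; \sum_{q=0}^{2d} g_q\Big(\sum_{p=1}^{d} \psi_{p,q}(x_p)\Big),
\qquad (x_1,\dots,x_d) \in [0,1]^d,
```

with continuous univariate inner functions $\psi_{p,q}$ that do not depend on $f$ and continuous univariate outer functions $g_q$ that do. Reading the inner sums as one layer and the $g_q$ as another gives the "two hidden layers" analogy; the obstacle described above is that the $g_q$ inherit no smoothness from $f$.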

arXiv:2006.00278

On lower bounds for the bias-variance trade-off

Authors: Alexis Derumigny, Johannes Schmidt-Hieber

Abstract: It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows one to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or $χ^2$-divergence. In a second part of the article, the abstract lower bounds are applied to several statistical models including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we propose to combine the general strategy for lower bounds with a reduction technique. This allows us to reduce the original problem to a lower bound on the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model.
In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rates of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.

Submitted 20 March, 2023; v1 submitted 30 May, 2020; originally announced June 2020.

Comments: 52 pages, 2 figures, 1 table

MSC Class: 62G05; 62C05; 62C20

arXiv:2003.04406

On frequentist coverage of Bayesian credible sets for estimation of the mean under constraints

Authors: Kevin Duisters, Johannes Schmidt-Hieber

Abstract: Frequentist coverage of $(1-α)$-highest posterior density (HPD) credible sets is studied in a signal plus noise model under a large class of noise distributions. We consider a specific class of spike-and-slab prior distributions. Different regimes are identified and we derive closed form expressions for the $(1-α)$-HPD in each of these regimes. Similar to the earlier work by Marchand and Strawderman, it is shown that under suitable conditions, the frequentist coverage can drop to $1-3α/2.$

Submitted 9 March, 2020; originally announced March 2020.

Comments: 35 pages, 5 figures

MSC Class: 62C10; 62G15; 62F15

arXiv:1908.00695

Deep ReLU network approximation of functions on a manifold

Authors: Johannes Schmidt-Hieber

Abstract: Whereas recovery of the manifold from data is a well-studied topic, approximation rates for functions defined on manifolds are less known. In this work, we study a regression problem with inputs on a $d^*$-dimensional manifold that is embedded into a space with potentially much larger ambient dimension. It is shown that sparsely connected deep ReLU networks can approximate a Hölder function with smoothness index $β$ up to error $ε$ using of the order of $ε^{-d^*/β}\log(1/ε)$ many non-zero network parameters. As an application, we derive statistical convergence rates for the estimator minimizing the empirical risk over all possible choices of bounded network parameters.

Submitted 2 August, 2019; originally announced August 2019.

arXiv:1904.04525

Bayesian variance estimation in the Gaussian sequence model with partial information on the means

Authors: Gianluca Finocchio, Johannes Schmidt-Hieber

Abstract: Consider the Gaussian sequence model under the additional assumption that a fixed fraction of the means is known. We study the problem of variance estimation from a frequentist Bayesian perspective. The maximum likelihood estimator (MLE) for $σ^2$ is biased and inconsistent. This raises the question whether the posterior is able to correct the MLE in this case. By developing a new proof strategy that uses refined properties of the posterior distribution, we find that the marginal posterior is inconsistent for any i.i.d. prior on the mean parameters. In particular, no assumption on the decay of the prior needs to be imposed. Surprisingly, we also find that consistency can be retained for a hierarchical prior based on Gaussian mixtures. In this case we also establish a limiting shape result and determine the limit distribution. In contrast to the classical Bernstein-von Mises theorem, the limit is non-Gaussian. We show that the Bayesian analysis leads to new statistical estimators outperforming the correctly calibrated MLE in a numerical simulation study.

Submitted 18 December, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

Comments: 33 pages, 1 table, corrected typos, improved proofs, expanded sections, references added
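
Why the MLE for $σ^2$ is inconsistent can be seen in a quick simulation. Take $X_i = \theta_i + \sigma\varepsilon_i$, $i = 1,\dots,n$, where the first $m = \lfloor \alpha n \rfloor$ means are known (the ordering and the values of `alpha`, `sigma`, `n` are illustrative assumptions). Maximizing the Gaussian likelihood over the unknown means sets $\hat\theta_i = X_i$, so only the known coordinates leave a residual, yet the MLE still divides by $n$ and converges to $\alpha\sigma^2$. Dividing by $m$ instead gives a consistent rescaled estimator.

```python
import numpy as np

def variance_mles(n=100_000, alpha=0.3, sigma=2.0, seed=0):
    """Return (MLE, rescaled MLE) of sigma^2 when the first floor(alpha*n) means are known."""
    rng = np.random.default_rng(seed)
    m = int(np.floor(alpha * n))
    theta = rng.normal(size=n)           # hypothetical means; only theta[:m] is treated as known
    X = theta + sigma * rng.standard_normal(n)
    ss_known = np.sum((X[:m] - theta[:m]) ** 2)
    mle = ss_known / n                   # unknown coordinates fitted exactly, residual 0
    rescaled = ss_known / m              # normalise by the number of informative coordinates
    return mle, rescaled
```

With `alpha=0.3` and `sigma=2.0`, the MLE concentrates near 1.2 rather than 4, while the rescaled version concentrates near 4.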

arXiv:1809.04140

Nonparametric Bayesian analysis of the compound Poisson prior for support boundary recovery

Authors: Markus Reiss, Johannes Schmidt-Hieber

Abstract: Given data from a Poisson point process with intensity $(x,y) \mapsto n \mathbf{1}(f(x)\leq y),$ frequentist properties for the Bayesian reconstruction of the support boundary function $f$ are derived. We mainly study compound Poisson process priors with fixed intensity, proving that the posterior contracts with nearly optimal rate for monotone and piecewise constant support boundaries and adapts to Hölder smooth boundaries with smoothness index at most one. We then derive a non-standard Bernstein-von Mises result for a compound Poisson process prior and a function space with increasing parameter dimension. As an intermediate result the limiting shape of the posterior for random histogram type priors is obtained. In both settings, it is shown that the marginal posterior of the functional $\vartheta =\int f$ performs an automatic bias correction and contracts with a faster rate than the MLE. In this case, $(1-α)$-credible sets are also asymptotic $(1-α)$-confidence intervals. As a negative result, it is shown that the frequentist coverage of credible sets is lost for linear functions, indicating that credible sets only have frequentist coverage for priors that are specifically constructed to match properties of the underlying true function.

Submitted 11 September, 2018; originally announced September 2018.

Comments: The first version of arXiv:1703.08358 has been expanded and rewritten. We decided to split it into two separate papers, a new version of arXiv:1703.08358 and this article

MSC Class: 62C10; 62G05; 60G55

arXiv:1809.02443

Posterior analysis of $n$ in the binomial $(n,p)$ problem with both parameters unknown -- with applications to quantitative nanoscopy

Authors: Johannes Schmidt-Hieber, Laura Fee Schneider, Thomas Staudt, Andrea Kra**a, Timo Aspelmeier, Axel Munk

Abstract: Estimation of the population size $n$ from $k$ i.i.d. binomial observations with unknown success probability $p$ is relevant to a multitude of applications and has a long history. Without additional prior information this is a notoriously difficult task when $p$ becomes small, and the Bayesian approach becomes particularly useful. For a large class of priors, we establish posterior contraction and a Bernstein-von Mises type theorem in a setting where $p\rightarrow0$ and $n\rightarrow\infty$ as $k\to\infty$. Furthermore, we suggest a new class of Bayesian estimators for $n$ and provide a comprehensive simulation study in which we investigate their performance. To showcase the advantages of a Bayesian approach on real data, we also benchmark our estimators in a novel application from super-resolution microscopy.

Submitted 16 November, 2020; v1 submitted 7 September, 2018; originally announced September 2018.

Comments: 66 pages; 37 pages main text and 29 pages supplement; contains link to a supplementary microscopy video
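The Bayesian approach described above can be illustrated with a small sketch. Integrating out $p$ under a conjugate Beta$(a,b)$ prior gives the closed-form marginal likelihood $p(x_1,\ldots,x_k\mid n)=\prod_{i}\binom{n}{x_i}\,B(a+S,\,b+kn-S)/B(a,b)$ with $S=\sum_i x_i$, and a prior on $n$ then yields its posterior. The Beta prior, the prior on $n$ proportional to $1/n$, the truncation at `n_max` and the posterior median as point estimate are all assumptions chosen for illustration; they are not the specific estimators proposed in the paper.

```python
import math


def log_beta(u, v):
    # log of the Beta function B(u, v)
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)


def log_marginal_likelihood(n, xs, a=1.0, b=1.0):
    """log p(x_1, ..., x_k | n) with p ~ Beta(a, b) integrated out."""
    if n < max(xs):
        return -math.inf  # n must be at least the largest observed count
    k, s = len(xs), sum(xs)
    log_binom = sum(
        math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1) for x in xs
    )
    return log_binom + log_beta(a + s, b + k * n - s) - log_beta(a, b)


def posterior_over_n(xs, n_max, a=1.0, b=1.0):
    """Posterior of n on {max(xs), ..., n_max}, assuming a prior proportional to 1/n."""
    support = range(max(max(xs), 1), n_max + 1)
    logs = [log_marginal_likelihood(n, xs, a, b) - math.log(n) for n in support]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return {n: w / total for n, w in zip(support, weights)}


def posterior_median(post):
    acc = 0.0
    for n in sorted(post):
        acc += post[n]
        if acc >= 0.5:
            return n
    return max(post)


counts = [3, 5, 4, 6, 2]  # hypothetical binomial counts
post = posterior_over_n(counts, n_max=200)
print(posterior_median(post))
```

Working on the log scale keeps the computation stable when $n$ and $k$ are large; when $p$ is small the posterior over $n$ is typically wide, which is the difficulty the abstract refers to.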

arXiv:1804.02253 [pdf, other]

A comparison of deep networks with ReLU activation function and linear spline-type methods

Authors: Konstantin Eckle, Johannes Schmidt-Hieber

Abstract: Deep neural networks (DNNs) generate much richer function spaces than shallow networks. However, since the function spaces induced by shallow networks have several approximation-theoretic drawbacks, this does not necessarily explain the success of deep networks. In this article we take another route by comparing the expressive power of DNNs with ReLU activation function to piecewise linear spline methods. We show that MARS (multivariate adaptive regression splines) is improperly learnable by DNNs in the sense that for any given function that can be expressed as a function in MARS with $M$ parameters there exists a multilayer neural network with $O(M \log (M/\varepsilon))$ parameters that approximates this function up to sup-norm error $\varepsilon.$ We show a similar result for expansions with respect to the Faber-Schauder system. Based on this, we derive risk comparison inequalities that bound the statistical risk of fitting a neural network by the statistical risk of spline-based methods. This shows that deep networks perform better or only slightly worse than the considered spline methods. We provide a constructive proof for the function approximations.

Submitted 24 September, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

MSC Class: 62G08 (Primary); 62G20 (Secondary)
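The link between ReLU networks and the spline systems above starts from two elementary identities, sketched here in Python. A MARS hinge $(x-t)_+$ is a single ReLU unit, and the Faber-Schauder hat function on $[0,1]$ is exactly $2\,\mathrm{ReLU}(x) - 4\,\mathrm{ReLU}(x-1/2) + 2\,\mathrm{ReLU}(x-1)$. This is only the exact building block; MARS also uses products of hinges, which ReLU networks can only approximate, and the construction in the paper is not reproduced here. The function names are illustrative.

```python
def relu(x):
    return max(0.0, x)


def mars_hinge(x, t):
    # A MARS basis function (x - t)_+ is exactly one ReLU unit with bias -t.
    return relu(x - t)


def faber_schauder_hat(x):
    """Hat function on [0, 1]: 2x on [0, 1/2], 2 - 2x on [1/2, 1], 0 elsewhere.

    One hidden layer with three ReLU units represents it exactly.
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


print([faber_schauder_hat(x) for x in (-1.0, 0.25, 0.5, 0.75, 2.0)])
```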

arXiv:1802.03425 [pdf, ps, other]

doi 10.1214/18-AIHP946

Asymptotic nonequivalence of density estimation and Gaussian white noise for small densities

Authors: Kolyan Ray, Johannes Schmidt-Hieber

Abstract: It is well-known that density estimation on the unit interval is asymptotically equivalent to a Gaussian white noise experiment, provided the densities are sufficiently smooth and uniformly bounded away from zero. We show that a uniform lower bound, whose size we sharply characterize, is in general necessary for asymptotic equivalence to hold.

Submitted 6 November, 2018; v1 submitted 9 February, 2018; originally announced February 2018.

Comments: 20 pages, 1 figure. Some results from an early version of arXiv:1608.01824 are now found here

MSC Class: 62B15 (Primary); 62G07; 62G10; 62G20 (Secondary)

Journal ref: Ann. Inst. H. Poincare Probab. Statist. 55 (2019), no. 4, 2195-2208

arXiv:1708.06633 [pdf, other]

doi 10.1214/19-AOS1875

Nonparametric regression using deep neural networks with ReLU activation function

Authors: Johannes Schmidt-Hieber

Abstract: Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential network parameters exceeding the sample size. The analysis gives some insights into why multilayer feedforward neural networks perform well in practice. Interestingly, for ReLU activation function the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural. It is also shown that under the composition assumption wavelet estimators can only achieve suboptimal rates.

Submitted 13 September, 2020; v1 submitted 22 August, 2017; originally announced August 2017.

Comments: article, rejoinder and supplementary material

MSC Class: 62G08

Journal ref: Article: Annals of Statistics, Volume 48, Number 4, 1875-1897, 2020, Rejoinder: Annals of Statistics, Volume 48, Number 4, 1916-1921, 2020

arXiv:1704.01066 [pdf, other]

Tests for qualitative features in the random coefficients model

Authors: Fabian Dunker, Konstantin Eckle, Katharina Proksch, Johannes Schmidt-Hieber

Abstract: The random coefficients model is an extension of the linear regression model that allows for unobserved heterogeneity in the population by modeling the regression coefficients as random variables. Given data from this model, the statistical challenge is to recover information about the joint density of the random coefficients, which is a multivariate and ill-posed problem. Because of the curse of dimensionality and the ill-posedness, pointwise nonparametric estimation of the joint density is difficult and suffers from slow convergence rates. Larger features, such as an increase of the density along some direction or a well-accentuated mode, can, however, be detected much more easily from data by means of statistical tests. In this article, we follow this strategy and construct tests and confidence statements for qualitative features of the joint density, such as increases, decreases and modes. We propose a multiple testing approach based on aggregating single tests which are designed to extract shape information on fixed scales and directions. Using recent tools for Gaussian approximations of multivariate empirical processes, we derive expressions for the critical value. We apply our method to simulated and real data.

Submitted 13 March, 2018; v1 submitted 4 April, 2017; originally announced April 2017.

MSC Class: 62G10; 62G15; 62G20

arXiv:1703.08358 [pdf, other]

Posterior contraction rates for support boundary recovery

Authors: Markus Reiss, Johannes Schmidt-Hieber

Abstract: Given a sample of a Poisson point process with intensity $λ_f(x,y) = n \mathbf{1}(f(x) \leq y),$ we study recovery of the boundary function $f$ from a nonparametric Bayes perspective. Because of the irregularity of this model, the analysis is non-standard. We establish a general result for the posterior contraction rate with respect to the $L^1$-norm based on entropy and one-sided small probability bounds. From this, specific posterior contraction results are derived for Gaussian process priors and priors based on random wavelet series.

Submitted 12 June, 2020; v1 submitted 24 March, 2017; originally announced March 2017.

MSC Class: 62C10; 62G05; 60G55

arXiv:1608.01824 [pdf, ps, other]

doi 10.4171/MSL/1-2-1

The Le Cam distance between density estimation, Poisson processes and Gaussian white noise

Authors: Kolyan Ray, Johannes Schmidt-Hieber

Abstract: It is well-known that density estimation on the unit interval is asymptotically equivalent to a Gaussian white noise experiment, provided the densities have Hölder smoothness larger than $1/2$ and are uniformly bounded away from zero. We derive matching lower and constructive upper bounds for the Le Cam deficiencies between these experiments, with explicit dependence on both the sample size and the size of the densities in the parameter space. As a consequence, we derive sharp conditions on how small the densities can be for asymptotic equivalence to hold. The related case of Poisson intensity estimation is also treated.

Submitted 14 April, 2018; v1 submitted 5 August, 2016; originally announced August 2016.

Comments: Some results from an earlier version of this preprint have been moved to arXiv:1802.03425

MSC Class: 62G05 (Primary); 62G07; 62G20 (Secondary)

Journal ref: Math. Stat. Learn. 1 (2018), 101-170

arXiv:1512.00218 [pdf, ps, other]

doi 10.1088/0266-5611/32/6/065003

Minimax theory for a class of non-linear statistical inverse problems

Authors: Kolyan Ray, Johannes Schmidt-Hieber

Abstract: We study a class of statistical inverse problems with non-linear pointwise operators motivated by concrete statistical applications. A two-step procedure is proposed, where the first step smooths the data and inverts the non-linearity. This reduces the initial non-linear problem to a linear inverse problem with deterministic noise, which is then solved in a second step. The noise reduction step is based on wavelet thresholding and is shown to be minimax optimal (up to logarithmic factors) in a pointwise function-dependent sense. Our analysis is based on a modified notion of Hölder smoothness scales that are natural in this setting.

Submitted 11 May, 2016; v1 submitted 1 December, 2015; originally announced December 2015.

Comments: 37 pages

MSC Class: 62G05 (Primary); 62G08; 62G20 (Secondary)

Journal ref: Inverse Problems 32 (2016) 065003

arXiv:1510.09054 [pdf, ps, other]

doi 10.1007/s10231-017-0655-2

A regularity class for the roots of non-negative functions

Authors: Kolyan Ray, Johannes Schmidt-Hieber

Abstract: We investigate the regularity of the positive roots of a non-negative function of one variable. A modified Hölder space $\mathcal{F}^β$ is introduced such that if $f\in \mathcal{F}^β$ then $f^α\in C^{αβ}$. This provides sufficient conditions to overcome the usual limitation in the square root case ($α= 1/2$) for Hölder functions that $f^{1/2}$ need be no more than $C^1$ in general. We also derive bounds on the wavelet coefficients of $f^α$, which provide a finer understanding of its local regularity.

Submitted 16 March, 2017; v1 submitted 30 October, 2015; originally announced October 2015.

Comments: 12 pages

MSC Class: 26A16; 26A27

Journal ref: Ann. Mat. Pura Appl. 196 (2017), 2091-2103
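A standard example (not taken from the paper) shows the square-root limitation mentioned above on an interval containing $0$:

```latex
f(x) = x^2 \in C^{\beta}([-1,1]) \ \text{for every } \beta > 0,
\qquad f^{1/2}(x) = |x|,

\bigl|\,|x| - |y|\,\bigr| \le |x - y|
\quad\Longrightarrow\quad f^{1/2} \text{ is Lipschitz},

\text{but } |x| \text{ is not differentiable at } 0,
\ \text{so } f^{1/2} \notin C^{\gamma}([-1,1]) \ \text{for any } \gamma > 1.
```

However smooth $f$ is, its square root gains no regularity beyond order one here. By the implication $f\in\mathcal{F}^β \Rightarrow f^{1/2}\in C^{β/2}$ stated above, it follows that $x^2 \notin \mathcal{F}^β$ for $β > 2$ on such an interval.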

arXiv:1510.02232 [pdf, other]

doi 10.1214/16-EJS1130

Conditions for Posterior Contraction in the Sparse Normal Means Problem

Authors: Stéphanie van der Pas, Jean-Bernard Salomond, Johannes Schmidt-Hieber

Abstract: The first Bayesian results for the sparse normal means problem were proven for spike-and-slab priors. However, these priors are less convenient from a computational point of view. Meanwhile, a large number of continuous shrinkage priors have been proposed. Many of these shrinkage priors can be written as a scale mixture of normals, which makes them particularly easy to implement. We propose general conditions on the prior on the local variance in scale mixtures of normals such that posterior contraction at the minimax rate is assured. The conditions require tails at least as heavy as Laplace, but not too heavy, and a large amount of mass around zero relative to the tails, more so as the sparsity increases. These conditions give some general guidelines for choosing a shrinkage prior for estimation under a nearly black sparsity assumption. We verify these conditions for the class of priors considered by Ghosh and Chakrabarti (2015), which includes the horseshoe and the normal-exponential gamma priors, and for the horseshoe+, the inverse-Gaussian prior, the normal-gamma prior, and the spike-and-slab Lasso, and thus extend the number of shrinkage priors which are known to lead to posterior contraction at the minimax estimation rate.

Submitted 13 October, 2015; v1 submitted 8 October, 2015; originally announced October 2015.

Journal ref: Electron. J. Statist. 10 (2016), no. 1, 976--1000. http://projecteuclid.org/euclid.ejs/1460463652
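Scale mixtures of normals are convenient because, conditionally on the local scale, everything is Gaussian. As an illustration, the sketch below computes the posterior mean for a single observation $y \sim N(\theta, 1)$ under the horseshoe prior $\theta\mid\lambda \sim N(0,\lambda^2\tau^2)$, $\lambda \sim C^+(0,1)$, one of the priors covered above, by one-dimensional numerical integration over $\lambda$. Fixing the global scale $\tau$, the midpoint rule and the function name are simplifying assumptions; this is not code from the paper.

```python
import math


def horseshoe_posterior_mean(y, tau, grid=20000):
    """Posterior mean of theta for y ~ N(theta, 1) under the horseshoe prior
    theta | lam ~ N(0, lam^2 tau^2), lam ~ half-Cauchy(0, 1), tau fixed.

    Substituting lam = tan(u) turns the half-Cauchy prior into the uniform
    distribution on (0, pi/2), so a plain midpoint rule in u suffices.
    """
    num = 0.0
    den = 0.0
    h = (math.pi / 2) / grid
    for j in range(grid):
        u = (j + 0.5) * h
        v = tau ** 2 * math.tan(u) ** 2  # conditional prior variance lam^2 tau^2
        # Marginal likelihood of y given lam (y ~ N(0, 1 + v)), up to a constant.
        w = math.exp(-y * y / (2.0 * (1.0 + v))) / math.sqrt(1.0 + v)
        # Conditional posterior mean E[theta | y, lam] = y * v / (1 + v).
        num += w * y * v / (1.0 + v)
        den += w
    return num / den


print(horseshoe_posterior_mean(0.5, tau=0.1), horseshoe_posterior_mean(8.0, tau=1.0))
```

Small observations are shrunk strongly towards zero while large ones are left nearly untouched, which reflects the two requirements above: enough mass near zero and sufficiently heavy tails.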

arXiv:1403.0735 [pdf, ps, other]

doi 10.1214/15-AOS1334

Bayesian linear regression with sparse priors

Authors: Ismaël Castillo, Johannes Schmidt-Hieber, Aad van der Vaart

Abstract: We study full Bayesian procedures for high-dimensional linear regression under sparsity constraints. The prior is a mixture of point masses at zero and continuous distributions. Under compatibility conditions on the design matrix, the posterior distribution is shown to contract at the optimal rate for recovery of the unknown sparse vector, and to give optimal prediction of the response vector. It is also shown to select the correct sparse model, or at least the coefficients that are significantly different from zero. The asymptotic shape of the posterior distribution is characterized and employed in the construction and study of credible sets for uncertainty quantification.

Submitted 14 October, 2015; v1 submitted 4 March, 2014; originally announced March 2014.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1334 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1334

Journal ref: Annals of Statistics 2015, Vol. 43, No. 5, 1986-2018
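A prior that mixes point masses at zero with continuous distributions can be sampled in three steps: draw a model size, draw a support of that size uniformly, and fill the support with draws from a continuous slab. The minimal sketch below assumes a model-size weight proportional to $p^{-as}$ and a Laplace slab; these specific choices, and the names `a` and `lam`, are illustrative assumptions rather than the paper's exact specification.

```python
import random


def sample_spike_and_slab(p, rng, a=0.1, lam=1.0):
    """Draw beta in R^p from a spike-and-slab prior (illustrative choices):
    model size s with weight proportional to p^(-a*s), a uniformly random
    support of size s, and i.i.d. Laplace(lam) values on that support.
    All other coordinates are exactly zero (the point mass at zero)."""
    sizes = range(p + 1)
    s = rng.choices(sizes, weights=[p ** (-a * k) for k in sizes])[0]
    support = sorted(rng.sample(range(p), s))
    beta = [0.0] * p
    for j in support:
        # Laplace(lam) draw: exponential magnitude with a random sign.
        beta[j] = rng.choice((-1.0, 1.0)) * rng.expovariate(lam)
    return beta, support


rng = random.Random(3)
beta, support = sample_spike_and_slab(50, rng)
print(support)
```

Because the weight on the model size decays geometrically in $s$, typical draws are sparse, and every coordinate outside the support is exactly zero rather than merely small.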

arXiv:1312.0416 [pdf, ps, other]

doi 10.1214/14-AOS1262

Asymptotic equivalence for regression under fractional noise

Authors: Johannes Schmidt-Hieber

Abstract: Consider estimation of the regression function based on a model with equidistant design and measurement errors generated from a fractional Gaussian noise process. In previous literature, this model has been heuristically linked to an experiment, where the anti-derivative of the regression function is continuously observed under additive perturbation by a fractional Brownian motion. Based on a reformulation of the problem using reproducing kernel Hilbert spaces, we derive abstract approximation conditions on function spaces under which asymptotic equivalence between these models can be established and show that the conditions are satisfied for certain Sobolev balls exceeding some minimal smoothness. Furthermore, we construct a sequence space representation and provide necessary conditions for asymptotic equivalence to hold.

Submitted 1 December, 2014; v1 submitted 2 December, 2013; originally announced December 2013.

Comments: Published at http://dx.doi.org/10.1214/14-AOS1262 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1262

Journal ref: Annals of Statistics 2014, Vol. 42, No. 6, 2557-2585
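For reference, fractional Gaussian noise with Hurst index $H$ and unit variance has autocovariance $\gamma(k) = \tfrac12\bigl(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\bigr)$. The short sketch below evaluates it and checks that partial sums of the noise have variance $n^{2H}$, reflecting the connection to fractional Brownian motion mentioned above. Unit variance and the function names are assumptions for illustration.

```python
def fgn_autocovariance(k, hurst):
    """Autocovariance at lag k of unit-variance fractional Gaussian noise."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def var_of_partial_sum(n, hurst):
    """Var(eps_1 + ... + eps_n), computed from the autocovariances.

    For fractional Gaussian noise this equals n^(2H), the variance of
    fractional Brownian motion at time n.
    """
    return sum(fgn_autocovariance(i - j, hurst) for i in range(n) for j in range(n))


print(fgn_autocovariance(1, 0.75), var_of_partial_sum(10, 0.7), 10 ** 1.4)
```

For $H = 1/2$ the autocovariance vanishes at all nonzero lags (white noise), while for $H > 1/2$ neighbouring errors are positively correlated, which is the long-memory regime.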

arXiv:1309.6178 [pdf, ps, other]

Spot volatility estimation for high-frequency data: adaptive estimation in practice

Authors: Till Sabel, Johannes Schmidt-Hieber, Axel Munk

Abstract: We further develop the spot volatility estimator introduced in Hoffmann, Munk and Schmidt-Hieber (2012) from a practical point of view and make it useful for the analysis of high-frequency financial data. In a first part, we adjust the estimator substantially in order to achieve good finite sample performance and to overcome difficulties arising from violations of the additive microstructure noise model (e.g. jumps, rounding errors). These modifications are justified by simulations. The second part is devoted to investigating the behavior of volatility in response to macroeconomic events. We give evidence that the spot volatility of Euro-BUND futures is considerably higher during press conferences of the European Central Bank. As an outlook, we present an estimator for the spot covolatility of two different prices.

Submitted 24 September, 2013; originally announced September 2013.

MSC Class: 91B84; 62G08; 65T60; 62M99

arXiv:1305.5270 [pdf, ps, other]

doi 10.1214/15-AOS1341

On adaptive posterior concentration rates

Authors: Marc Hoffmann, Judith Rousseau, Johannes Schmidt-Hieber

Abstract: We investigate the problem of deriving posterior concentration rates under different loss functions in nonparametric Bayes. We first provide a lower bound on posterior coverages of shrinking neighbourhoods that relates the metric or loss under which the shrinking neighbourhood is considered, and an intrinsic pre-metric linked to frequentist separation rates. In the Gaussian white noise model, we construct feasible priors based on a spike and slab procedure reminiscent of wavelet thresholding that achieve adaptive rates of contraction under $L^2$ or $L^{\infty}$ metrics when the underlying parameter belongs to a collection of Hölder balls and that moreover achieve our lower bound. We analyse the consequences in terms of asymptotic behaviour of posterior credible balls as well as frequentist minimax adaptive estimation. Our results are complemented by an upper bound for the contraction rate under an arbitrary loss in a generic regular experiment. The upper bound is attained for certain sieve priors and enables us to extend our results to density estimation.

Submitted 5 November, 2015; v1 submitted 22 May, 2013; originally announced May 2013.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1341 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1341

Journal ref: Annals of Statistics 2015, Vol. 43, No. 5, 2259-2295

arXiv:1303.3118 [pdf, ps, other]

On an estimator achieving the adaptive rate in nonparametric regression under $L^p$-loss for all $1\leq p \leq \infty$

Authors: Johannes Schmidt-Hieber

Abstract: Consider nonparametric function estimation under $L^p$-loss. The minimax rate for estimation of the regression function over a Hölder ball with smoothness index $β$ is $n^{-β/(2β+1)}$ if $1\leq p<\infty$ and $(n/\log n)^{-β/(2β+1)}$ if $p=\infty.$ There are many known procedures that either attain this rate for $p=\infty$ but are suboptimal by a $\log n$ factor in the case $p<\infty$ or the other way around. In this article, we construct an estimator that simultaneously achieves the optimal rates under $L^p$-risk for all $1\leq p\leq \infty$ without prior knowledge of $β.$ In contrast to classical wavelet thresholding methods that kill small empirical wavelet coefficients and keep large ones, it is essential for simultaneous adaptation that on each resolution level, the largest empirical wavelet coefficients are truncated. This leads to a completely different point of view on wavelet thresholding. The crucial part in the construction of the estimator is the size of the truncation level which is linked to the unknown smoothness index. Although estimation of the smoothness index is known to be a difficult task, there is a data-driven choice of the truncation level that is sufficiently precise for our purpose.

Submitted 7 February, 2015; v1 submitted 13 March, 2013; originally announced March 2013.

Comments: 21 pages
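The two rates quoted above, and the contrast between killing small coefficients and truncating large ones, can be written down directly. The sketch below is illustrative only: it reads "truncated" as capping a coefficient's absolute value at a given level (our interpretation, not a statement of the paper's exact rule), and it takes the level as an input because its data-driven choice is not spelled out in the abstract.

```python
import math


def minimax_rate(n, beta, p):
    """Minimax rate over a Hölder ball with smoothness beta under L^p-loss."""
    exponent = beta / (2.0 * beta + 1.0)
    if math.isinf(p):
        return (n / math.log(n)) ** (-exponent)
    return n ** (-exponent)


def hard_threshold(coeffs, t):
    # Classical rule: kill small coefficients, keep large ones unchanged.
    return [c if abs(c) > t else 0.0 for c in coeffs]


def truncate(coeffs, level):
    # Truncation (interpreted as capping): large coefficients are shrunk
    # to +/- level, small ones are left unchanged.
    return [math.copysign(min(abs(c), level), c) for c in coeffs]


coeffs = [0.1, -2.0, 0.5]
print(minimax_rate(1000, 1.0, 2), minimax_rate(1000, 1.0, math.inf))
print(hard_threshold(coeffs, 1.0), truncate(coeffs, 1.0))
```

The two operations act on opposite ends of the coefficient range, which is the "completely different point of view" the abstract describes.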

arXiv:1208.5501 [pdf, ps, other]

doi 10.3150/12-BEJ505

Asymptotically efficient estimation of a scale parameter in Gaussian time series and closed-form expressions for the Fisher information

Authors: Till Sabel, Johannes Schmidt-Hieber

Abstract: Mimicking the maximum likelihood estimator, we construct first order Cramér-Rao efficient and explicitly computable estimators for the scale parameter $σ^2$ in the model $Z_{i,n}=σn^{-β}X_i+Y_i$, $i=1,\ldots,n$, $β>0$, with independent, stationary Gaussian processes $(X_i)_{i\in\mathbb{N}}$ and $(Y_i)_{i\in\mathbb{N}}$, where $(X_i)_{i\in\mathbb{N}}$ possibly exhibits long-range dependence. In a second part, closed-form expressions for the asymptotic behavior of the corresponding Fisher information are derived. Our main finding is that depending on the behavior of the spectral densities at zero, the Fisher information has asymptotically two different scaling regimes, which are separated by a sharp phase transition. The most prominent example included in our analysis is the Fisher information for the scaling factor of a high-frequency sample of fractional Brownian motion under additive noise.

Submitted 13 March, 2014; v1 submitted 27 August, 2012; originally announced August 2012.

Comments: Published at http://dx.doi.org/10.3150/12-BEJ505 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

Report number: IMS-BEJ-BEJ505

Journal ref: Bernoulli 2014, Vol. 20, No. 2, 747-774

arXiv:1107.1404 [pdf, other]

Multiscale Methods for Shape Constraints in Deconvolution: Confidence Statements for Qualitative Features

Authors: Johannes Schmidt-Hieber, Axel Munk, Lutz Duembgen

Abstract: We derive multiscale statistics for deconvolution in order to detect qualitative features of the unknown density. An important example covered within this framework is to test for local monotonicity on all scales simultaneously. We investigate the moderately ill-posed setting, where the Fourier transform of the error density in the deconvolution model is of polynomial decay. For multiscale testing, we consider a calibration, motivated by the modulus of continuity of Brownian motion. We investigate the performance of our results from both the theoretical and simulation based point of view. A major consequence of our work is that the detection of qualitative features of a density in a deconvolution problem is a doable task although the minimax rates for pointwise estimation are very slow.

Submitted 17 December, 2012; v1 submitted 7 July, 2011; originally announced July 2011.

Comments: 55 pages, 5 figures, This is a revised version of a previous paper with the title: "Multiscale Methods for Shape Constraints in Deconvolution"

MSC Class: 62G10 (Primary) 62G15; 62G20 (Secondary)

arXiv:1007.4622 [pdf, ps, other]

Adaptive wavelet estimation of the diffusion coefficient under additive error measurements

Authors: Marc Hoffmann, Axel Munk, Johannes Schmidt-Hieber

Abstract: We study nonparametric estimation of the diffusion coefficient from discrete data, when the observations are blurred by additional noise. Such problems have been studied over the last 10 years in several application fields and in particular in high frequency financial data modelling, however mainly from a parametric and semiparametric point of view. This paper addresses the nonparametric estimation of the path of the (possibly stochastic) diffusion coefficient in a relatively general setting. By developing pre-averaging techniques combined with wavelet thresholding, we construct adaptive estimators that achieve a nearly optimal rate within a large scale of smoothness constraints of Besov type. Since the diffusion coefficient is usually genuinely random, we propose a new criterion to assess the quality of estimation; we retrieve the usual minimax theory when this approach is restricted to a deterministic diffusion coefficient. In particular, we take advantage of recent results of Reiss [33] on asymptotic equivalence between a Gaussian diffusion with additive noise and a Gaussian white noise model, in order to prove a sharp lower bound.

Submitted 29 December, 2011; v1 submitted 27 July, 2010; originally announced July 2010.

Comments: 46 pages. This is the second version. A first draft of the paper appeared as a working paper in 2010 under the title "Nonparametric estimation of the volatility under microstructure noise: wavelet adaptation"

MSC Class: 62G99; 62M99; 60G99

arXiv:1002.3045 [pdf, ps, other]

Lower bounds for volatility estimation in microstructure noise models

Authors: Axel Munk, Johannes Schmidt-Hieber

Abstract: In this paper we derive lower bounds in the minimax sense for estimation of the instantaneous volatility when the diffusion type part cannot be observed directly but only under additional Gaussian noise. Three different models are considered. Our technique is based on a general inequality for the Kullback-Leibler divergence of multivariate normal random variables and spectral analysis of the processes. The derived lower bounds are indeed optimal. Upper bounds can be found in Munk and Schmidt-Hieber [18]. Our major finding is that the Gaussian microstructure noise introduces an additional degree of ill-posedness in each of the three models.

Submitted 16 February, 2010; originally announced February 2010.

Comments: 16 pages
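The lower-bound technique above works with the Kullback-Leibler divergence between multivariate normal distributions. For reference, $\mathrm{KL}\bigl(N(\mu_0,\Sigma_0)\,\|\,N(\mu_1,\Sigma_1)\bigr) = \tfrac12\bigl[\operatorname{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1-\mu_0)^\top\Sigma_1^{-1}(\mu_1-\mu_0) - d + \log(\det\Sigma_1/\det\Sigma_0)\bigr]$. The dependency-free sketch below evaluates this closed form via Cholesky factorizations. It is a generic illustration, not the specific inequality derived in the paper, and the function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T for a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_spd(low, b):
    """Solve (L L^T) x = b by forward and then back substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for d-dimensional normals."""
    d = len(mu0)
    l0, l1 = cholesky(cov0), cholesky(cov1)
    # tr(cov1^{-1} cov0): solve cov1 x = (j-th column of cov0), keep entry j.
    trace = sum(solve_spd(l1, [cov0[i][j] for i in range(d)])[j] for j in range(d))
    diff = [m1 - m0 for m0, m1 in zip(mu0, mu1)]
    quad = sum(di * si for di, si in zip(diff, solve_spd(l1, diff)))
    # log det = 2 * sum of log diagonal entries of the Cholesky factor.
    logdet0 = 2.0 * sum(math.log(l0[i][i]) for i in range(d))
    logdet1 = 2.0 * sum(math.log(l1[i][i]) for i in range(d))
    return 0.5 * (trace + quad - d + logdet1 - logdet0)


print(kl_gaussian([0.0], [[1.0]], [0.0], [[2.0]]))
```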

arXiv:0908.3163 [pdf, other]

Nonparametric estimation of the volatility function in a high-frequency model corrupted by noise

Authors: Axel Munk, Johannes Schmidt-Hieber

Abstract: We consider the models $Y_{i,n}=\int_0^{i/n} σ(s)\,dW_s+τ(i/n)ε_{i,n}$ and $\tilde Y_{i,n}=σ(i/n)W_{i/n}+τ(i/n)ε_{i,n}$, $i=1,\ldots,n$, where $W_t$ denotes a standard Brownian motion and $ε_{i,n}$ are centered i.i.d. random variables with $E(ε_{i,n}^2)=1$ and finite fourth moment. Furthermore, $σ$ and $τ$ are unknown deterministic functions, and $W_t$ and $(ε_{1,n},\ldots,ε_{n,n})$ are assumed to be independent processes. Based on a spectral decomposition of the covariance structures, we derive series estimators for $σ^2$ and $τ^2$ and investigate the rate of convergence of their MISE in dependence on their smoothness. To this end, specific basis functions and their corresponding Sobolev ellipsoids are introduced, and we show that our estimators are optimal in the minimax sense. Our work is motivated by microstructure noise models. Our major finding is that the microstructure noise $ε_{i,n}$ introduces an additional degree of ill-posedness of $1/2$, irrespective of the tail behavior of $ε_{i,n}$. The method is illustrated by a small numerical study.

Submitted 6 April, 2010; v1 submitted 21 August, 2009; originally announced August 2009.

Comments: 5 figures, corrected references, minor changes
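A quick simulation makes the role of the noise visible. The sketch below generates data from the first model above; the Euler approximation of the stochastic integral, the Gaussian errors and the function name are assumptions made for illustration. Each first difference $Y_{i,n}-Y_{i-1,n}$ receives a signal contribution of variance about $σ^2/n$, while the noise contributes about $2τ^2$, so naive differencing is dominated by the noise.

```python
import math
import random


def simulate_noisy_diffusion(sigma, tau, n, rng):
    """Observations Y_{i,n} = int_0^{i/n} sigma(s) dW_s + tau(i/n) eps_{i,n}, i = 1..n.

    Assumptions: the stochastic integral is approximated by an Euler scheme
    on the grid i/n, and eps_{i,n} is standard Gaussian (the model itself
    only needs centred, unit-variance errors with finite fourth moment).
    """
    x = 0.0
    ys = []
    for i in range(1, n + 1):
        x += sigma((i - 1) / n) * rng.gauss(0.0, math.sqrt(1.0 / n))
        ys.append(x + tau(i / n) * rng.gauss(0.0, 1.0))
    return ys


rng = random.Random(1)
ys = simulate_noisy_diffusion(lambda s: 1.0, lambda s: 0.5, n=10_000, rng=rng)
diffs = [b - a for a, b in zip(ys, ys[1:])]
print(sum(d * d for d in diffs) / len(diffs))
```

With $σ \equiv 1$ and $τ \equiv 0.5$ the printed mean squared increment should be close to $2τ^2 = 0.5$, while the signal accounts for only about $1/n = 10^{-4}$ of it.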

Showing 1–39 of 39 results for author: Schmidt-Hieber, J