Showing 1–31 of 31 results for author: Steinwart, I

  1. arXiv:2305.14077  [pdf, other]

    stat.ML cs.LG math.ST

    Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension

    Authors: Moritz Haas, David Holzmüller, Ulrike von Luxburg, Ingo Steinwart

    Abstract: The success of over-parameterized neural networks trained to near-zero training error has caused great interest in the phenomenon of benign overfitting, where estimators are statistically consistent even though they interpolate noisy training data. While benign overfitting in fixed dimension has been established for some learning methods, current literature suggests that for regression with typica…

    Submitted 26 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: We provide Python code to reproduce all of our experimental results at https://github.com/moritzhaas/mind-the-spikes

  2. arXiv:2212.12474  [pdf, other]

    cs.LG math.NA stat.ML

    Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers

    Authors: Marvin Pförtner, Ingo Steinwart, Philipp Hennig, Jonathan Wenger

    Abstract: Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for i…

    Submitted 28 April, 2024; v1 submitted 23 December, 2022; originally announced December 2022.

  3. arXiv:2206.11517  [pdf, other]

    cs.LG cs.AI stat.ML

    Utilizing Expert Features for Contrastive Learning of Time-Series Representations

    Authors: Manuel Nonnenmacher, Lukas Oldenburg, Ingo Steinwart, David Reeb

    Abstract: We present an approach that incorporates expert knowledge for time-series representation learning. Our method employs expert features to replace the commonly used data transformations in previous contrastive learning approaches. We do this since time-series data frequently stems from the industrial or medical field where expert features are often available from domain experts, while transformation…

    Submitted 23 June, 2022; originally announced June 2022.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning (ICML), PMLR 162:16969-16989, 2022

  4. arXiv:2203.09410  [pdf, other]

    stat.ML cs.LG cs.NE

    A Framework and Benchmark for Deep Batch Active Learning for Regression

    Authors: David Holzmüller, Viktor Zaverkin, Johannes Kästner, Ingo Steinwart

    Abstract: The acquisition of labels for supervised learning can be expensive. To improve the sample efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations, and selection methods. Our framework encompasses many e…

    Submitted 1 August, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: Published at the Journal of Machine Learning Research (JMLR). Changes in v4: Improvements in writing and other minor changes. Accompanying code can be found at https://github.com/dholzmueller/bmdal_reg

    Journal ref: Journal of Machine Learning Research, 24(164):1-81, 2023

  5. arXiv:2110.11395  [pdf, other]

    cs.LG cs.CV stat.ML

    SOSP: Efficiently Capturing Global Correlations by Second-Order Structured Pruning

    Authors: Manuel Nonnenmacher, Thomas Pfeil, Ingo Steinwart, David Reeb

    Abstract: Pruning neural networks reduces inference time and memory costs. On standard hardware, these benefits will be especially prominent if coarse-grained structures, like feature maps, are pruned. We devise two novel saliency-based methods for second-order structured pruning (SOSP) which include correlations among all structures and layers. Our main method SOSP-H employs an innovative second-order appr…

    Submitted 30 June, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Journal ref: International Conference on Learning Representations (ICLR) 2022

  6. arXiv:2109.09569  [pdf, other]

    physics.comp-ph stat.ML

    Fast and Sample-Efficient Interatomic Neural Network Potentials for Molecules and Materials Based on Gaussian Moments

    Authors: Viktor Zaverkin, David Holzmüller, Ingo Steinwart, Johannes Kästner

    Abstract: Artificial neural networks (NNs) are one of the most frequently used machine learning approaches to construct interatomic potentials and enable efficient large-scale atomistic simulations with almost ab initio accuracy. However, the simultaneous training of NNs on energies and forces, which are a prerequisite for, e.g., molecular dynamics simulations, can be demanding. In this work, we present an…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: Manuscript accepted for publication in J. Chem. Theory Comput.; Code published at https://gitlab.com/zaverkin_v/gmnn

  7. Which Minimizer Does My Neural Network Converge To?

    Authors: Manuel Nonnenmacher, David Reeb, Ingo Steinwart

    Abstract: The loss surface of an overparameterized neural network (NN) possesses many global minima of zero training error. We explain how common variants of the standard NN training procedure change the minimizer obtained. First, we make explicit how the size of the initialization of a strongly overparameterized NN affects the minimizer and can deteriorate its final test performance. We propose a strategy…

    Submitted 30 June, 2022; v1 submitted 4 November, 2020; originally announced November 2020.

    Journal ref: ECML PKDD 2021. Machine Learning and Knowledge Discovery in Databases. Research Track

  8. arXiv:2002.04861  [pdf, other]

    stat.ML cs.LG

    Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent

    Authors: David Holzmüller, Ingo Steinwart

    Abstract: We prove that two-layer (Leaky)ReLU networks initialized by e.g. the widely used method proposed by He et al. (2015) and trained using gradient descent on a least-squares loss are not universally consistent. Specifically, we describe a large class of one-dimensional data-generating distributions for which, with high probability, gradient descent only finds a bad local minimum of the optimization l…

    Submitted 8 June, 2022; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: To appear in Journal of Machine Learning Research (JMLR). Changes in v3: Added new Section 10 with extensive experimental evaluation. Code available at https://github.com/dholzmueller/nn_inconsistency

  9. arXiv:1905.11028  [pdf, other]

    stat.ML cs.LG

    Best-scored Random Forest Classification

    Authors: Hanyuan Hang, Xiaoyu Liu, Ingo Steinwart

    Abstract: We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates as each single tree in the forest. In this way, the resulting forest can be more accurate than the original purely random forest. From the theoretical perspectiv…

    Submitted 27 May, 2019; originally announced May 2019.

  10. arXiv:1905.10686  [pdf, other]

    stat.ML cs.LG

    Empirical Risk Minimization in the Interpolating Regime with Application to Neural Network Learning

    Authors: Nicole Mücke, Ingo Steinwart

    Abstract: A common strategy to train deep neural networks (DNNs) is to use very large architectures and to train them until they (almost) achieve zero training error. Empirically observed good generalization performance on test data, even in the presence of lots of label noise, corroborates such a procedure. On the other hand, in statistical learning theory it is known that over-fitting models may lead to po…

    Submitted 23 July, 2021; v1 submitted 25 May, 2019; originally announced May 2019.

  11. arXiv:1903.11482  [pdf, other]

    cs.LG stat.ML

    A Sober Look at Neural Network Initializations

    Authors: Ingo Steinwart

    Abstract: Initializing the weights and the biases is a key part of the training process of a neural network. Unlike the subsequent optimization phase, however, the initialization phase has gained only limited attention in the literature. In this paper we discuss some consequences of commonly used initialization strategies for vanilla DNNs with ReLU activations. Based on these insights we then develop an alt…

    Submitted 4 September, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

  12. arXiv:1810.02321  [pdf, ps, other]

    stat.ML cs.LG

    Optimal Learning with Anisotropic Gaussian SVMs

    Authors: Hanyuan Hang, Ingo Steinwart

    Abstract: This paper investigates the nonparametric regression problem using SVMs with anisotropic Gaussian RBF kernels. Under the assumption that the target functions reside in certain anisotropic Besov spaces, we establish the almost optimal learning rates, more precisely, optimal up to some logarithmic factor, presented by the effective smoothness. By taking the effective smoothness into considerati…

    Submitted 4 October, 2018; originally announced October 2018.

  13. arXiv:1712.05279  [pdf, ps, other]

    math.FA math.ST stat.ML

    Strictly proper kernel scores and characteristic kernels on compact spaces

    Authors: Ingo Steinwart, Johanna F. Ziegel

    Abstract: Strictly proper kernel scores are a well-known tool in probabilistic forecasting, while characteristic kernels have been extensively investigated in the machine learning literature. We first show that both notions coincide, so that insights from one part of the literature can be used in the other. We then show that the metric induced by a characteristic kernel cannot reliably distinguish between dis…

    Submitted 14 December, 2017; originally announced December 2017.

  14. arXiv:1708.05254  [pdf, other]

    stat.ML stat.ME

    Adaptive Clustering Using Kernel Density Estimators

    Authors: Ingo Steinwart, Bharath K. Sriperumbudur, Philipp Thomann

    Abstract: We derive and analyze a generic, recursive algorithm for estimating all splits in a finite cluster tree as well as the corresponding clusters. We further investigate statistical properties of this generic clustering algorithm when it receives level set estimates from a kernel density estimator. In particular, we derive finite sample guarantees, consistency, rates of convergence, and an adaptive da…

    Submitted 1 November, 2021; v1 submitted 17 August, 2017; originally announced August 2017.

  15. arXiv:1702.07552  [pdf, ps, other]

    stat.ML cs.LG

    Learning Rates for Kernel-Based Expectile Regression

    Authors: Muhammad Farooq, Ingo Steinwart

    Abstract: Conditional expectiles are becoming an increasingly important tool in finance as well as in other areas of applications. We analyse a support vector machine type approach for estimating conditional expectiles and establish learning rates that are minimax optimal modulo a logarithmic factor if Gaussian RBF kernels are used and the desired expectile is smooth in a Besov sense. As a special case, our…

    Submitted 27 February, 2017; v1 submitted 24 February, 2017; originally announced February 2017.

  16. arXiv:1702.07254  [pdf, ps, other]

    stat.ML

    Sobolev Norm Learning Rates for Regularized Least-Squares Algorithm

    Authors: Simon Fischer, Ingo Steinwart

    Abstract: Learning rates for least-squares regression are typically expressed in terms of $L_2$-norms. In this paper we extend these rates to norms stronger than the $L_2$-norm without requiring the regression function to be contained in the hypothesis space. In the special case of Sobolev reproducing kernel Hilbert spaces used as hypothesis spaces, these stronger norms coincide with fractional Sobolev norm…

    Submitted 8 October, 2020; v1 submitted 23 February, 2017; originally announced February 2017.

    Comments: accepted manuscript in J. Mach. Learn. Res

    Journal ref: J. Mach. Learn. Res. 21 (2020) 1-38

  17. arXiv:1702.06899  [pdf, ps, other]

    stat.ML cs.LG

    liquidSVM: A Fast and Versatile SVM package

    Authors: Ingo Steinwart, Philipp Thomann

    Abstract: liquidSVM is a package written in C++ that provides SVM-type solvers for various classification and regression tasks. Because of a fully integrated hyper-parameter selection, very carefully implemented solvers, multi-threading and GPU support, and several built-in data decomposition strategies it provides unprecedented speed for small training sizes as well as for data sets of tens of millions of…

    Submitted 22 February, 2017; originally announced February 2017.

  18. arXiv:1612.00824  [pdf, other]

    stat.ML cs.LG

    Learning with Hierarchical Gaussian Kernels

    Authors: Ingo Steinwart, Philipp Thomann, Nico Schmid

    Abstract: We investigate iterated compositions of weighted sums of Gaussian kernels and provide an interpretation of the construction that shows some similarities with the architectures of deep neural networks. On the theoretical side, we show that these kernels are universal and that SVMs using these kernels are universally consistent. We further describe a parameter optimization method for the kernel para…

    Submitted 2 December, 2016; originally announced December 2016.

  19. arXiv:1612.00374  [pdf, other]

    stat.ML cs.LG

    Spatial Decompositions for Large Scale SVMs

    Authors: Philipp Thomann, Ingrid Blaschzyk, Mona Meister, Ingo Steinwart

    Abstract: Although support vector machines (SVMs) are theoretically well understood, their underlying optimization problem becomes very expensive if, for example, hundreds of thousands of samples and a non-linear kernel are considered. Several approaches have been proposed in the past to address this serious limitation. In this work we investigate a decomposition strategy that learns on small, spatially de…

    Submitted 8 February, 2018; v1 submitted 1 December, 2016; originally announced December 2016.

    Journal ref: Proceedings of Machine Learning Research Volume 54: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics 2017 (A. Singh and J. Zhu, eds.), pp. 1329-1337, 2017

  20. arXiv:1607.03792  [pdf, ps, other]

    stat.ML

    Kernel Density Estimation for Dynamical Systems

    Authors: Hanyuan Hang, Ingo Steinwart, Yunlong Feng, Johan A. K. Suykens

    Abstract: We study the density estimation problem with observations generated by certain dynamical systems that admit a unique underlying invariant Lebesgue density. Observations drawn from dynamical systems are not independent and moreover, usual mixing concepts may not be appropriate for measuring the dependence among these observations. By employing the $\mathcal{C}$-mixing concept to measure the depende…

    Submitted 13 July, 2016; originally announced July 2016.

  21. arXiv:1605.02887  [pdf, ps, other]

    stat.ML cs.LG

    Learning theory estimates with observations from general stationary stochastic processes

    Authors: Hanyuan Hang, Yunlong Feng, Ingo Steinwart, Johan A. K. Suykens

    Abstract: This paper investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here by \emph{general}, we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified treatment on analyzing the learning schemes with various mixing processes…

    Submitted 10 May, 2016; originally announced May 2016.

    Comments: arXiv admin note: text overlap with arXiv:1501.03059

  22. arXiv:1508.05249  [pdf, ps, other]

    math.OC math.FA math.ST stat.ML

    Representation of Quasi-Monotone Functionals by Families of Separating Hyperplanes

    Authors: Ingo Steinwart

    Abstract: We characterize when the level sets of a continuous quasi-monotone functional defined on a suitable convex subset of a normed space can be uniquely represented by a family of bounded continuous functionals. Furthermore, we investigate how regularly these functionals depend on the parameterizing level. Finally, we show how this question relates to the recent problem of property elicitation that sim…

    Submitted 21 August, 2015; originally announced August 2015.

    Comments: 23 pages

  23. arXiv:1508.03712  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    Towards an Axiomatic Approach to Hierarchical Clustering of Measures

    Authors: Philipp Thomann, Ingo Steinwart, Nico Schmid

    Abstract: We propose some axioms for hierarchical clustering of probability measures and investigate their ramifications. The basic idea is to let the user stipulate the clusters for some elementary measures. This is done without the need of any notion of metric, similarity or dissimilarity. Our main results then show that for each suitable choice of user-defined clustering on elementary measures we obtain…

    Submitted 15 August, 2015; originally announced August 2015.

    MSC Class: Primary 62H30; Secondary 91C20; 62G07

    Journal ref: Journal of Machine Learning Research. 16(Sep):1949-2002, 2015

  24. arXiv:1507.06615  [pdf, ps, other]

    stat.ML

    Optimal Learning Rates for Localized SVMs

    Authors: Mona Eberts, Ingo Steinwart

    Abstract: One of the limiting factors of using support vector machines (SVMs) in large scale applications is their super-linear computational requirements in terms of the number of training samples. To address this issue, several approaches that train SVMs on many small chunks of large data sets separately have been proposed in the literature. So far, however, almost all these approaches have only been emp…

    Submitted 23 July, 2015; originally announced July 2015.

    Comments: 68 pages, 20 figures, and 11 tables

  25. arXiv:1507.03887  [pdf, other]

    stat.CO stat.ML

    An SVM-like Approach for Expectile Regression

    Authors: Muhammad Farooq, Ingo Steinwart

    Abstract: Expectile regression is a nice tool for investigating conditional distributions beyond the conditional mean. It is well-known that expectiles can be described with the help of the asymmetric least square loss function, and this link makes it possible to estimate expectiles in a non-parametric framework by a support vector machine like approach. In this work we develop an efficient sequential-minim…

    Submitted 14 July, 2015; originally announced July 2015.

  26. arXiv:1409.8437  [pdf, ps, other]

    stat.ME stat.ML

    Fully adaptive density-based clustering

    Authors: Ingo Steinwart

    Abstract: The clusters of a distribution are often defined by the connected components of a density level set. However, this definition depends on the user-specified level. We address this issue by proposing a simple, generic algorithm, which uses an almost arbitrary level set estimator to estimate the smallest level at which there is more than one connected component. In the case where this algorithm is…

    Submitted 28 October, 2015; v1 submitted 30 September, 2014; originally announced September 2014.

    Comments: Published at http://dx.doi.org/10.1214/15-AOS1331 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1331

    Journal ref: Annals of Statistics 2015, Vol. 43, No. 5, 2132-2167

  27. arXiv:1302.6427  [pdf, other]

    stat.ME math.PR

    Hypothesis Testing for Validation and Certification

    Authors: Clint Scovel, Ingo Steinwart

    Abstract: We develop a hypothesis testing framework for the formulation of the problems of 1) the validation of a simulation model and 2) using modeling to certify the performance of a physical system. These results are used to solve the extrapolative validation and certification problems, namely problems where the regime of interest is different than the regime for which we have experimental data. We use c…

    Submitted 26 February, 2013; originally announced February 2013.

    Report number: LA-UR-10-02355

    MSC Class: 60

  28. arXiv:1205.3845  [pdf, other]

    stat.ME stat.CO stat.OT

    Forecasting with Historical Data or Process Knowledge under Misspecification: A Comparison

    Authors: Luke Bornn, Marian Anghel, Ingo Steinwart

    Abstract: When faced with the task of forecasting a dynamic system, practitioners often have available historical data, knowledge of the system, or a combination of both. While intuition dictates that perfect knowledge of the system should in theory yield perfect forecasting, often knowledge of the system is only partially known, known up to parameters, or known incorrectly. In contrast, forecasting using p…

    Submitted 17 May, 2012; originally announced May 2012.

  29. Fast rates for support vector machines using Gaussian kernels

    Authors: Ingo Steinwart, Clint Scovel

    Abstract: For binary classification we establish learning rates up to the order of $n^{-1}$ for support vector machines (SVMs) with hinge loss and Gaussian RBF kernels. These rates are in terms of two assumptions on the considered distributions: Tsybakov's noise assumption to establish a small estimation error, and a new geometric noise condition which is used to bound the approximation error. Unlike prev…

    Submitted 14 August, 2007; originally announced August 2007.

    Comments: Published at http://dx.doi.org/10.1214/009053606000001226 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS0237

    MSC Class: 68Q32 (Primary); 62G20; 62G99; 68T05; 68T10; 41A46; 41A99 (Secondary)

    Journal ref: Annals of Statistics 2007, Vol. 35, No. 2, 575-607

  30. arXiv:0707.0322  [pdf, ps, other]

    stat.ME math.DS math.ST

    Consistency of support vector machines for forecasting the evolution of an unknown ergodic dynamical system from observations with unknown noise

    Authors: Ingo Steinwart, Marian Anghel

    Abstract: We consider the problem of forecasting the next (observable) state of an unknown ergodic dynamical system from a noisy observation of the present state. Our main result shows, for example, that support vector machines (SVMs) using Gaussian RBF kernels can learn the best forecaster from a sequence of noisy observations if (a) the unknown observational noise process is bounded and has a summable…

    Submitted 7 April, 2009; v1 submitted 2 July, 2007; originally announced July 2007.

    Comments: Published at http://dx.doi.org/10.1214/07-AOS562 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS562

    MSC Class: 62M20 (Primary) 37D25; 37C99; 37M10; 60K99; 62M10; 62M45; 68Q32; 68T05 (Secondary)

    Journal ref: Annals of Statistics 2009, Vol. 37, No. 2, 841-875

  31. arXiv:0707.0303  [pdf, ps, other]

    stat.ML stat.ME

    Learning from dependent observations

    Authors: Ingo Steinwart, Don Hush, Clint Scovel

    Abstract: In most papers establishing consistency for learning algorithms it is assumed that the observations used for training are realizations of an i.i.d. process. In this paper we go far beyond this classical framework by showing that support vector machines (SVMs) essentially only require that the data-generating process satisfies a certain law of large numbers. We then consider the learnability of S…

    Submitted 2 July, 2007; originally announced July 2007.

    Comments: submitted to Journal of Multivariate Analysis

    Report number: Los Alamos National Laboratory Technical Report LA-UR-06-3507