Showing 1–20 of 20 results for author: Steinwart, I

  1. arXiv:2407.04491  [pdf, other]

    cs.LG

    Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data

    Authors: David Holzmüller, Léo Grinsztajn, Ingo Steinwart

    Abstract: For classification and regression on tabular data, the dominance of gradient-boosted decision trees (GBDTs) has recently been challenged by often much slower deep learning methods with extensive hyperparameter tuning. We address this discrepancy by introducing (a) RealMLP, an improved multilayer perceptron (MLP), and (b) improved default parameters for GBDTs and RealMLP. We tune RealMLP and the de…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 10 pages + 44 pages appendix. Code is available at github.com/dholzmueller/pytabkit and github.com/LeoGrin/tabular-benchmark/tree/better_by_default

  2. arXiv:2404.03453  [pdf, ps, other]

    math.PR cs.LG math.ST

    Conditioning of Banach Space Valued Gaussian Random Variables: An Approximation Approach Based on Martingales

    Authors: Ingo Steinwart

    Abstract: In this paper we investigate the conditional distributions of two Banach space valued, jointly Gaussian random variables. These conditional distributions are again Gaussian and their means and covariances are determined by a general approximation scheme based upon a martingale idea. We then apply our general results to the case of Gaussian processes with continuous paths conditioned to partial obs…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 50 pages plus 22 pages of supplemental material

  3. arXiv:2305.14077  [pdf, other]

    stat.ML cs.LG math.ST

    Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension

    Authors: Moritz Haas, David Holzmüller, Ulrike von Luxburg, Ingo Steinwart

    Abstract: The success of over-parameterized neural networks trained to near-zero training error has caused great interest in the phenomenon of benign overfitting, where estimators are statistically consistent even though they interpolate noisy training data. While benign overfitting in fixed dimension has been established for some learning methods, current literature suggests that for regression with typica…

    Submitted 26 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: We provide Python code to reproduce all of our experimental results at https://github.com/moritzhaas/mind-the-spikes

  4. arXiv:2212.12474  [pdf, other]

    cs.LG math.NA stat.ML

    Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers

    Authors: Marvin Pförtner, Ingo Steinwart, Philipp Hennig, Jonathan Wenger

    Abstract: Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for i…

    Submitted 28 April, 2024; v1 submitted 23 December, 2022; originally announced December 2022.

  5. arXiv:2206.11517  [pdf, other]

    cs.LG cs.AI stat.ML

    Utilizing Expert Features for Contrastive Learning of Time-Series Representations

    Authors: Manuel Nonnenmacher, Lukas Oldenburg, Ingo Steinwart, David Reeb

    Abstract: We present an approach that incorporates expert knowledge for time-series representation learning. Our method employs expert features to replace the commonly used data transformations in previous contrastive learning approaches. We do this since time-series data frequently stems from the industrial or medical field where expert features are often available from domain experts, while transformation…

    Submitted 23 June, 2022; originally announced June 2022.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning (ICML), PMLR 162:16969-16989, 2022

  6. arXiv:2203.09410  [pdf, other]

    stat.ML cs.LG cs.NE

    A Framework and Benchmark for Deep Batch Active Learning for Regression

    Authors: David Holzmüller, Viktor Zaverkin, Johannes Kästner, Ingo Steinwart

    Abstract: The acquisition of labels for supervised learning can be expensive. To improve the sample efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations, and selection methods. Our framework encompasses many e…

    Submitted 1 August, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: Published at the Journal of Machine Learning Research (JMLR). Changes in v4: Improvements in writing and other minor changes. Accompanying code can be found at https://github.com/dholzmueller/bmdal_reg

    Journal ref: Journal of Machine Learning Research, 24(164):1-81, 2023

  7. arXiv:2110.11395  [pdf, other]

    cs.LG cs.CV stat.ML

    SOSP: Efficiently Capturing Global Correlations by Second-Order Structured Pruning

    Authors: Manuel Nonnenmacher, Thomas Pfeil, Ingo Steinwart, David Reeb

    Abstract: Pruning neural networks reduces inference time and memory costs. On standard hardware, these benefits will be especially prominent if coarse-grained structures, like feature maps, are pruned. We devise two novel saliency-based methods for second-order structured pruning (SOSP) which include correlations among all structures and layers. Our main method SOSP-H employs an innovative second-order appr…

    Submitted 30 June, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Journal ref: International Conference on Learning Representations (ICLR) 2022

  8. Which Minimizer Does My Neural Network Converge To?

    Authors: Manuel Nonnenmacher, David Reeb, Ingo Steinwart

    Abstract: The loss surface of an overparameterized neural network (NN) possesses many global minima of zero training error. We explain how common variants of the standard NN training procedure change the minimizer obtained. First, we make explicit how the size of the initialization of a strongly overparameterized NN affects the minimizer and can deteriorate its final test performance. We propose a strategy…

    Submitted 30 June, 2022; v1 submitted 4 November, 2020; originally announced November 2020.

    Journal ref: ECML PKDD 2021. Machine Learning and Knowledge Discovery in Databases. Research Track

  9. arXiv:2002.04861  [pdf, other]

    stat.ML cs.LG

    Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent

    Authors: David Holzmüller, Ingo Steinwart

    Abstract: We prove that two-layer (Leaky)ReLU networks initialized by e.g. the widely used method proposed by He et al. (2015) and trained using gradient descent on a least-squares loss are not universally consistent. Specifically, we describe a large class of one-dimensional data-generating distributions for which, with high probability, gradient descent only finds a bad local minimum of the optimization l…

    Submitted 8 June, 2022; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: To appear in Journal of Machine Learning Research (JMLR). Changes in v3: Added new Section 10 with extensive experimental evaluation. Code available at https://github.com/dholzmueller/nn_inconsistency

  10. arXiv:2002.03171  [pdf, ps, other]

    math.FA cs.LG

    Reproducing Kernel Hilbert Spaces Cannot Contain all Continuous Functions on a Compact Metric Space

    Authors: Ingo Steinwart

    Abstract: Given an uncountable, compact metric space, we show that there exists no reproducing kernel Hilbert space that contains the space of all continuous functions on this compact space.

    Submitted 13 March, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: 2 pages

  11. arXiv:1905.11028  [pdf, other]

    stat.ML cs.LG

    Best-scored Random Forest Classification

    Authors: Hanyuan Hang, Xiaoyu Liu, Ingo Steinwart

    Abstract: We propose an algorithm named best-scored random forest for binary classification problems. The terminology "best-scored" means to select the one with the best empirical performance out of a certain number of purely random tree candidates as each single tree in the forest. In this way, the resulting forest can be more accurate than the original purely random forest. From the theoretical perspectiv…

    Submitted 27 May, 2019; originally announced May 2019.

  12. arXiv:1905.10686  [pdf, other]

    stat.ML cs.LG

    Empirical Risk Minimization in the Interpolating Regime with Application to Neural Network Learning

    Authors: Nicole Mücke, Ingo Steinwart

    Abstract: A common strategy to train deep neural networks (DNNs) is to use very large architectures and to train them until they (almost) achieve zero training error. Empirically observed good generalization performance on test data, even in the presence of lots of label noise, corroborate such a procedure. On the other hand, in statistical learning theory it is known that over-fitting models may lead to po…

    Submitted 23 July, 2021; v1 submitted 25 May, 2019; originally announced May 2019.

  13. arXiv:1903.11482  [pdf, other]

    cs.LG stat.ML

    A Sober Look at Neural Network Initializations

    Authors: Ingo Steinwart

    Abstract: Initializing the weights and the biases is a key part of the training process of a neural network. Unlike the subsequent optimization phase, however, the initialization phase has gained only limited attention in the literature. In this paper we discuss some consequences of commonly used initialization strategies for vanilla DNNs with ReLU activations. Based on these insights we then develop an alt…

    Submitted 4 September, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

  14. arXiv:1810.02321  [pdf, ps, other]

    stat.ML cs.LG

    Optimal Learning with Anisotropic Gaussian SVMs

    Authors: Hanyuan Hang, Ingo Steinwart

    Abstract: This paper investigates the nonparametric regression problem using SVMs with anisotropic Gaussian RBF kernels. Under the assumption that the target functions are resided in certain anisotropic Besov spaces, we establish the almost optimal learning rates, more precisely, optimal up to some logarithmic factor, presented by the effective smoothness. By taking the effective smoothness into considerati…

    Submitted 4 October, 2018; originally announced October 2018.

  15. arXiv:1702.07552  [pdf, ps, other]

    stat.ML cs.LG

    Learning Rates for Kernel-Based Expectile Regression

    Authors: Muhammad Farooq, Ingo Steinwart

    Abstract: Conditional expectiles are becoming an increasingly important tool in finance as well as in other areas of applications. We analyse a support vector machine type approach for estimating conditional expectiles and establish learning rates that are minimax optimal modulo a logarithmic factor if Gaussian RBF kernels are used and the desired expectile is smooth in a Besov sense. As a special case, our…

    Submitted 27 February, 2017; v1 submitted 24 February, 2017; originally announced February 2017.

  16. arXiv:1702.06899  [pdf, ps, other]

    stat.ML cs.LG

    liquidSVM: A Fast and Versatile SVM package

    Authors: Ingo Steinwart, Philipp Thomann

    Abstract: liquidSVM is a package written in C++ that provides SVM-type solvers for various classification and regression tasks. Because of a fully integrated hyper-parameter selection, very carefully implemented solvers, multi-threading and GPU support, and several built-in data decomposition strategies it provides unprecedented speed for small training sizes as well as for data sets of tens of millions of…

    Submitted 22 February, 2017; originally announced February 2017.

  17. arXiv:1612.00824  [pdf, other]

    stat.ML cs.LG

    Learning with Hierarchical Gaussian Kernels

    Authors: Ingo Steinwart, Philipp Thomann, Nico Schmid

    Abstract: We investigate iterated compositions of weighted sums of Gaussian kernels and provide an interpretation of the construction that shows some similarities with the architectures of deep neural networks. On the theoretical side, we show that these kernels are universal and that SVMs using these kernels are universally consistent. We further describe a parameter optimization method for the kernel para…

    Submitted 2 December, 2016; originally announced December 2016.

  18. arXiv:1612.00374  [pdf, other]

    stat.ML cs.LG

    Spatial Decompositions for Large Scale SVMs

    Authors: Philipp Thomann, Ingrid Blaschzyk, Mona Meister, Ingo Steinwart

    Abstract: Although support vector machines (SVMs) are theoretically well understood, their underlying optimization problem becomes very expensive, if, for example, hundreds of thousands of samples and a non-linear kernel are considered. Several approaches have been proposed in the past to address this serious limitation. In this work we investigate a decomposition strategy that learns on small, spatially de…

    Submitted 8 February, 2018; v1 submitted 1 December, 2016; originally announced December 2016.

    Journal ref: Proceedings of Machine Learning Research Volume 54: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics 2017 (A. Singh and J. Zhu, eds.), pp. 1329-1337, 2017

  19. arXiv:1605.02887  [pdf, ps, other]

    stat.ML cs.LG

    Learning theory estimates with observations from general stationary stochastic processes

    Authors: Hanyuan Hang, Yunlong Feng, Ingo Steinwart, Johan A. K. Suykens

    Abstract: This paper investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here by "general", we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified treatment on analyzing the learning schemes with various mixing processes…

    Submitted 10 May, 2016; originally announced May 2016.

    Comments: arXiv admin note: text overlap with arXiv:1501.03059

  20. arXiv:1508.03712  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    Towards an Axiomatic Approach to Hierarchical Clustering of Measures

    Authors: Philipp Thomann, Ingo Steinwart, Nico Schmid

    Abstract: We propose some axioms for hierarchical clustering of probability measures and investigate their ramifications. The basic idea is to let the user stipulate the clusters for some elementary measures. This is done without the need of any notion of metric, similarity or dissimilarity. Our main results then show that for each suitable choice of user-defined clustering on elementary measures we obtain…

    Submitted 15 August, 2015; originally announced August 2015.

    MSC Class: Primary 62H30; Secondary 91C20, 62G07

    Journal ref: Journal of Machine Learning Research. 16(Sep):1949-2002, 2015