Search | arXiv e-print repository

Peering inside the black box: Learning the relevance of many-body functions in Neural Network potentials

Authors: Klara Bonneau, Jonas Lederer, Clark Templeton, David Rosenberger, Klaus-Robert Müller, Cecilia Clementi

Abstract: Machine learned potentials are becoming a popular tool to define an effective energy model for complex systems, either incorporating electronic structure effects at the atomistic resolution, or effectively renormalizing part of the atomistic degrees of freedom at a coarse-grained resolution. One of the main criticisms to machine learned potentials is that the energy inferred by the network is not as interpretable as in more traditional approaches where a simpler functional form is used. Here we address this problem by extending tools recently proposed in the nascent field of Explainable Artificial Intelligence (XAI) to coarse-grained potentials based on graph neural networks (GNN). We demonstrate the approach on three different coarse-grained systems including two fluids (methane and water) and the protein NTL9. On these examples, we show that the neural network potentials can be in practice decomposed in relevance contributions to different orders, that can be directly interpreted and provide physical insights on the systems of interest.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2405.16696 [pdf, other]

How many samples are needed to train a deep neural network?

Authors: Pegah Golestaneh, Mahsa Taheri, Johannes Lederer

Abstract: Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies the question of how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate $1/\sqrt{n}$ in the sample size $n$ rather than the usual "parametric rate" $1/n$. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.

Submitted 26 May, 2024; originally announced May 2024.
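
Note on the two rates quoted in the abstract above: the following is a minimal sketch of what they imply for sample sizes, assuming the error behaves exactly like C/sqrt(n) or C/n for some unspecified constant C. That idealization, the function name `samples_needed`, and its parameters are illustrative assumptions and are not taken from the paper.

```python
import math


def samples_needed(n0: int, factor: float, rate: str) -> int:
    """Return the sample size needed to shrink the error by `factor`.

    Assumes (hypothetically) that the error is exactly C / sqrt(n) for
    rate="sqrt" or exactly C / n for rate="parametric", starting from
    a baseline of n0 samples.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if rate == "sqrt":
        # C / sqrt(n) drops by `factor` when n grows by factor**2.
        return math.ceil(n0 * factor ** 2)
    if rate == "parametric":
        # C / n drops by `factor` when n grows by `factor`.
        return math.ceil(n0 * factor)
    raise ValueError("rate must be 'sqrt' or 'parametric'")


if __name__ == "__main__":
    for k in (2, 10):
        print(k, samples_needed(1000, k, "sqrt"), samples_needed(1000, k, "parametric"))
```

Under these assumptions, halving the error costs four times the data at the $1/\sqrt{n}$ rate but only twice the data at the $1/n$ rate, which is one way to read the abstract's remark that neural networks need "many" samples.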

arXiv:2405.14529 [pdf, other]

AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2

Authors: Simon Damm, Mike Laszkiewicz, Johannes Lederer, Asja Fischer

Abstract: Recent advances in multimodal foundation models have set new standards in few-shot anomaly detection. This paper explores whether high-quality visual features alone are sufficient to rival existing state-of-the-art vision-language models. We affirm this by adapting DINOv2 for one-shot and few-shot anomaly detection, with a focus on industrial applications. We show that this approach does not only rival existing techniques but can even outmatch them in many settings. Our proposed vision-only approach, AnomalyDINO, is based on patch similarities and enables both image-level anomaly prediction and pixel-level anomaly segmentation. The approach is methodologically simple and training-free and, thus, does not require any additional data for fine-tuning or meta-learning. Despite its simplicity, AnomalyDINO achieves state-of-the-art results in one- and few-shot anomaly detection (e.g., pushing the one-shot performance on MVTec-AD from an AUROC of 93.1% to 96.6%). The reduced overhead, coupled with its outstanding few-shot performance, makes AnomalyDINO a strong candidate for fast deployment, for example, in industrial contexts.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2401.13555 [pdf, other]

doi 10.1145/3630106.3658921

Benchmarking the Fairness of Image Upsampling Methods

Authors: Mike Laszkiewicz, Imant Daunhawer, Julia E. Vogt, Asja Fischer, Johannes Lederer

Abstract: Recent years have witnessed a rapid development of deep generative models for creating synthetic media, such as images and videos. While the practical applications of these models in everyday tasks are enticing, it is crucial to assess the inherent risks regarding their fairness. In this work, we introduce a comprehensive framework for benchmarking the performance and fairness of conditional generative models. We develop a set of metrics (inspired by their supervised fairness counterparts) to evaluate the models on their fairness and diversity. Focusing on the specific application of image upsampling, we create a benchmark covering a wide variety of modern upsampling methods. As part of the benchmark, we introduce UnfairFace, a subset of FairFace that replicates the racial distribution of common large-scale face datasets. Our empirical study highlights the importance of using an unbiased training set and reveals variations in how the algorithms respond to dataset imbalances. Alarmingly, we find that none of the considered methods produces statistically fair and diverse results. All experiments can be reproduced using our provided repository.

Submitted 29 April, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24)

arXiv:2311.09245 [pdf, other]

Affine Invariance in Continuous-Domain Convolutional Neural Networks

Authors: Ali Mohaddes, Johannes Lederer

Abstract: The notion of group invariance helps neural networks in recognizing patterns and features under geometric transformations. Indeed, it has been shown that group invariance can largely improve deep learning performances in practice, where such transformations are very common. This research studies affine invariance on continuous-domain convolutional neural networks. Despite other research considering isometric invariance or similarity invariance, we focus on the full structure of affine transforms generated by the generalized linear group $\mathrm{GL}_2(\mathbb{R})$. We introduce a new criterion to assess the similarity of two input signals under affine transformations. Then, unlike conventional methods that involve solving complex optimization problems on the Lie group $G_2$, we analyze the convolution of lifted signals and compute the corresponding integration over $G_2$. In sum, our research could eventually extend the scope of geometrical transformations that practical deep-learning pipelines can handle.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2307.15067 [pdf, ps, other]

Set-Membership Inference Attacks using Data Watermarking

Authors: Mike Laszkiewicz, Denis Lukovnikov, Johannes Lederer, Asja Fischer

Abstract: In this work, we propose a set-membership inference attack for generative models using deep image watermarking techniques. In particular, we demonstrate how conditional sampling from a generative model can reveal the watermark that was injected into parts of the training data. Our empirical results demonstrate that the proposed watermarking technique is a principled approach for detecting the non-consensual use of image data in training generative models.

Submitted 22 June, 2023; originally announced July 2023.

Comments: Preliminary work

arXiv:2306.06210 [pdf, other]

Single-Model Attribution of Generative Models Through Final-Layer Inversion

Authors: Mike Laszkiewicz, Jonas Ricker, Johannes Lederer, Asja Fischer

Abstract: Recent breakthroughs in generative modeling have sparked interest in practical single-model attribution. Such methods predict whether a sample was generated by a specific generator or not, for instance, to prove intellectual property theft. However, previous works are either limited to the closed-world setting or require undesirable changes to the generative model. We address these shortcomings by, first, viewing single-model attribution through the lens of anomaly detection. Arising from this change of perspective, we propose FLIPAD, a new approach for single-model attribution in the open-world setting based on final-layer inversion and anomaly detection. We show that the utilized final-layer inversion can be reduced to a convex lasso optimization problem, making our approach theoretically sound and computationally efficient. The theoretical findings are accompanied by an experimental study demonstrating the effectiveness of our approach and its flexibility to various domains.

Submitted 26 June, 2024; v1 submitted 26 May, 2023; originally announced June 2023.

Comments: Accepted at the Forty-first International Conference on Machine Learning [ICML2024]

arXiv:2303.04258 [pdf, ps, other]

Extremes in High Dimensions: Methods and Scalable Algorithms

Authors: Johannes Lederer, Marco Oesting

Abstract: Extreme-value theory has been explored in considerable detail for univariate and low-dimensional observations, but the field is still in an early stage regarding high-dimensional multivariate observations. In this paper, we focus on Hüsler-Reiss models and their domain of attraction, a popular class of models for multivariate extremes that exhibit some similarities to multivariate Gaussian distributions. We devise novel estimators for the parameters of this model based on score matching and equip these estimators with state-of-the-art theories for high-dimensional settings and with exceptionally scalable algorithms. We perform a simulation study to demonstrate that the estimators can estimate a large number of parameters reliably and fast; for example, we show that Hüsler-Reiss models with thousands of parameters can be fitted within a couple of minutes on a standard laptop.

Submitted 7 March, 2023; originally announced March 2023.

Comments: 26 pages

arXiv:2303.02114 [pdf, ps, other]

Lag selection and estimation of stable parameters for multiple autoregressive processes through convex programming

Authors: Somnath Chakraborty, Johannes Lederer, Rainer von Sachs

Abstract: Motivated by a variety of applications, high-dimensional time series have become an active topic of research. In particular, several methods and finite-sample theories for individual stable autoregressive processes with known lag have become available very recently. We, instead, consider multiple stable autoregressive processes that share an unknown lag. We use information across the different processes to simultaneously select the lag and estimate the parameters. We prove that the estimated process is stable, and we establish rates for the forecasting error that can outmatch the known rate in our setting. Our insights on the lag selection and the stability are also of interest for the case of individual autoregressive processes.

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2302.11241 [pdf, other]

The DeepCAR Method: Forecasting Time-Series Data That Have Change Points

Authors: Ayla Jungbluth, Johannes Lederer

Abstract: Many methods for time-series forecasting are known in classical statistics, such as autoregression, moving averages, and exponential smoothing. The DeepAR framework is a novel, recent approach for time-series forecasting based on deep learning. DeepAR has shown very promising results already. However, time series often have change points, which can degrade the DeepAR's prediction performance substantially. This paper extends the DeepAR framework by detecting and including those change points. We show that our method performs as well as standard DeepAR when there are no change points and considerably better when there are change points. More generally, we show that the batch size provides an effective and surprisingly simple way to deal with change points in DeepAR, Transformers, and other modern forecasting models.

Submitted 22 February, 2023; originally announced February 2023.

arXiv:2302.08235 [pdf, other]

Reducing Computational and Statistical Complexity in Machine Learning Through Cardinality Sparsity

Authors: Ali Mohades, Johannes Lederer

Abstract: High-dimensional data has become ubiquitous across the sciences but causes computational and statistical challenges. A common approach for dealing with these challenges is sparsity. In this paper, we introduce a new concept of sparsity, called cardinality sparsity. Broadly speaking, we call a tensor sparse if it contains only a small number of unique values. We show that cardinality sparsity can improve deep learning and tensor regression both statistically and computationally. On the way, we generalize recent statistical theories in those fields.

Submitted 16 February, 2023; originally announced February 2023.
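
Note on the definition quoted in the abstract above: the following is a minimal sketch of counting unique values in a tensor, assuming the tensor is given as a nested Python list. The abstract only says "a small number of unique values", so the threshold `max_unique` and both function names are hypothetical choices for illustration, not the paper's formal definition.

```python
def unique_value_count(tensor) -> int:
    """Count the distinct scalar entries of an arbitrarily nested list or tuple."""
    seen = set()
    stack = [tensor]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            stack.extend(item)  # descend into the next nesting level
        else:
            seen.add(item)      # scalar entry
    return len(seen)


def is_cardinality_sparse(tensor, max_unique: int) -> bool:
    """Hypothetical check: at most `max_unique` distinct values in the tensor."""
    return unique_value_count(tensor) <= max_unique


if __name__ == "__main__":
    weights = [[0.5, -1.0, 0.5], [0.5, 0.5, -1.0]]
    print(unique_value_count(weights), is_cardinality_sparse(weights, 2))
```

In this sketch the example matrix has six entries, none of them zero, yet only two distinct values, so it counts as sparse in the cardinality sense even though it is dense in the usual sense.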

arXiv:2212.05517 [pdf, other]

doi 10.1063/5.0138367

SchNetPack 2.0: A neural network toolbox for atomistic machine learning

Authors: Kristof T. Schütt, Stefaan S. P. Hessmann, Niklas W. A. Gebauer, Jonas Lederer, Michael Gastegger

Abstract: SchNetPack is a versatile neural networks toolbox that addresses both the requirements of method development and application of atomistic machine learning. Version 2.0 comes with an improved data pipeline, modules for equivariant neural networks as well as a PyTorch implementation of molecular dynamics. An optional integration with PyTorch Lightning and the Hydra configuration framework powers a flexible command-line interface. This makes SchNetPack 2.0 easily extendable with custom code and ready for complex training tasks such as generation of 3d molecular structures.

Submitted 11 December, 2022; originally announced December 2022.

arXiv:2212.05427 [pdf, ps, other]

Statistical guarantees for sparse deep learning

Authors: Johannes Lederer

Abstract: Neural networks are becoming increasingly popular in applications, but our mathematical understanding of their potential and limitations is still limited. In this paper, we further this understanding by developing statistical guarantees for sparse deep learning. In contrast to previous work, we consider different types of sparsity, such as few active connections, few active nodes, and other norm-based types of sparsity. Moreover, our theories cover important aspects that previous theories have neglected, such as multiple outputs, regularization, and l2-loss. The guarantees have a mild dependence on network widths and depths, which means that they support the application of sparse but wide and deep networks from a statistical perspective. Some of the concepts and tools that we use in our derivations are uncommon in deep learning and, hence, might be of additional interest.

Submitted 11 December, 2022; originally announced December 2022.

arXiv:2206.10311 [pdf, other]

Marginal Tail-Adaptive Normalizing Flows

Authors: Mike Laszkiewicz, Johannes Lederer, Asja Fischer

Abstract: Learning the tail behavior of a distribution is a notoriously difficult problem. By definition, the number of samples from the tail is small, and deep generative models, such as normalizing flows, tend to concentrate on learning the body of the distribution. In this paper, we focus on improving the ability of normalizing flows to correctly capture the tail behavior and, thus, form more accurate models. We prove that the marginal tailedness of an autoregressive flow can be controlled via the tailedness of the marginals of its base distribution. This theoretical insight leads us to a novel type of flows based on flexible base distributions and data-driven linear layers. An empirical analysis shows that the proposed method improves on the accuracy -- especially on the tails of the distribution -- and is able to generate heavy-tailed data. We demonstrate its application on a weather and climate example, in which capturing the tail behavior is essential.

Submitted 27 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

Comments: Accepted at ICML2022 Thirty-ninth International Conference on Machine Learning

arXiv:2205.04491 [pdf, other]

Statistical Guarantees for Approximate Stationary Points of Simple Neural Networks

Authors: Mahsa Taheri, Fang Xie, Johannes Lederer

Abstract: Since statistical guarantees for neural networks are usually restricted to global optima of intricate objective functions, it is not clear whether these theories really explain the performances of actual outputs of neural-network pipelines. The goal of this paper is, therefore, to bring statistical theory closer to practice. We develop statistical guarantees for simple neural networks that coincide up to logarithmic factors with the global optima but apply to stationary points and the points nearby. These results support the common notion that neural networks do not necessarily need to be optimized globally from a mathematical perspective. More generally, despite being limited to simple neural networks for now, our theories make a step forward in describing the practical properties of neural networks in mathematical terms.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2203.16205 [pdf, other]

Automatic Identification of Chemical Moieties

Authors: Jonas Lederer, Michael Gastegger, Kristof T. Schütt, Michael Kampffmeyer, Klaus-Robert Müller, Oliver T. Unke

Abstract: In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.

Submitted 27 April, 2023; v1 submitted 30 March, 2022; originally announced March 2022.

arXiv:2202.00975 [pdf, other]

VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering

Authors: Rebecca Marion, Johannes Lederer, Bernadette Govaerts, Rainer von Sachs

Abstract: Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g. there are highly correlated groups of variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. But none of these methods achieve satisfactory performance for prediction, variable selection and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2201.05055 [pdf, other]

Depth Normalization of Small RNA Sequencing: Using Data and Biology to Select a Suitable Method

Authors: Yannick Düren, Johannes Lederer, Li-Xuan Qin

Abstract: Deep sequencing has become one of the most popular tools for transcriptome profiling in biomedical studies. While an abundance of computational methods exists for "normalizing" sequencing data to remove unwanted between-sample variations due to experimental handling, there is no consensus on which normalization is the most suitable for a given data set. To address this problem, we developed "DANA" - an approach for assessing the performance of normalization methods for microRNA sequencing data based on biology-motivated and data-driven metrics. Our approach takes advantage of well-known biological features of microRNAs for their expression pattern and chromosomal clustering to simultaneously assess (1) how effectively normalization removes handling artifacts, and (2) how aptly normalization preserves biological signals. With DANA, we confirm that the performance of eight commonly used normalization methods varies widely across different data sets and provide guidance for selecting a suitable method for the data at hand. Hence, it should be adopted as a routine preprocessing step (preceding normalization) for microRNA sequencing data analysis. DANA is implemented in R and publicly available at https://github.com/LXQin/DANA.

Submitted 13 January, 2022; originally announced January 2022.

Comments: 16 pages, 6 figures

arXiv:2112.11407 [pdf, other]

doi 10.1109/MSP.2022.3153277

Toward Explainable AI for Regression Models

Authors: Simon Letzgus, Patrick Wagner, Jonas Lederer, Wojciech Samek, Klaus-Robert Müller, Gregoire Montavon

Abstract: In addition to the impressive predictive power of machine learning (ML) models, more recently, explanation methods have emerged that enable an interpretation of complex non-linear learning models such as deep neural networks. Gaining a better understanding is especially important e.g. for safety-critical ML applications or medical diagnostics etc. While such Explainable AI (XAI) techniques have reached significant popularity for classifiers, so far little attention has been devoted to XAI for regression models (XAIR). In this review, we clarify the fundamental conceptual differences of XAI for regression and classification tasks, establish novel theoretical insights and analysis for XAIR, provide demonstrations of XAIR on genuine practical regression problems, and finally discuss the challenges remaining for the field.

Submitted 17 January, 2023; v1 submitted 21 December, 2021; originally announced December 2021.

Comments: 17 pages, 10 figures, published; changes: 1. references to code and xai-regression.org added (p. 1/2, end of introduction), 2. adjustment of sign-error in restructuring section (p. 8, just above Fig. 4)

Journal ref: IEEE Signal Processing Magazine (Volume: 39, Issue: 4, July 2022) 40-58

arXiv:2107.07352 [pdf, other]

Copula-Based Normalizing Flows

Authors: Mike Laszkiewicz, Johannes Lederer, Asja Fischer

Abstract: Normalizing flows, which learn a distribution by transforming the data to samples from a Gaussian base distribution, have proven powerful density approximations. But their expressive power is limited by this choice of the base distribution. We, therefore, propose to generalize the base distribution to a more elaborate copula distribution to capture the properties of the target distribution more accurately. In a first empirical analysis, we demonstrate that this replacement can dramatically improve the vanilla normalizing flows in terms of flexibility, stability, and effectivity for heavy-tailed data. Our results suggest that the improvements are related to an increased local Lipschitz-stability of the learned flow.

Submitted 15 July, 2021; originally announced July 2021.

Comments: Accepted for presentation at the ICML 2021 Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (INNF+ 2021)

arXiv:2106.02260 [pdf, other]

Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks

Authors: Leni Ven, Johannes Lederer

Abstract: Deep learning requires several design choices, such as the nodes' activation functions and the widths, types, and arrangements of the layers. One consideration when making these choices is the vanishing-gradient problem, which is the phenomenon of algorithms getting stuck at suboptimal points due to small gradients. In this paper, we revisit the vanishing-gradient problem in the context of sigmoid-type activation. We use mathematical arguments to highlight two different sources of the phenomenon, namely large individual parameters and effects across layers, and to illustrate two simple remedies, namely regularization and rescaling. We then demonstrate the effectiveness of the two remedies in practice. In view of the vanishing-gradient problem being a main reason why tanh and other sigmoid-type activation has become much less popular than relu-type activation, our results bring sigmoid-type activation back to the table.

Submitted 4 June, 2021; originally announced June 2021.
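
The two sources of vanishing gradients named in the abstract above (large individual parameters and effects across layers) can be illustrated numerically. The following is only a minimal sketch for the logistic sigmoid, not the paper's analysis or its remedies; the function names `sigmoid_grad` and `chain_factor` are hypothetical and chosen for this illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the logistic sigmoid: s(z) * (1 - s(z)), at most 1/4."""
    s = sigmoid(z)
    return s * (1.0 - s)

def chain_factor(pre_activations):
    """Product of sigmoid derivatives along one path through the layers.

    Hypothetical helper: it ignores the weights and only tracks the
    activation-derivative part of the chain rule.
    """
    grads = sigmoid_grad(np.asarray(pre_activations, dtype=float))
    return float(np.prod(grads))

# Source 1: a large pre-activation (e.g. caused by large parameters)
# pushes the sigmoid into saturation, so its derivative is tiny.
print(sigmoid_grad(0.0), sigmoid_grad(10.0))

# Source 2: even in the best case (derivative 1/4 at every layer),
# the product over 10 layers is 0.25 ** 10.
print(chain_factor(np.zeros(10)))
```

Because each factor is bounded by 1/4, the product shrinks geometrically with depth unless the weights compensate, which is one way to read the abstract's motivation for rescaling.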

arXiv:2105.14052 [pdf, other]

Targeted Deep Learning: Framework, Methods, and Applications

Authors: Shih-Ting Huang, Johannes Lederer

Abstract: Deep learning systems are typically designed to perform for a wide range of test inputs. For example, deep learning systems in autonomous cars are supposed to deal with traffic situations for which they were not specifically trained. In general, the ability to cope with a broad spectrum of unseen test inputs is called generalization. Generalization is definitely important in applications where the possible test inputs are known but plentiful or simply unknown, but there are also cases where the possible inputs are few and unlabeled but known beforehand. For example, medicine is currently interested in targeting treatments to individual patients; the number of patients at any given time is usually small (typically one), their diagnoses/responses/... are still unknown, but their general characteristics (such as genome information, protein levels in the blood, and so forth) are known before the treatment. We propose to call deep learning in such applications targeted deep learning. In this paper, we introduce a framework for targeted deep learning, and we devise and test an approach for adapting standard pipelines to the requirements of targeted deep learning. The approach is very general yet easy to use: it can be implemented as a simple data-preprocessing step. We demonstrate on a variety of real-world data that our approach can indeed render standard deep learning faster and more accurate when the test inputs are known beforehand.

Submitted 28 May, 2021; originally announced May 2021.

arXiv:2105.14035 [pdf, other]

DeepMoM: Robust Deep Learning With Median-of-Means

Authors: Shih-Ting Huang, Johannes Lederer

Abstract: Data used in deep learning is notoriously problematic. For example, data are usually combined from diverse sources, rarely cleaned and vetted thoroughly, and sometimes corrupted on purpose. Intentional corruption that targets the weak spots of algorithms has been studied extensively under the label of "adversarial attacks." In contrast, the arguably much more common case of corruption that reflects the limited quality of data has been studied much less. Such "random" corruptions are due to measurement errors, unreliable sources, convenience sampling, and so forth. These kinds of corruption are common in deep learning, because data are rarely collected according to strict protocols -- in strong contrast to the formalized data collection in some parts of classical statistics. This paper concerns such corruption. We introduce an approach motivated by very recent insights into median-of-means and Le Cam's principle, we show that the approach can be readily implemented, and we demonstrate that it performs very well in practice. In conclusion, we believe that our approach is a very promising alternative to standard parameter training based on least-squares and cross-entropy loss.

Submitted 8 November, 2021; v1 submitted 28 May, 2021; originally announced May 2021.
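
For readers unfamiliar with the median-of-means idea mentioned in the abstract above, here is a minimal sketch of the classical median-of-means estimator of a mean. It is not the DeepMoM training procedure from the paper; the function name, the random block assignment, and the toy data are assumptions made for this illustration.

```python
import numpy as np

def median_of_means(values, n_blocks, seed=0):
    """Classical median-of-means estimate of the mean of `values`.

    The data are shuffled, split into `n_blocks` blocks of (nearly)
    equal size, and the median of the block means is returned.
    """
    values = np.asarray(values, dtype=float)
    if not 1 <= n_blocks <= values.size:
        raise ValueError("n_blocks must be between 1 and the sample size")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(values)
    blocks = np.array_split(shuffled, n_blocks)
    block_means = np.array([block.mean() for block in blocks])
    return float(np.median(block_means))

# Toy data: 99 clean observations and one grossly corrupted one.
data = np.concatenate([np.ones(99), [1e6]])
print(np.mean(data))             # pulled far away by the single outlier
print(median_of_means(data, 5))  # the outlier affects only one block
```

The single corrupted value can spoil at most one block mean, so the median over blocks stays at the value of the clean data, whereas the plain average does not.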

arXiv:2101.09957 [pdf, other]

Activation Functions in Artificial Neural Networks: A Systematic Overview

Authors: Johannes Lederer

Abstract: Activation functions shape the outputs of artificial neurons and, therefore, are integral parts of neural networks in general and deep learning in particular. Some activation functions, such as logistic and relu, have been used for many decades. But with deep learning becoming a mainstream research topic, new activation functions have mushroomed, leading to confusion in both theory and practice. This paper provides an analytic yet up-to-date overview of popular activation functions and their properties, which makes it a timely resource for anyone who studies or applies neural networks.

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2010.00885 [pdf, other]

Optimization Landscapes of Wide Deep Neural Networks Are Benign

Authors: Johannes Lederer

Abstract: We analyze the optimization landscapes of deep learning with wide networks. We highlight the importance of constraints for such networks and show that constrained -- as well as unconstrained -- empirical-risk minimization over such networks has no confined points, that is, suboptimal parameters that are difficult to escape from. Hence, our theories substantiate the common belief that wide neural networks are not only highly expressive but also comparably easy to optimize.

Submitted 13 January, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

arXiv:2009.09070 [pdf, ps, other]

Is there a role for statistics in artificial intelligence?

Authors: Sarah Friedrich, Gerd Antes, Sigrid Behr, Harald Binder, Werner Brannath, Florian Dumpert, Katja Ickstadt, Hans Kestler, Johannes Lederer, Heinz Leitgöb, Markus Pauly, Ansgar Steland, Adalbert Wilhelm, Tim Friede

Abstract: The research on and application of artificial intelligence (AI) has triggered a comprehensive scientific, economic, social and political discussion. Here we argue that statistics, as an interdisciplinary scientific field, plays a substantial role both for the theoretical and practical understanding of AI and for its future development. Statistics might even be considered a core element of AI. With its specialist knowledge of data evaluation, starting with the precise formulation of the research question and passing through a study design stage on to analysis and interpretation of the results, statistics is a natural partner for other disciplines in teaching, research and practice. This paper aims at contributing to the current discussion by highlighting the relevance of statistical methodology in the context of AI development. In particular, we discuss contributions of statistics to the field of artificial intelligence concerning methodological development, planning and design of studies, assessment of data quality and data collection, differentiation of causality and associations and assessment of uncertainty in results. Moreover, the paper also deals with the equally necessary and meaningful extension of curricula in schools and universities.

Submitted 13 September, 2020; originally announced September 2020.

arXiv:2009.06202 [pdf, ps, other]

Risk Bounds for Robust Deep Learning

Authors: Johannes Lederer

Abstract: It has been observed that certain loss functions can render deep-learning pipelines robust against flaws in the data. In this paper, we support these empirical findings with statistical theory. We especially show that empirical-risk minimization with unbounded, Lipschitz-continuous loss functions, such as the least-absolute deviation loss, Huber loss, Cauchy loss, and Tukey's biweight loss, can provide efficient prediction under minimal assumptions on the data. More generally speaking, our paper provides theoretical evidence for the benefits of robust loss functions in deep learning.

Submitted 14 September, 2020; originally announced September 2020.
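
To make the contrast between squared loss and the Lipschitz-continuous losses named in the abstract above concrete, the sketch below implements the least-absolute deviation loss and the Huber loss in their textbook forms. The function names and the threshold parameter `delta` are assumptions for this illustration and are not taken from the paper.

```python
import numpy as np

def squared_loss(residual):
    """Squared loss; its slope grows without bound in the residual."""
    r = np.asarray(residual, dtype=float)
    return 0.5 * r ** 2

def absolute_loss(residual):
    """Least-absolute deviation loss; Lipschitz with constant 1."""
    return np.abs(np.asarray(residual, dtype=float))

def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond.

    `delta` is an assumed threshold; the loss is Lipschitz with
    constant `delta`.
    """
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)

# A residual of 100 (e.g. from a corrupted label) dominates the
# squared loss but contributes only linearly to the robust losses.
residuals = np.array([0.5, 3.0, 100.0])
print(squared_loss(residuals))
print(absolute_loss(residuals))
print(huber_loss(residuals, delta=1.0))
```

Since the robust losses grow only linearly in the residual, a single corrupted observation has a bounded influence on the gradient of the empirical risk.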

arXiv:2006.15604 [pdf, ps, other]

Layer Sparsity in Neural Networks

Authors: Mohamed Hebiri, Johannes Lederer

Abstract: Sparsity has become popular in machine learning, because it can save computational resources, facilitate interpretations, and prevent overfitting. In this paper, we discuss sparsity in the framework of neural networks. In particular, we formulate a new notion of sparsity that concerns the networks' layers and, therefore, aligns particularly well with the current trend toward deep networks. We call this notion layer sparsity. We then introduce corresponding regularization and refitting schemes that can complement standard deep-learning pipelines to generate more compact and accurate networks.

Submitted 28 June, 2020; originally announced June 2020.

arXiv:2006.12296 [pdf, ps, other]

A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics

Authors: Sophie-Charlotte Klose, Johannes Lederer

Abstract: We introduce tools for controlled variable selection to economists. In particular, we apply a recently introduced aggregation scheme for false discovery rate (FDR) control to German administrative data to determine the parts of the individual employment histories that are relevant for the career outcomes of women. Our results suggest that career outcomes can be predicted based on a small set of variables, such as daily earnings, wage increases in combination with a high level of education, employment status, and working experience.

Submitted 23 June, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

arXiv:2006.03589 [pdf, other]

doi 10.1109/TPAMI.2021.3115452

Higher-Order Explanations of Graph Neural Networks via Relevant Walks

Authors: Thomas Schnake, Oliver Eberle, Jonas Lederer, Shinichi Nakajima, Kristof T. Schütt, Klaus-Robert Müller, Grégoire Montavon

Abstract: Graph Neural Networks (GNNs) are a popular approach for predicting graph structured data. As GNNs tightly entangle the input graph into the neural network structure, common explainable AI approaches are not applicable. To a large extent, GNNs have remained black-boxes for the user so far. In this paper, we show that GNNs can in fact be naturally explained using higher-order expansions, i.e. by identifying groups of edges that jointly contribute to the prediction. Practically, we find that such explanations can be extracted using a nested attribution scheme, where existing techniques such as layer-wise relevance propagation (LRP) can be applied at each step. The output is a collection of walks into the input graph that are relevant for the prediction. Our novel explanation method, which we denote by GNN-LRP, is applicable to a broad range of graph neural networks and lets us extract practically relevant insights on sentiment analysis of text data, structure-property relationships in quantum chemistry, and image classification.

Submitted 26 November, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

Comments: 14 pages + 6 pages supplement

arXiv:2006.00294 [pdf, ps, other]

Statistical Guarantees for Regularized Neural Networks

Authors: Mahsa Taheri, Fang Xie, Johannes Lederer

Abstract: Neural networks have become standard tools in the analysis of data, but they lack comprehensive mathematical theories. For example, there are very few statistical guarantees for learning neural networks from data, especially for classes of estimators that are used in practice or at least similar to such. In this paper, we develop a general statistical guarantee for estimators that consist of a least-squares term and a regularizer. We then exemplify this guarantee with $\ell_1$-regularization, showing that the corresponding prediction error increases at most sub-linearly in the number of layers and at most logarithmically in the total number of parameters. Our results establish a mathematical basis for regularized estimation of neural networks, and they deepen our mathematical understanding of neural networks and deep learning more generally.

Submitted 11 November, 2020; v1 submitted 30 May, 2020; originally announced June 2020.

arXiv:2005.00466 [pdf, other]

Thresholded Adaptive Validation: Tuning the Graphical Lasso for Graph Recovery

Authors: Mike Laszkiewicz, Asja Fischer, Johannes Lederer

Abstract: Many Machine Learning algorithms are formulated as regularized optimization problems, but their performance hinges on a regularization parameter that needs to be calibrated to each application at hand. In this paper, we propose a general calibration scheme for regularized optimization problems and apply it to the graphical lasso, which is a method for Gaussian graphical modeling. The scheme is equipped with theoretical guarantees and motivates a thresholding pipeline that can improve graph recovery. Moreover, requiring at most one line search over the regularization path, the calibration scheme is computationally more efficient than competing schemes that are based on resampling. Finally, we show in simulations that our approach can improve on the graph recovery of other approaches considerably.

Submitted 30 March, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: To appear in the proceedings of Artificial Intelligence and Statistics (AISTATS) 2021

arXiv:2004.11554 [pdf, other]

Estimating the Lasso's Effective Noise

Authors: Johannes Lederer, Michael Vogt

Abstract: Much of the theory for the lasso in the linear model $Y = X \beta^* + \varepsilon$ hinges on the quantity $2 \| X^\top \varepsilon \|_{\infty} / n$, which we call the lasso's effective noise. Among other things, the effective noise plays an important role in finite-sample bounds for the lasso, the calibration of the lasso's tuning parameter, and inference on the parameter vector $\beta^*$. In this paper, we develop a bootstrap-based estimator of the quantiles of the effective noise. The estimator is fully data-driven, that is, does not require any additional tuning parameters. We equip our estimator with finite-sample guarantees and apply it to tuning parameter calibration for the lasso and to high-dimensional inference on the parameter vector $\beta^*$.

Submitted 21 January, 2022; v1 submitted 24 April, 2020; originally announced April 2020.

MSC Class: 62J07; 62F03; 62F40

Journal ref: Journal of Machine Learning Research 2021, Vol. 22, 1-32
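
The effective noise $2 \| X^\top \varepsilon \|_{\infty} / n$ defined in the abstract above is easy to compute when the noise vector is known, as in a simulation. The sketch below does exactly that and approximates its quantiles by Monte Carlo under an assumed Gaussian noise model with known standard deviation `sigma`. This is an oracle illustration only, not the bootstrap-based, data-driven estimator developed in the paper; both function names are hypothetical.

```python
import numpy as np

def effective_noise(X, eps):
    """Compute 2 * ||X^T eps||_inf / n for design X (n x p) and noise eps (n,)."""
    X = np.asarray(X, dtype=float)
    eps = np.asarray(eps, dtype=float)
    n = X.shape[0]
    return 2.0 * float(np.max(np.abs(X.T @ eps))) / n

def oracle_quantile(X, sigma, level, n_draws=2000, seed=0):
    """Monte Carlo quantile of the effective noise.

    Assumption for this sketch: eps ~ N(0, sigma^2 I) with sigma known,
    which is not available in practice.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    draws = [
        effective_noise(X, sigma * rng.standard_normal(n))
        for _ in range(n_draws)
    ]
    return float(np.quantile(draws, level))

# Small worked example: X^T eps = (1, -6), so the effective noise is 2 * 6 / 2.
X_small = np.array([[1.0, 0.0], [0.0, 2.0]])
eps_small = np.array([1.0, -3.0])
print(effective_noise(X_small, eps_small))

# Simulated design with n = 50 observations and p = 200 predictors.
rng = np.random.default_rng(1)
X_sim = rng.standard_normal((50, 200))
print(oracle_quantile(X_sim, sigma=1.0, level=0.95))
```

In lasso theory, a tuning parameter at or above a high quantile of the effective noise is the typical requirement for finite-sample bounds, which is why estimating these quantiles from data is useful.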

arXiv:2002.11916 [pdf, other]

Tuning-free ridge estimators for high-dimensional generalized linear models

Authors: Shih-Ting Huang, Fang Xie, Johannes Lederer

Abstract: Ridge estimators regularize the squared Euclidean lengths of parameters. Such estimators are mathematically and computationally attractive but involve tuning parameters that can be difficult to calibrate. In this paper, we show that ridge estimators can be modified such that tuning parameters can be avoided altogether. We also show that these modified versions can improve on the empirical prediction accuracies of standard ridge estimators combined with cross-validation, and we provide first theoretical guarantees.

Submitted 27 February, 2020; originally announced February 2020.

arXiv:1909.10635 [pdf, other]

Tuning parameter calibration for prediction in personalized medicine

Authors: Shih-Ting Huang, Yannick Düren, Kristoffer H. Hellton, Johannes Lederer

Abstract: Personalized medicine has become an important part of medicine, for instance predicting individual drug responses based on genomic information. However, many current statistical methods are not tailored to this task, because they overlook the individual heterogeneity of patients. In this paper, we look at personalized medicine from a linear regression standpoint. We introduce an alternative version of the ridge estimator and target individuals by establishing a tuning parameter calibration scheme that minimizes prediction errors of individual patients. In stark contrast, classical schemes such as cross-validation minimize prediction errors only on average. We show that our pipeline is optimal in terms of oracle inequalities, fast, and highly effective both in simulations and on real data.

Submitted 2 October, 2019; v1 submitted 23 September, 2019; originally announced September 2019.

arXiv:1907.03808 [pdf, other]

False Discovery Rates in Biological Networks

Authors: Lu Yu, Tobias Kaufmann, Johannes Lederer

Abstract: The increasing availability of data has generated unprecedented prospects for network analyses in many biological fields, such as neuroscience (e.g., brain networks), genomics (e.g., gene-gene interaction networks), and ecology (e.g., species interaction networks). A powerful statistical framework for estimating such networks is Gaussian graphical models, but standard estimators for the corresponding graphs are prone to large numbers of false discoveries. In this paper, we introduce a novel graph estimator based on knockoffs that imitate the partial correlation structures of unconnected nodes. We show that this new estimator guarantees accurate control of the false discovery rate in theory, simulations, and biological applications, and we provide easy-to-use R code.

Submitted 4 February, 2021; v1 submitted 8 July, 2019; originally announced July 2019.

arXiv:1907.03807 [pdf, other]

doi 10.3390/e23020230

Aggregating Knockoffs for False Discovery Rate Control with an Application to Gut Microbiome Data

Authors: Fang Xie, Johannes Lederer

Abstract: Recent discoveries suggest that our gut microbiome plays an important role in our health and wellbeing. However, the gut microbiome data are intricate; for example, the microbial diversity in the gut makes the data high-dimensional. While there are dedicated high-dimensional methods, such as the lasso estimator, they always come with the risk of false discoveries. Knockoffs are a recent approach to control the number of false discoveries. In this paper, we show that knockoffs can be aggregated to increase power while retaining sharp control over the false discoveries. We support our method both in theory and simulations, and we show that it can lead to new discoveries on microbiome data from the American Gut Project. In particular, our results indicate that several phyla that have been overlooked so far are associated with obesity.

Submitted 1 March, 2021; v1 submitted 8 July, 2019; originally announced July 2019.

Journal ref: Entropy 23 (2021) 230

arXiv:1812.07691 [pdf, other]

Efficiency in Lung Transplant Allocation Strategies

Authors: Jingjing Zou, David J. Lederer, Daniel Rabinowitz

Abstract: Currently in the United States, lung transplantations are allocated to candidates according to the candidates' Lung Allocation Score (LAS). The LAS is an ad-hoc ranking system for patients' priorities of transplantation. The goal of this study is to develop a framework for improving patients' life expectancy over the LAS based on a comprehensive modeling of the lung transplantation waiting list. Patients and organs are modeled as arriving according to Poisson processes, patients' health status evolving as a waiting time inhomogeneous Markov process until death or transplantation, with organ recipient's expected post-transplant residual life depending on waiting time and health status at transplantation. Under allocation rules satisfying minimal fairness requirements, the long-term average expected life converges, and its limit is a natural standard for comparing allocation strategies. Via the Hamilton-Jacobi-Bellman equations, upper bounds for the limiting average expected life are derived as a function of organ availability. Corresponding to each upper bound is an allocable set of (time, state) pairs at which patients would be optimally transplanted. The allocable set expands monotonically as organ availability increases, which motivates the development of an allocation strategy that leads to long-term expected life close to the upper bound. Simulation studies are conducted with model parameters estimated from national lung transplantation data. Results suggest that compared to the LAS, the proposed allocation strategy could provide a 7.7% increase in average total life. We further extended the results to the allocation and matching of multiple organ types.

Submitted 16 April, 2020; v1 submitted 18 December, 2018; originally announced December 2018.

Comments: 36 pages of main text, 10 figures

arXiv:1801.01394 [pdf, ps, other]

Prediction Error Bounds for Linear Regression With the TREX

Authors: Jacob Bien, Irina Gaynanova, Johannes Lederer, Christian Müller

Abstract: The TREX is a recently introduced approach to sparse linear regression. In contrast to most well-known approaches to penalized regression, the TREX can be formulated without the use of tuning parameters. In this paper, we establish the first known prediction error bounds for the TREX. Additionally, we introduce extensions of the TREX to a more general class of penalties, and we provide a bound on the prediction error in this generalized setting. These results deepen the understanding of TREX from a theoretical perspective and provide new insights into penalized regression in general.

Submitted 4 January, 2018; originally announced January 2018.

arXiv:1710.02950 [pdf, ps, other]

Maximum Regularized Likelihood Estimators: A General Prediction Theory and Applications

Authors: Rui Zhuang, Johannes Lederer

Abstract: Maximum regularized likelihood estimators (MRLEs) are arguably the most established class of estimators in high-dimensional statistics. In this paper, we derive guarantees for MRLEs in Kullback-Leibler divergence, a general measure of prediction accuracy. We assume only that the densities have a convex parametrization and that the regularization is definite and positive homogeneous. The results thus apply to a very large variety of models and estimators, such as tensor regression and graphical models with convex and non-convex regularized methods. A main conclusion is that MRLEs are broadly consistent in prediction - regardless of whether restricted eigenvalues or similar conditions hold.

Submitted 17 October, 2018; v1 submitted 9 October, 2017; originally announced October 2017.

arXiv:1708.05499 [pdf, ps, other]

Inference for high-dimensional instrumental variables regression

Authors: David Gold, Johannes Lederer, Jing Tao

Abstract: This paper concerns statistical inference for the components of a high-dimensional regression parameter despite possible endogeneity of each regressor. Given a first-stage linear model for the endogenous regressors and a second-stage linear model for the dependent variable, we develop a novel adaptation of the parametric one-step update to a generic second-stage estimator. We provide conditions under which the scaled update is asymptotically normal. We then introduce a two-stage Lasso procedure and show that the second-stage Lasso estimator satisfies the aforementioned conditions. Using these results, we construct asymptotically valid confidence intervals for the components of the second-stage regression coefficients. We complement our asymptotic theory with simulation studies, which demonstrate the performance of our method in finite samples.

Submitted 21 November, 2019; v1 submitted 17 August, 2017; originally announced August 2017.

Comments: 53 pages

arXiv:1704.02739 [pdf, other]

Integrating Additional Knowledge Into Estimation of Graphical Models

Authors: Yunqi Bu, Johannes Lederer

Abstract: In applications of graphical models, we typically have more information than just the samples themselves. A prime example is the estimation of brain connectivity networks based on fMRI data, where in addition to the samples themselves, the spatial positions of the measurements are readily available. With particular regard for this application, we are thus interested in ways to incorporate additional knowledge most effectively into graph estimation. Our approach to this is to make neighborhood selection receptive to additional knowledge by strengthening the role of the tuning parameters. We demonstrate that this concept (i) can improve reproducibility, (ii) is computationally convenient and efficient, and (iii) carries a lucid Bayesian interpretation. We specifically show that the approach provides effective estimations of brain connectivity graphs from fMRI data. However, providing a general scheme for the inclusion of additional knowledge, our concept is expected to have applications in a wide range of domains.

Submitted 20 April, 2017; v1 submitted 10 April, 2017; originally announced April 2017.

Comments: 16 pages, 4 figures, 1 table

arXiv:1610.00207 [pdf, other]

Tuning parameter calibration for $\ell_1$-regularized logistic regression

Authors: Wei Li, Johannes Lederer

Abstract: Feature selection is a standard approach to understanding and modeling high-dimensional classification data, but the corresponding statistical methods hinge on tuning parameters that are difficult to calibrate. In particular, existing calibration schemes in the logistic regression framework lack any finite sample guarantees. In this paper, we introduce a novel calibration scheme for $\ell_1$-penalized logistic regression. It is based on simple tests along the tuning parameter path and is equipped with optimal guarantees for feature selection. It is also amenable to easy and efficient implementations, and it rivals or outmatches existing methods in simulations and real data applications.

Submitted 28 February, 2019; v1 submitted 1 October, 2016; originally announced October 2016.

arXiv:1609.07195 [pdf, ps, other]

doi 10.1109/TIT.2022.3203857

Balancing Statistical and Computational Precision: A General Theory and Applications to Sparse Regression

Authors: Mahsa Taheri, Néhémy Lim, Johannes Lederer

Abstract: Modern technologies are generating ever-increasing amounts of data. Making use of these data requires methods that are both statistically sound and computationally efficient. Typically, the statistical and computational aspects are treated separately. In this paper, we propose an approach to entangle these two aspects in the context of regularized estimation. Applying our approach to sparse and group-sparse regression, we show that it can improve on standard pipelines both statistically and computationally.

Submitted 14 September, 2022; v1 submitted 22 September, 2016; originally announced September 2016.

arXiv:1609.05551 [pdf, other]

Graphical Models for Discrete and Continuous Data

Authors: Rui Zhuang, Noah Simon, Johannes Lederer

Abstract: We introduce a general framework for undirected graphical models. It generalizes Gaussian graphical models to a wide range of continuous, discrete, and combinations of different types of data. The models in the framework, called exponential trace models, are amenable to estimation based on maximum likelihood. We introduce a sampling-based approximation algorithm for computing the maximum likelihood estimator, and we apply this pipeline to learn simultaneous neural activities from spike data.

Submitted 15 June, 2019; v1 submitted 18 September, 2016; originally announced September 2016.

arXiv:1608.00624 [pdf, ps, other]

Oracle Inequalities for High-dimensional Prediction

Authors: Johannes Lederer, Lu Yu, Irina Gaynanova

Abstract: The abundance of high-dimensional data in the modern sciences has generated tremendous interest in penalized estimators such as the lasso, scaled lasso, square-root lasso, elastic net, and many others. In this paper, we establish a general oracle inequality for prediction in high-dimensional linear regression with such methods. Since the proof relies only on convexity and continuity arguments, the result holds irrespective of the design matrix and applies to a wide range of penalized estimators. Overall, the bound demonstrates that generic estimators can provide consistent prediction with any design matrix. From a practical point of view, the bound can help to identify the potential of specific estimators, and it can help to get a sense of the prediction accuracy in a given application.

Submitted 13 March, 2018; v1 submitted 1 August, 2016; originally announced August 2016.

arXiv:1604.06815 [pdf, other]

doi 10.1080/10618600.2017.1341414

Non-convex Global Minimization and False Discovery Rate Control for the TREX

Authors: Jacob Bien, Irina Gaynanova, Johannes Lederer, Christian Müller

Abstract: The TREX is a recently introduced method for performing sparse high-dimensional regression. Despite its statistical promise as an alternative to the lasso, square-root lasso, and scaled lasso, the TREX is computationally challenging in that it requires solving a non-convex optimization problem. This paper shows a remarkable result: despite the non-convexity of the TREX problem, there exists a polynomial-time algorithm that is guaranteed to find the global minimum. This result adds the TREX to a very short list of non-convex optimization problems that can be globally optimized (principal components analysis being a famous example). After deriving and developing this new approach, we demonstrate that (i) the ability of the preexisting TREX heuristic to reach the global minimum is strongly dependent on the difficulty of the underlying statistical problem, (ii) the new polynomial-time algorithm for TREX permits a novel variable ranking and selection scheme, (iii) this scheme can be incorporated into a rule that controls the false discovery rate (FDR) of included features in the model. To achieve this last aim, we provide an extension of the results of Barber & Candes (2015) to establish that the knockoff filter framework can be applied to the TREX. This investigation thus provides both a rare case study of a heuristic for non-convex optimization and a novel way of exploiting non-convexity for statistical inference.

Submitted 20 September, 2016; v1 submitted 22 April, 2016; originally announced April 2016.

Journal ref: Journal of Computational and Graphical Statistics 2017, Vol. 27, No. 1, 23-33

arXiv:1410.7279 [pdf, other]

Topology Adaptive Graph Estimation in High Dimensions

Authors: Johannes Lederer, Christian Müller

Abstract: We introduce Graphical TREX (GTREX), a novel method for graph estimation in high-dimensional Gaussian graphical models. By conducting neighborhood selection with TREX, GTREX avoids tuning parameters and is adaptive to the graph topology. We compare GTREX with standard methods on a new simulation set-up that is designed to assess accurately the strengths and shortcomings of different methods. These simulations show that a neighborhood selection scheme based on Lasso and an optimal (in practice unknown) tuning parameter outperforms other standard methods over a large spectrum of scenarios. Moreover, we show that GTREX can rival this scheme and, therefore, can provide competitive graph estimation without the need for tuning parameter calibration.

Submitted 27 October, 2014; originally announced October 2014.

arXiv:1410.5014 [pdf, other]

Optimal Two-Step Prediction in Regression

Authors: Didier Chételat, Johannes Lederer, Joseph Salmon

Abstract: High-dimensional prediction typically comprises two steps: variable selection and subsequent least-squares refitting on the selected variables. However, the standard variable selection procedures, such as the lasso, hinge on tuning parameters that need to be calibrated. Cross-validation, the most popular calibration scheme, is computationally costly and lacks finite sample guarantees. In this paper, we introduce an alternative scheme, easy to implement and both computationally and theoretically efficient.

Submitted 5 June, 2017; v1 submitted 18 October, 2014; originally announced October 2014.

arXiv:1410.0247 [pdf, ps, other]

A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees

Authors: Michaël Chichignoud, Johannes Lederer, Martin Wainwright

Abstract: We introduce a novel scheme for choosing the regularization parameter in high-dimensional linear regression with Lasso. This scheme, inspired by Lepski's method for bandwidth selection in non-parametric regression, is equipped with both optimal finite-sample guarantees and a fast algorithm. In particular, for any design matrix such that the Lasso has low sup-norm error under an "oracle choice" of the regularization parameter, we show that our method matches the oracle performance up to a small constant factor, and show that it can be implemented by performing simple tests along a single Lasso path. By applying the Lasso to simulated and real data, we find that our novel scheme can be faster and more accurate than standard schemes such as Cross-Validation.

Submitted 8 November, 2016; v1 submitted 1 October, 2014; originally announced October 2014.

Showing 1–50 of 61 results for author: Lederer, J