-
Peering inside the black box: Learning the relevance of many-body functions in Neural Network potentials
Authors:
Klara Bonneau,
Jonas Lederer,
Clark Templeton,
David Rosenberger,
Klaus-Robert Müller,
Cecilia Clementi
Abstract:
Machine learned potentials are becoming a popular tool to define an effective energy model for complex systems, either incorporating electronic structure effects at the atomistic resolution, or effectively renormalizing part of the atomistic degrees of freedom at a coarse-grained resolution. One of the main criticisms of machine learned potentials is that the energy inferred by the network is not as interpretable as in more traditional approaches where a simpler functional form is used. Here we address this problem by extending tools recently proposed in the nascent field of Explainable Artificial Intelligence (XAI) to coarse-grained potentials based on graph neural networks (GNN). We demonstrate the approach on three different coarse-grained systems including two fluids (methane and water) and the protein NTL9. On these examples, we show that the neural network potentials can in practice be decomposed into relevance contributions of different orders, which can be directly interpreted and provide physical insights into the systems of interest.
Submitted 5 July, 2024;
originally announced July 2024.
-
How many samples are needed to train a deep neural network?
Authors:
Pegah Golestaneh,
Mahsa Taheri,
Johannes Lederer
Abstract:
Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies the question of how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate $1/\sqrt{n}$ in the sample size $n$ rather than the usual "parametric rate" $1/n$. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.
Submitted 26 May, 2024;
originally announced May 2024.
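To make the two rates named in the abstract above concrete, the following minimal sketch compares how many samples each rate would need to reach a target error. The function name `samples_needed` and the assumption that the constant in front of each rate equals 1 are illustrative choices, not taken from the paper.

```python
import math


def samples_needed(target_error: float, rate: str) -> int:
    """Smallest n whose error bound is at most target_error.

    Assumes the bound is exactly 1/sqrt(n) or 1/n (constant 1),
    which is a simplification made only for illustration.
    """
    if rate == "inv_sqrt":  # error ~ 1 / sqrt(n)
        return math.ceil(1.0 / target_error ** 2)
    if rate == "inv":  # error ~ 1 / n, the "parametric rate"
        return math.ceil(1.0 / target_error)
    raise ValueError(f"unknown rate: {rate}")


# Halving the target error quadruples n under 1/sqrt(n)
# but only doubles it under 1/n.
for eps in (0.5, 0.25, 0.125):
    print(eps, samples_needed(eps, "inv_sqrt"), samples_needed(eps, "inv"))
```

Under these assumptions, the gap between the two columns grows as the target error shrinks, which is one way to read the statement that such networks need "many" samples.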
-
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2
Authors:
Simon Damm,
Mike Laszkiewicz,
Johannes Lederer,
Asja Fischer
Abstract:
Recent advances in multimodal foundation models have set new standards in few-shot anomaly detection. This paper explores whether high-quality visual features alone are sufficient to rival existing state-of-the-art vision-language models. We affirm this by adapting DINOv2 for one-shot and few-shot anomaly detection, with a focus on industrial applications. We show that this approach not only rivals existing techniques but can even outmatch them in many settings. Our proposed vision-only approach, AnomalyDINO, is based on patch similarities and enables both image-level anomaly prediction and pixel-level anomaly segmentation. The approach is methodologically simple and training-free and, thus, does not require any additional data for fine-tuning or meta-learning. Despite its simplicity, AnomalyDINO achieves state-of-the-art results in one- and few-shot anomaly detection (e.g., pushing the one-shot performance on MVTec-AD from an AUROC of 93.1% to 96.6%). The reduced overhead, coupled with its outstanding few-shot performance, makes AnomalyDINO a strong candidate for fast deployment, for example, in industrial contexts.
Submitted 23 May, 2024;
originally announced May 2024.
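The abstract above describes scoring based on patch similarities. A generic sketch of that idea, not the authors' implementation, is given below: each test patch is scored by its cosine distance to the nearest reference patch. The function name `patch_anomaly_scores`, the use of random arrays in place of DINOv2 patch features, and the choice of the maximum as the image-level score are all assumptions made for illustration.

```python
import numpy as np


def patch_anomaly_scores(test_patches: np.ndarray,
                         reference_patches: np.ndarray) -> np.ndarray:
    """Cosine distance from each test patch to its nearest reference patch."""
    t = test_patches / np.linalg.norm(test_patches, axis=1, keepdims=True)
    r = reference_patches / np.linalg.norm(reference_patches, axis=1, keepdims=True)
    similarity = t @ r.T  # shape: (num_test, num_reference)
    return 1.0 - similarity.max(axis=1)


# Random vectors stand in for patch features of one nominal reference image.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 8))

# Three patches copied from the reference plus one flipped (unusual) patch.
test = np.vstack([reference[:3], -reference[:1]])

scores = patch_anomaly_scores(test, reference)  # per-patch scores
image_score = scores.max()  # assumed image-level aggregation
print(scores, image_score)
```

Reshaping the per-patch scores back onto the patch grid would give a coarse anomaly map, which is how a patch-level score can serve both image-level prediction and pixel-level segmentation.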
-
Benchmarking the Fairness of Image Upsampling Methods
Authors:
Mike Laszkiewicz,
Imant Daunhawer,
Julia E. Vogt,
Asja Fischer,
Johannes Lederer
Abstract:
Recent years have witnessed a rapid development of deep generative models for creating synthetic media, such as images and videos. While the practical applications of these models in everyday tasks are enticing, it is crucial to assess the inherent risks regarding their fairness. In this work, we introduce a comprehensive framework for benchmarking the performance and fairness of conditional generative models. We develop a set of metrics, inspired by their supervised fairness counterparts, to evaluate the models on their fairness and diversity. Focusing on the specific application of image upsampling, we create a benchmark covering a wide variety of modern upsampling methods. As part of the benchmark, we introduce UnfairFace, a subset of FairFace that replicates the racial distribution of common large-scale face datasets. Our empirical study highlights the importance of using an unbiased training set and reveals variations in how the algorithms respond to dataset imbalances. Alarmingly, we find that none of the considered methods produces statistically fair and diverse results. All experiments can be reproduced using our provided repository.
Submitted 29 April, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Affine Invariance in Continuous-Domain Convolutional Neural Networks
Authors:
Ali Mohaddes,
Johannes Lederer
Abstract:
The notion of group invariance helps neural networks recognize patterns and features under geometric transformations. Indeed, it has been shown that group invariance can substantially improve deep learning performance in practice, where such transformations are very common. This research studies affine invariance on continuous-domain convolutional neural networks. While other research has considered isometric invariance or similarity invariance, we focus on the full structure of affine transforms generated by the general linear group $\mathrm{GL}_2(\mathbb{R})$. We introduce a new criterion to assess the similarity of two input signals under affine transformations. Then, unlike conventional methods that involve solving complex optimization problems on the Lie group $G_2$, we analyze the convolution of lifted signals and compute the corresponding integration over $G_2$. In sum, our research could eventually extend the scope of geometrical transformations that practical deep-learning pipelines can handle.
Submitted 13 November, 2023;
originally announced November 2023.
-
Set-Membership Inference Attacks using Data Watermarking
Authors:
Mike Laszkiewicz,
Denis Lukovnikov,
Johannes Lederer,
Asja Fischer
Abstract:
In this work, we propose a set-membership inference attack for generative models using deep image watermarking techniques. In particular, we demonstrate how conditional sampling from a generative model can reveal the watermark that was injected into parts of the training data. Our empirical results demonstrate that the proposed watermarking technique is a principled approach for detecting the non-consensual use of image data in training generative models.
Submitted 22 June, 2023;
originally announced July 2023.
-
Single-Model Attribution of Generative Models Through Final-Layer Inversion
Authors:
Mike Laszkiewicz,
Jonas Ricker,
Johannes Lederer,
Asja Fischer
Abstract:
Recent breakthroughs in generative modeling have sparked interest in practical single-model attribution. Such methods predict whether a sample was generated by a specific generator or not, for instance, to prove intellectual property theft. However, previous works are either limited to the closed-world setting or require undesirable changes to the generative model. We address these shortcomings by, first, viewing single-model attribution through the lens of anomaly detection. Arising from this change of perspective, we propose FLIPAD, a new approach for single-model attribution in the open-world setting based on final-layer inversion and anomaly detection. We show that the utilized final-layer inversion can be reduced to a convex lasso optimization problem, making our approach theoretically sound and computationally efficient. The theoretical findings are accompanied by an experimental study demonstrating the effectiveness of our approach and its flexibility to various domains.
Submitted 26 June, 2024; v1 submitted 26 May, 2023;
originally announced June 2023.
-
Extremes in High Dimensions: Methods and Scalable Algorithms
Authors:
Johannes Lederer,
Marco Oesting
Abstract:
Extreme-value theory has been explored in considerable detail for univariate and low-dimensional observations, but the field is still in an early stage regarding high-dimensional multivariate observations. In this paper, we focus on Hüsler-Reiss models and their domain of attraction, a popular class of models for multivariate extremes that exhibit some similarities to multivariate Gaussian distributions. We devise novel estimators for the parameters of this model based on score matching and equip these estimators with state-of-the-art theories for high-dimensional settings and with exceptionally scalable algorithms. We perform a simulation study to demonstrate that the estimators can estimate a large number of parameters reliably and fast; for example, we show that Hüsler-Reiss models with thousands of parameters can be fitted within a couple of minutes on a standard laptop.
Submitted 7 March, 2023;
originally announced March 2023.
-
Lag selection and estimation of stable parameters for multiple autoregressive processes through convex programming
Authors:
Somnath Chakraborty,
Johannes Lederer,
Rainer von Sachs
Abstract:
Motivated by a variety of applications, high-dimensional time series have become an active topic of research. In particular, several methods and finite-sample theories for individual stable autoregressive processes with known lag have become available very recently. We, instead, consider multiple stable autoregressive processes that share an unknown lag. We use information across the different processes to simultaneously select the lag and estimate the parameters. We prove that the estimated process is stable, and we establish rates for the forecasting error that can outmatch the known rate in our setting. Our insights on the lag selection and the stability are also of interest for the case of individual autoregressive processes.
Submitted 3 March, 2023;
originally announced March 2023.
-
The DeepCAR Method: Forecasting Time-Series Data That Have Change Points
Authors:
Ayla Jungbluth,
Johannes Lederer
Abstract:
Many methods for time-series forecasting are known in classical statistics, such as autoregression, moving averages, and exponential smoothing. The DeepAR framework is a recent approach for time-series forecasting based on deep learning. DeepAR has already shown very promising results. However, time series often have change points, which can degrade DeepAR's prediction performance substantially. This paper extends the DeepAR framework by detecting and including those change points. We show that our method performs as well as standard DeepAR when there are no change points and considerably better when there are change points. More generally, we show that the batch size provides an effective and surprisingly simple way to deal with change points in DeepAR, Transformers, and other modern forecasting models.
Submitted 22 February, 2023;
originally announced February 2023.
-
Reducing Computational and Statistical Complexity in Machine Learning Through Cardinality Sparsity
Authors:
Ali Mohades,
Johannes Lederer
Abstract:
High-dimensional data has become ubiquitous across the sciences but causes computational and statistical challenges. A common approach for dealing with these challenges is sparsity. In this paper, we introduce a new concept of sparsity, called cardinality sparsity. Broadly speaking, we call a tensor sparse if it contains only a small number of unique values. We show that cardinality sparsity can improve deep learning and tensor regression both statistically and computationally. On the way, we generalize recent statistical theories in those fields.
Submitted 16 February, 2023;
originally announced February 2023.
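The informal description in the abstract above (a tensor is sparse if it contains only a small number of unique values) can be illustrated with a short sketch. The helper name `cardinality` is hypothetical, and the paper's formal definition may differ from this simple count.

```python
import numpy as np


def cardinality(tensor: np.ndarray) -> int:
    """Number of distinct values stored in the tensor."""
    return int(np.unique(tensor).size)


# A tensor whose 12 entries are all different.
dense = np.arange(12, dtype=float).reshape(3, 4)

# A tensor with no zero entries at all, but only two distinct values.
two_valued = np.array([[0.5, -1.0, 0.5],
                       [-1.0, 0.5, 0.5]])

print(cardinality(dense), cardinality(two_valued))
```

The second tensor is not sparse in the usual sense of having many zeros, yet it would count as sparse under the informal notion above, which is what distinguishes this concept from standard sparsity.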
-
SchNetPack 2.0: A neural network toolbox for atomistic machine learning
Authors:
Kristof T. Schütt,
Stefaan S. P. Hessmann,
Niklas W. A. Gebauer,
Jonas Lederer,
Michael Gastegger
Abstract:
SchNetPack is a versatile neural network toolbox that addresses the requirements of both method development and application of atomistic machine learning. Version 2.0 comes with an improved data pipeline, modules for equivariant neural networks, and a PyTorch implementation of molecular dynamics. An optional integration with PyTorch Lightning and the Hydra configuration framework powers a flexible command-line interface. This makes SchNetPack 2.0 easily extendable with custom code and ready for complex training tasks such as the generation of 3D molecular structures.
Submitted 11 December, 2022;
originally announced December 2022.
-
Statistical guarantees for sparse deep learning
Authors:
Johannes Lederer
Abstract:
Neural networks are becoming increasingly popular in applications, but our mathematical understanding of their potential and limitations is still limited. In this paper, we further this understanding by developing statistical guarantees for sparse deep learning. In contrast to previous work, we consider different types of sparsity, such as few active connections, few active nodes, and other norm-based types of sparsity. Moreover, our theories cover important aspects that previous theories have neglected, such as multiple outputs, regularization, and l2-loss. The guarantees have a mild dependence on network widths and depths, which means that they support the application of sparse but wide and deep networks from a statistical perspective. Some of the concepts and tools that we use in our derivations are uncommon in deep learning and, hence, might be of additional interest.
Submitted 11 December, 2022;
originally announced December 2022.
-
Marginal Tail-Adaptive Normalizing Flows
Authors:
Mike Laszkiewicz,
Johannes Lederer,
Asja Fischer
Abstract:
Learning the tail behavior of a distribution is a notoriously difficult problem. By definition, the number of samples from the tail is small, and deep generative models, such as normalizing flows, tend to concentrate on learning the body of the distribution. In this paper, we focus on improving the ability of normalizing flows to correctly capture the tail behavior and, thus, form more accurate models. We prove that the marginal tailedness of an autoregressive flow can be controlled via the tailedness of the marginals of its base distribution. This theoretical insight leads us to a novel type of flows based on flexible base distributions and data-driven linear layers. An empirical analysis shows that the proposed method improves on the accuracy -- especially on the tails of the distribution -- and is able to generate heavy-tailed data. We demonstrate its application on a weather and climate example, in which capturing the tail behavior is essential.
Submitted 27 June, 2022; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Statistical Guarantees for Approximate Stationary Points of Simple Neural Networks
Authors:
Mahsa Taheri,
Fang Xie,
Johannes Lederer
Abstract:
Since statistical guarantees for neural networks are usually restricted to global optima of intricate objective functions, it is not clear whether these theories really explain the performances of actual outputs of neural-network pipelines. The goal of this paper is, therefore, to bring statistical theory closer to practice. We develop statistical guarantees for simple neural networks that coincide up to logarithmic factors with the global optima but apply to stationary points and the points nearby. These results support the common notion that neural networks do not necessarily need to be optimized globally from a mathematical perspective. More generally, despite being limited to simple neural networks for now, our theories make a step forward in describing the practical properties of neural networks in mathematical terms.
Submitted 9 May, 2022;
originally announced May 2022.
-
Automatic Identification of Chemical Moieties
Authors:
Jonas Lederer,
Michael Gastegger,
Kristof T. Schütt,
Michael Kampffmeyer,
Klaus-Robert Müller,
Oliver T. Unke
Abstract:
In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.
Submitted 27 April, 2023; v1 submitted 30 March, 2022;
originally announced March 2022.
-
VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering
Authors:
Rebecca Marion,
Johannes Lederer,
Bernadette Govaerts,
Rainer von Sachs
Abstract:
Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g., there are highly correlated groups of variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. But none of these methods achieves satisfactory performance for prediction, variable selection, and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection, and clustering performance when cluster structure is present.
Submitted 2 February, 2022;
originally announced February 2022.
-
Depth Normalization of Small RNA Sequencing: Using Data and Biology to Select a Suitable Method
Authors:
Yannick Düren,
Johannes Lederer,
Li-Xuan Qin
Abstract:
Deep sequencing has become one of the most popular tools for transcriptome profiling in biomedical studies. While an abundance of computational methods exists for "normalizing" sequencing data to remove unwanted between-sample variations due to experimental handling, there is no consensus on which normalization is the most suitable for a given data set. To address this problem, we developed "DANA", an approach for assessing the performance of normalization methods for microRNA sequencing data based on biology-motivated and data-driven metrics. Our approach takes advantage of well-known biological features of microRNAs for their expression pattern and chromosomal clustering to simultaneously assess (1) how effectively normalization removes handling artifacts, and (2) how aptly normalization preserves biological signals. With DANA, we confirm that the performance of eight commonly used normalization methods varies widely across different data sets and provide guidance for selecting a suitable method for the data at hand. Hence, DANA should be adopted as a routine preprocessing step (preceding normalization) for microRNA sequencing data analysis. DANA is implemented in R and publicly available at https://github.com/LXQin/DANA.
Submitted 13 January, 2022;
originally announced January 2022.
-
Toward Explainable AI for Regression Models
Authors:
Simon Letzgus,
Patrick Wagner,
Jonas Lederer,
Wojciech Samek,
Klaus-Robert Müller,
Gregoire Montavon
Abstract:
In addition to the impressive predictive power of machine learning (ML) models, more recently, explanation methods have emerged that enable an interpretation of complex non-linear learning models such as deep neural networks. Gaining a better understanding is especially important, e.g., for safety-critical ML applications or medical diagnostics. While such Explainable AI (XAI) techniques have reached significant popularity for classifiers, so far little attention has been devoted to XAI for regression models (XAIR). In this review, we clarify the fundamental conceptual differences of XAI for regression and classification tasks, establish novel theoretical insights and analysis for XAIR, provide demonstrations of XAIR on genuine practical regression problems, and finally discuss the challenges remaining for the field.
Submitted 17 January, 2023; v1 submitted 21 December, 2021;
originally announced December 2021.
-
Copula-Based Normalizing Flows
Authors:
Mike Laszkiewicz,
Johannes Lederer,
Asja Fischer
Abstract:
Normalizing flows, which learn a distribution by transforming the data to samples from a Gaussian base distribution, have proven powerful density approximations. But their expressive power is limited by this choice of the base distribution. We, therefore, propose to generalize the base distribution to a more elaborate copula distribution to capture the properties of the target distribution more accurately. In a first empirical analysis, we demonstrate that this replacement can dramatically improve the vanilla normalizing flows in terms of flexibility, stability, and effectivity for heavy-tailed data. Our results suggest that the improvements are related to an increased local Lipschitz-stability of the learned flow.
Submitted 15 July, 2021;
originally announced July 2021.
-
Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks
Authors:
Leni Ven,
Johannes Lederer
Abstract:
Deep learning requires several design choices, such as the nodes' activation functions and the widths, types, and arrangements of the layers. One consideration when making these choices is the vanishing-gradient problem, which is the phenomenon of algorithms getting stuck at suboptimal points due to small gradients. In this paper, we revisit the vanishing-gradient problem in the context of sigmoid-type activation. We use mathematical arguments to highlight two different sources of the phenomenon, namely large individual parameters and effects across layers, and to illustrate two simple remedies, namely regularization and rescaling. We then demonstrate the effectiveness of the two remedies in practice. In view of the vanishing-gradient problem being a main reason why tanh and other sigmoid-type activations have become much less popular than ReLU-type activations, our results bring sigmoid-type activation back to the table.
Submitted 4 June, 2021;
originally announced June 2021.
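The two sources of vanishing gradients named in the abstract above (large individual parameters and effects across layers) can be illustrated numerically for tanh. This is a toy sketch under simplifying assumptions, not the paper's analysis: a single scalar input, a hypothetical helper `tanh_grad`, and a layer product that ignores the weight factors in the chain rule.

```python
import numpy as np


def tanh_grad(z):
    """Derivative of tanh at pre-activation z: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2


x = 1.0

# Source 1: a large weight pushes the pre-activation into the flat
# region of tanh, so the local derivative becomes tiny.
for w in (0.5, 2.0, 10.0):
    print("weight", w, "local derivative", tanh_grad(w * x))

# Source 2: even a moderate derivative shrinks quickly when the same
# factor is multiplied across several layers.
per_layer = tanh_grad(2.0)
for depth in (1, 5, 10):
    print("depth", depth, "product of derivatives", per_layer ** depth)
```

In this simplified picture, keeping weights small (for example through regularization) keeps pre-activations near zero, where the derivative of tanh is close to 1.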
-
Targeted Deep Learning: Framework, Methods, and Applications
Authors:
Shih-Ting Huang,
Johannes Lederer
Abstract:
Deep learning systems are typically designed to perform for a wide range of test inputs. For example, deep learning systems in autonomous cars are supposed to deal with traffic situations for which they were not specifically trained. In general, the ability to cope with a broad spectrum of unseen test inputs is called generalization. Generalization is definitely important in applications where the possible test inputs are known but plentiful or simply unknown, but there are also cases where the possible inputs are few and unlabeled but known beforehand. For example, medicine is currently interested in targeting treatments to individual patients; the number of patients at any given time is usually small (typically one), their diagnoses/responses/... are still unknown, but their general characteristics (such as genome information, protein levels in the blood, and so forth) are known before the treatment. We propose to call deep learning in such applications targeted deep learning. In this paper, we introduce a framework for targeted deep learning, and we devise and test an approach for adapting standard pipelines to the requirements of targeted deep learning. The approach is very general yet easy to use: it can be implemented as a simple data-preprocessing step. We demonstrate on a variety of real-world data that our approach can indeed render standard deep learning faster and more accurate when the test inputs are known beforehand.
Submitted 28 May, 2021;
originally announced May 2021.
-
DeepMoM: Robust Deep Learning With Median-of-Means
Authors:
Shih-Ting Huang,
Johannes Lederer
Abstract:
Data used in deep learning is notoriously problematic. For example, data are usually combined from diverse sources, rarely cleaned and vetted thoroughly, and sometimes corrupted on purpose. Intentional corruption that targets the weak spots of algorithms has been studied extensively under the label of "adversarial attacks." In contrast, the arguably much more common case of corruption that reflects the limited quality of data has been studied much less. Such "random" corruptions are due to measurement errors, unreliable sources, convenience sampling, and so forth. These kinds of corruption are common in deep learning, because data are rarely collected according to strict protocols -- in strong contrast to the formalized data collection in some parts of classical statistics. This paper concerns such corruption. We introduce an approach motivated by very recent insights into median-of-means and Le Cam's principle, we show that the approach can be readily implemented, and we demonstrate that it performs very well in practice. In conclusion, we believe that our approach is a very promising alternative to standard parameter training based on least-squares and cross-entropy loss.
Submitted 8 November, 2021; v1 submitted 28 May, 2021;
originally announced May 2021.
-
Activation Functions in Artificial Neural Networks: A Systematic Overview
Authors:
Johannes Lederer
Abstract:
Activation functions shape the outputs of artificial neurons and, therefore, are integral parts of neural networks in general and deep learning in particular. Some activation functions, such as logistic and relu, have been used for many decades. But with deep learning becoming a mainstream research topic, new activation functions have mushroomed, leading to confusion in both theory and practice. This paper provides an analytic yet up-to-date overview of popular activation functions and their properties, which makes it a timely resource for anyone who studies or applies neural networks.
Submitted 25 January, 2021;
originally announced January 2021.
-
Optimization Landscapes of Wide Deep Neural Networks Are Benign
Authors:
Johannes Lederer
Abstract:
We analyze the optimization landscapes of deep learning with wide networks. We highlight the importance of constraints for such networks and show that constrained -- as well as unconstrained -- empirical-risk minimization over such networks has no confined points, that is, suboptimal parameters that are difficult to escape from. Hence, our theories substantiate the common belief that wide neural networks are not only highly expressive but also comparably easy to optimize.
Submitted 13 January, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Is there a role for statistics in artificial intelligence?
Authors:
Sarah Friedrich,
Gerd Antes,
Sigrid Behr,
Harald Binder,
Werner Brannath,
Florian Dumpert,
Katja Ickstadt,
Hans Kestler,
Johannes Lederer,
Heinz Leitgöb,
Markus Pauly,
Ansgar Steland,
Adalbert Wilhelm,
Tim Friede
Abstract:
The research on and application of artificial intelligence (AI) has triggered a comprehensive scientific, economic, social and political discussion. Here we argue that statistics, as an interdisciplinary scientific field, plays a substantial role both for the theoretical and practical understanding of AI and for its future development. Statistics might even be considered a core element of AI. With its specialist knowledge of data evaluation, starting with the precise formulation of the research question and passing through a study design stage on to analysis and interpretation of the results, statistics is a natural partner for other disciplines in teaching, research and practice. This paper aims at contributing to the current discussion by highlighting the relevance of statistical methodology in the context of AI development. In particular, we discuss contributions of statistics to the field of artificial intelligence concerning methodological development, planning and design of studies, assessment of data quality and data collection, differentiation of causality and associations and assessment of uncertainty in results. Moreover, the paper also deals with the equally necessary and meaningful extension of curricula in schools and universities.
Submitted 13 September, 2020;
originally announced September 2020.
-
Risk Bounds for Robust Deep Learning
Authors:
Johannes Lederer
Abstract:
It has been observed that certain loss functions can render deep-learning pipelines robust against flaws in the data. In this paper, we support these empirical findings with statistical theory. We especially show that empirical-risk minimization with unbounded, Lipschitz-continuous loss functions, such as the least-absolute deviation loss, Huber loss, Cauchy loss, and Tukey's biweight loss, can provide efficient prediction under minimal assumptions on the data. More generally speaking, our paper provides theoretical evidence for the benefits of robust loss functions in deep learning.
Submitted 14 September, 2020;
originally announced September 2020.
-
Layer Sparsity in Neural Networks
Authors:
Mohamed Hebiri,
Johannes Lederer
Abstract:
Sparsity has become popular in machine learning, because it can save computational resources, facilitate interpretations, and prevent overfitting. In this paper, we discuss sparsity in the framework of neural networks. In particular, we formulate a new notion of sparsity that concerns the networks' layers and, therefore, aligns particularly well with the current trend toward deep networks. We call this notion layer sparsity. We then introduce corresponding regularization and refitting schemes that can complement standard deep-learning pipelines to generate more compact and accurate networks.
Submitted 28 June, 2020;
originally announced June 2020.
-
A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics
Authors:
Sophie-Charlotte Klose,
Johannes Lederer
Abstract:
We introduce tools for controlled variable selection to economists. In particular, we apply a recently introduced aggregation scheme for false discovery rate (FDR) control to German administrative data to determine the parts of the individual employment histories that are relevant for the career outcomes of women. Our results suggest that career outcomes can be predicted based on a small set of variables, such as daily earnings, wage increases in combination with a high level of education, employment status, and working experience.
Submitted 23 June, 2020; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Higher-Order Explanations of Graph Neural Networks via Relevant Walks
Authors:
Thomas Schnake,
Oliver Eberle,
Jonas Lederer,
Shinichi Nakajima,
Kristof T. Schütt,
Klaus-Robert Müller,
Grégoire Montavon
Abstract:
Graph Neural Networks (GNNs) are a popular approach for predicting graph structured data. As GNNs tightly entangle the input graph into the neural network structure, common explainable AI approaches are not applicable. To a large extent, GNNs have remained black-boxes for the user so far. In this paper, we show that GNNs can in fact be naturally explained using higher-order expansions, i.e. by identifying groups of edges that jointly contribute to the prediction. Practically, we find that such explanations can be extracted using a nested attribution scheme, where existing techniques such as layer-wise relevance propagation (LRP) can be applied at each step. The output is a collection of walks into the input graph that are relevant for the prediction. Our novel explanation method, which we denote by GNN-LRP, is applicable to a broad range of graph neural networks and lets us extract practically relevant insights on sentiment analysis of text data, structure-property relationships in quantum chemistry, and image classification.
Submitted 26 November, 2020; v1 submitted 5 June, 2020;
originally announced June 2020.
-
Statistical Guarantees for Regularized Neural Networks
Authors:
Mahsa Taheri,
Fang Xie,
Johannes Lederer
Abstract:
Neural networks have become standard tools in the analysis of data, but they lack comprehensive mathematical theories. For example, there are very few statistical guarantees for learning neural networks from data, especially for classes of estimators that are used in practice or at least similar to such. In this paper, we develop a general statistical guarantee for estimators that consist of a least-squares term and a regularizer. We then exemplify this guarantee with $\ell_1$-regularization, showing that the corresponding prediction error increases at most sub-linearly in the number of layers and at most logarithmically in the total number of parameters. Our results establish a mathematical basis for regularized estimation of neural networks, and they deepen our mathematical understanding of neural networks and deep learning more generally.
Submitted 11 November, 2020; v1 submitted 30 May, 2020;
originally announced June 2020.
-
Thresholded Adaptive Validation: Tuning the Graphical Lasso for Graph Recovery
Authors:
Mike Laszkiewicz,
Asja Fischer,
Johannes Lederer
Abstract:
Many Machine Learning algorithms are formulated as regularized optimization problems, but their performance hinges on a regularization parameter that needs to be calibrated to each application at hand. In this paper, we propose a general calibration scheme for regularized optimization problems and apply it to the graphical lasso, which is a method for Gaussian graphical modeling. The scheme is equipped with theoretical guarantees and motivates a thresholding pipeline that can improve graph recovery. Moreover, requiring at most one line search over the regularization path, the calibration scheme is computationally more efficient than competing schemes that are based on resampling. Finally, we show in simulations that our approach can improve on the graph recovery of other approaches considerably.
Submitted 30 March, 2021; v1 submitted 1 May, 2020;
originally announced May 2020.
-
Estimating the Lasso's Effective Noise
Authors:
Johannes Lederer,
Michael Vogt
Abstract:
Much of the theory for the lasso in the linear model $Y = X \beta^* + \varepsilon$ hinges on the quantity $2 \| X^\top \varepsilon \|_{\infty} / n$, which we call the lasso's effective noise. Among other things, the effective noise plays an important role in finite-sample bounds for the lasso, the calibration of the lasso's tuning parameter, and inference on the parameter vector $\beta^*$. In this paper, we develop a bootstrap-based estimator of the quantiles of the effective noise. The estimator is fully data-driven, that is, does not require any additional tuning parameters. We equip our estimator with finite-sample guarantees and apply it to tuning parameter calibration for the lasso and to high-dimensional inference on the parameter vector $\beta^*$.
Submitted 21 January, 2022; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Tuning-free ridge estimators for high-dimensional generalized linear models
Authors:
Shih-Ting Huang,
Fang Xie,
Johannes Lederer
Abstract:
Ridge estimators regularize the squared Euclidean lengths of parameters. Such estimators are mathematically and computationally attractive but involve tuning parameters that can be difficult to calibrate. In this paper, we show that ridge estimators can be modified such that tuning parameters can be avoided altogether. We also show that these modified versions can improve on the empirical prediction accuracies of standard ridge estimators combined with cross-validation, and we provide first theoretical guarantees.
Submitted 27 February, 2020;
originally announced February 2020.
-
Tuning parameter calibration for prediction in personalized medicine
Authors:
Shih-Ting Huang,
Yannick Düren,
Kristoffer H. Hellton,
Johannes Lederer
Abstract:
Personalized medicine has become an important part of medicine, for instance predicting individual drug responses based on genomic information. However, many current statistical methods are not tailored to this task, because they overlook the individual heterogeneity of patients. In this paper, we look at personalized medicine from a linear regression standpoint. We introduce an alternative version of the ridge estimator and target individuals by establishing a tuning parameter calibration scheme that minimizes prediction errors of individual patients. In stark contrast, classical schemes such as cross-validation minimize prediction errors only on average. We show that our pipeline is optimal in terms of oracle inequalities, fast, and highly effective both in simulations and on real data.
Submitted 2 October, 2019; v1 submitted 23 September, 2019;
originally announced September 2019.
-
False Discovery Rates in Biological Networks
Authors:
Lu Yu,
Tobias Kaufmann,
Johannes Lederer
Abstract:
The increasing availability of data has generated unprecedented prospects for network analyses in many biological fields, such as neuroscience (e.g., brain networks), genomics (e.g., gene-gene interaction networks), and ecology (e.g., species interaction networks). A powerful statistical framework for estimating such networks is Gaussian graphical models, but standard estimators for the corresponding graphs are prone to large numbers of false discoveries. In this paper, we introduce a novel graph estimator based on knockoffs that imitate the partial correlation structures of unconnected nodes. We show that this new estimator guarantees accurate control of the false discovery rate in theory, simulations, and biological applications, and we provide easy-to-use R code.
Submitted 4 February, 2021; v1 submitted 8 July, 2019;
originally announced July 2019.
-
Aggregating Knockoffs for False Discovery Rate Control with an Application to Gut Microbiome Data
Authors:
Fang Xie,
Johannes Lederer
Abstract:
Recent discoveries suggest that our gut microbiome plays an important role in our health and wellbeing. However, the gut microbiome data are intricate; for example, the microbial diversity in the gut makes the data high-dimensional. While there are dedicated high-dimensional methods, such as the lasso estimator, they always come with the risk of false discoveries. Knockoffs are a recent approach to control the number of false discoveries. In this paper, we show that knockoffs can be aggregated to increase power while retaining sharp control over the false discoveries. We support our method both in theory and simulations, and we show that it can lead to new discoveries on microbiome data from the American Gut Project. In particular, our results indicate that several phyla that have been overlooked so far are associated with obesity.
Submitted 1 March, 2021; v1 submitted 8 July, 2019;
originally announced July 2019.
-
Efficiency in Lung Transplant Allocation Strategies
Authors:
Jingjing Zou,
David J. Lederer,
Daniel Rabinowitz
Abstract:
Currently in the United States, lung transplantations are allocated to candidates according to the candidates' Lung Allocation Score (LAS). The LAS is an ad-hoc ranking system for patients' priorities of transplantation. The goal of this study is to develop a framework for improving patients' life expectancy over the LAS based on a comprehensive modeling of the lung transplantation waiting list. Patients and organs are modeled as arriving according to Poisson processes, patients' health status evolving as a waiting time inhomogeneous Markov process until death or transplantation, with organ recipient's expected post-transplant residual life depending on waiting time and health status at transplantation. Under allocation rules satisfying minimal fairness requirements, the long-term average expected life converges, and its limit is a natural standard for comparing allocation strategies. Via the Hamilton-Jacobi-Bellman equations, upper bounds for the limiting average expected life are derived as a function of organ availability. Corresponding to each upper bound is an allocable set of (time, state) pairs at which patients would be optimally transplanted. The allocable set expands monotonically as organ availability increases, which motivates the development of an allocation strategy that leads to long-term expected life close to the upper bound. Simulation studies are conducted with model parameters estimated from national lung transplantation data. Results suggest that compared to the LAS, the proposed allocation strategy could provide a 7.7% increase in average total life. We further extended the results to the allocation and matching of multiple organ types.
Submitted 16 April, 2020; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Prediction Error Bounds for Linear Regression With the TREX
Authors:
Jacob Bien,
Irina Gaynanova,
Johannes Lederer,
Christian Müller
Abstract:
The TREX is a recently introduced approach to sparse linear regression. In contrast to most well-known approaches to penalized regression, the TREX can be formulated without the use of tuning parameters. In this paper, we establish the first known prediction error bounds for the TREX. Additionally, we introduce extensions of the TREX to a more general class of penalties, and we provide a bound on the prediction error in this generalized setting. These results deepen the understanding of TREX from a theoretical perspective and provide new insights into penalized regression in general.
Submitted 4 January, 2018;
originally announced January 2018.
-
Maximum Regularized Likelihood Estimators: A General Prediction Theory and Applications
Authors:
Rui Zhuang,
Johannes Lederer
Abstract:
Maximum regularized likelihood estimators (MRLEs) are arguably the most established class of estimators in high-dimensional statistics. In this paper, we derive guarantees for MRLEs in Kullback-Leibler divergence, a general measure of prediction accuracy. We assume only that the densities have a convex parametrization and that the regularization is definite and positive homogenous. The results thus apply to a very large variety of models and estimators, such as tensor regression and graphical models with convex and non-convex regularized methods. A main conclusion is that MRLEs are broadly consistent in prediction - regardless of whether restricted eigenvalues or similar conditions hold.
Submitted 17 October, 2018; v1 submitted 9 October, 2017;
originally announced October 2017.
-
Inference for high-dimensional instrumental variables regression
Authors:
David Gold,
Johannes Lederer,
Jing Tao
Abstract:
This paper concerns statistical inference for the components of a high-dimensional regression parameter despite possible endogeneity of each regressor. Given a first-stage linear model for the endogenous regressors and a second-stage linear model for the dependent variable, we develop a novel adaptation of the parametric one-step update to a generic second-stage estimator. We provide conditions under which the scaled update is asymptotically normal. We then introduce a two-stage Lasso procedure and show that the second-stage Lasso estimator satisfies the aforementioned conditions. Using these results, we construct asymptotically valid confidence intervals for the components of the second-stage regression coefficients. We complement our asymptotic theory with simulation studies, which demonstrate the performance of our method in finite samples.
Submitted 21 November, 2019; v1 submitted 17 August, 2017;
originally announced August 2017.
-
Integrating Additional Knowledge Into Estimation of Graphical Models
Authors:
Yunqi Bu,
Johannes Lederer
Abstract:
In applications of graphical models, we typically have more information than just the samples themselves. A prime example is the estimation of brain connectivity networks based on fMRI data, where in addition to the samples themselves, the spatial positions of the measurements are readily available. With particular regard for this application, we are thus interested in ways to incorporate additional knowledge most effectively into graph estimation. Our approach to this is to make neighborhood selection receptive to additional knowledge by strengthening the role of the tuning parameters. We demonstrate that this concept (i) can improve reproducibility, (ii) is computationally convenient and efficient, and (iii) carries a lucid Bayesian interpretation. We specifically show that the approach provides effective estimations of brain connectivity graphs from fMRI data. However, providing a general scheme for the inclusion of additional knowledge, our concept is expected to have applications in a wide range of domains.
Submitted 20 April, 2017; v1 submitted 10 April, 2017;
originally announced April 2017.
-
Tuning parameter calibration for $\ell_1$-regularized logistic regression
Authors:
Wei Li,
Johannes Lederer
Abstract:
Feature selection is a standard approach to understanding and modeling high-dimensional classification data, but the corresponding statistical methods hinge on tuning parameters that are difficult to calibrate. In particular, existing calibration schemes in the logistic regression framework lack any finite sample guarantees. In this paper, we introduce a novel calibration scheme for $\ell_1$-penalized logistic regression. It is based on simple tests along the tuning parameter path and is equipped with optimal guarantees for feature selection. It is also amenable to easy and efficient implementations, and it rivals or outmatches existing methods in simulations and real data applications.
Submitted 28 February, 2019; v1 submitted 1 October, 2016;
originally announced October 2016.
-
Balancing Statistical and Computational Precision: A General Theory and Applications to Sparse Regression
Authors:
Mahsa Taheri,
Néhémy Lim,
Johannes Lederer
Abstract:
Modern technologies are generating ever-increasing amounts of data. Making use of these data requires methods that are both statistically sound and computationally efficient. Typically, the statistical and computational aspects are treated separately. In this paper, we propose an approach to entangle these two aspects in the context of regularized estimation. Applying our approach to sparse and group-sparse regression, we show that it can improve on standard pipelines both statistically and computationally.
Submitted 14 September, 2022; v1 submitted 22 September, 2016;
originally announced September 2016.
-
Graphical Models for Discrete and Continuous Data
Authors:
Rui Zhuang,
Noah Simon,
Johannes Lederer
Abstract:
We introduce a general framework for undirected graphical models. It generalizes Gaussian graphical models to a wide range of continuous, discrete, and combinations of different types of data. The models in the framework, called exponential trace models, are amenable to estimation based on maximum likelihood. We introduce a sampling-based approximation algorithm for computing the maximum likelihood estimator, and we apply this pipeline to learn simultaneous neural activities from spike data.
Submitted 15 June, 2019; v1 submitted 18 September, 2016;
originally announced September 2016.
-
Oracle Inequalities for High-dimensional Prediction
Authors:
Johannes Lederer,
Lu Yu,
Irina Gaynanova
Abstract:
The abundance of high-dimensional data in the modern sciences has generated tremendous interest in penalized estimators such as the lasso, scaled lasso, square-root lasso, elastic net, and many others. In this paper, we establish a general oracle inequality for prediction in high-dimensional linear regression with such methods. Since the proof relies only on convexity and continuity arguments, the result holds irrespective of the design matrix and applies to a wide range of penalized estimators. Overall, the bound demonstrates that generic estimators can provide consistent prediction with any design matrix. From a practical point of view, the bound can help to identify the potential of specific estimators, and it can help to get a sense of the prediction accuracy in a given application.
Submitted 13 March, 2018; v1 submitted 1 August, 2016;
originally announced August 2016.
-
Non-convex Global Minimization and False Discovery Rate Control for the TREX
Authors:
Jacob Bien,
Irina Gaynanova,
Johannes Lederer,
Christian Müller
Abstract:
The TREX is a recently introduced method for performing sparse high-dimensional regression. Despite its statistical promise as an alternative to the lasso, square-root lasso, and scaled lasso, the TREX is computationally challenging in that it requires solving a non-convex optimization problem. This paper shows a remarkable result: despite the non-convexity of the TREX problem, there exists a polynomial-time algorithm that is guaranteed to find the global minimum. This result adds the TREX to a very short list of non-convex optimization problems that can be globally optimized (principal components analysis being a famous example). After deriving and developing this new approach, we demonstrate that (i) the ability of the preexisting TREX heuristic to reach the global minimum is strongly dependent on the difficulty of the underlying statistical problem, (ii) the new polynomial-time algorithm for TREX permits a novel variable ranking and selection scheme, (iii) this scheme can be incorporated into a rule that controls the false discovery rate (FDR) of included features in the model. To achieve this last aim, we provide an extension of the results of Barber & Candes (2015) to establish that the knockoff filter framework can be applied to the TREX. This investigation thus provides both a rare case study of a heuristic for non-convex optimization and a novel way of exploiting non-convexity for statistical inference.
Submitted 20 September, 2016; v1 submitted 22 April, 2016;
originally announced April 2016.
-
Topology Adaptive Graph Estimation in High Dimensions
Authors:
Johannes Lederer,
Christian Müller
Abstract:
We introduce Graphical TREX (GTREX), a novel method for graph estimation in high-dimensional Gaussian graphical models. By conducting neighborhood selection with TREX, GTREX avoids tuning parameters and is adaptive to the graph topology. We compare GTREX with standard methods on a new simulation set-up that is designed to assess accurately the strengths and shortcomings of different methods. These simulations show that a neighborhood selection scheme based on Lasso and an optimal (in practice unknown) tuning parameter outperforms other standard methods over a large spectrum of scenarios. Moreover, we show that GTREX can rival this scheme and, therefore, can provide competitive graph estimation without the need for tuning parameter calibration.
Submitted 27 October, 2014;
originally announced October 2014.
-
Optimal Two-Step Prediction in Regression
Authors:
Didier Chételat,
Johannes Lederer,
Joseph Salmon
Abstract:
High-dimensional prediction typically comprises two steps: variable selection and subsequent least-squares refitting on the selected variables. However, the standard variable selection procedures, such as the lasso, hinge on tuning parameters that need to be calibrated. Cross-validation, the most popular calibration scheme, is computationally costly and lacks finite sample guarantees. In this paper, we introduce an alternative scheme, easy to implement and both computationally and theoretically efficient.
Submitted 5 June, 2017; v1 submitted 18 October, 2014;
originally announced October 2014.
-
A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees
Authors:
Michaël Chichignoud,
Johannes Lederer,
Martin Wainwright
Abstract:
We introduce a novel scheme for choosing the regularization parameter in high-dimensional linear regression with Lasso. This scheme, inspired by Lepski's method for bandwidth selection in non-parametric regression, is equipped with both optimal finite-sample guarantees and a fast algorithm. In particular, for any design matrix such that the Lasso has low sup-norm error under an "oracle choice" of the regularization parameter, we show that our method matches the oracle performance up to a small constant factor, and show that it can be implemented by performing simple tests along a single Lasso path. By applying the Lasso to simulated and real data, we find that our novel scheme can be faster and more accurate than standard schemes such as Cross-Validation.
Submitted 8 November, 2016; v1 submitted 1 October, 2014;
originally announced October 2014.