Search | arXiv e-print repository

Self-Distillation for Gaussian Process Regression and Classification

Authors: Kenneth Borup, Lars Nørvang Andersen

Abstract: We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC): data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques for machine learning and refits a model on deterministic predictions from the teacher, while the distribution-centric approach re-uses the full probabilistic posterior for the next iteration. By analyzing the properties of these approaches, we show that the data-centric approach for GPR closely relates to known results for self-distillation of kernel ridge regression and that the distribution-centric approach for GPR corresponds to ordinary GPR with a very particular choice of hyperparameters. Furthermore, we demonstrate that the distribution-centric approach for GPC approximately corresponds to data duplication and a particular scaling of the covariance, and that the data-centric approach for GPC requires redefining the model from a Binomial likelihood to a continuous Bernoulli likelihood to be well-specified. To the best of our knowledge, our proposed approaches are the first to formulate knowledge distillation specifically for Gaussian Process models.

Submitted 5 April, 2023; originally announced April 2023.

Comments: 10 pages; code at https://github.com/Kennethborup/gaussian_process_self_distillation
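The last point about GPC can be checked numerically. A teacher's predicted class probabilities are soft labels in [0, 1], and the Bernoulli likelihood lam^x (1 - lam)^(1 - x) is not a normalized density over such values; the continuous Bernoulli multiplies it by a constant C(lam) so that it integrates to one. The Python sketch below uses the standard continuous Bernoulli normalizer; the function names are hypothetical and this is not the code in the linked repository.

```python
import math

def cb_log_norm(lam: float, eps: float = 1e-6) -> float:
    """log C(lam), with C(lam) = log(lam / (1 - lam)) / (2 * lam - 1) and C(0.5) = 2."""
    if abs(lam - 0.5) < eps:
        # The closed form is 0/0 at lam = 0.5; its limit there is 2.
        return math.log(2.0)
    return math.log(math.log(lam / (1.0 - lam)) / (2.0 * lam - 1.0))

def cb_log_pdf(x: float, lam: float) -> float:
    """Continuous Bernoulli log-density of a soft label x in [0, 1], for lam in (0, 1)."""
    return cb_log_norm(lam) + x * math.log(lam) + (1.0 - x) * math.log(1.0 - lam)

def bernoulli_log_lik(x: float, lam: float) -> float:
    """Bernoulli log-likelihood evaluated at a soft label x (no normalizer)."""
    return x * math.log(lam) + (1.0 - x) * math.log(1.0 - lam)

def integrate_01(f, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    return sum(f((k + 0.5) / n) for k in range(n)) / n

lam = 0.2
print(integrate_01(lambda x: math.exp(cb_log_pdf(x, lam))))         # about 1.0
print(integrate_01(lambda x: math.exp(bernoulli_log_lik(x, lam))))  # about 0.433, not 1
```

Only the normalizer differs between the two functions, so the continuous Bernoulli keeps the same dependence on lam while giving a proper likelihood for fractional targets.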

arXiv:2110.06255 [pdf, ps, other]

Not all noise is accounted equally: How differentially private learning benefits from large sampling rates

Authors: Friedrich Dörmann, Osvald Frisk, Lars Nørvang Andersen, Christian Fischer Pedersen

Abstract: Learning often involves sensitive data, and as such, privacy-preserving extensions to Stochastic Gradient Descent (SGD) and other machine learning algorithms have been developed using the definitions of Differential Privacy (DP). In differentially private SGD, the gradients computed at each training iteration are subject to two different types of noise: first, inherent sampling noise arising from the use of minibatches; second, additive Gaussian noise from the underlying mechanisms that introduce privacy. In this study, we show that these two types of noise are equivalent in their effect on the utility of private neural networks; however, they are not accounted for equally in the privacy budget. Given this observation, we propose a training paradigm that shifts the proportions of noise towards less inherent and more additive noise, such that more of the overall noise can be accounted for in the privacy budget. With this paradigm, we are able to improve on the state of the art in the privacy/utility tradeoff of private end-to-end CNNs.

Submitted 12 October, 2021; originally announced October 2021.

Comments: 2021 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
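The two noise sources named in this abstract can both be located in a single step of the standard DP-SGD recipe (per-example clipping followed by Gaussian noise). The NumPy sketch below is an illustration under assumptions, not the paper's proposed paradigm: it assumes Poisson subsampling with sampling rate `q` and normalization by the expected batch size, and the names `dp_sgd_gradient`, `clip_norm` and `noise_multiplier` are hypothetical.

```python
import numpy as np

def dp_sgd_gradient(per_example_grads, q, clip_norm, noise_multiplier, rng):
    """One noisy gradient estimate in the style of DP-SGD (sketch).

    per_example_grads: array of shape (n, d), one gradient per training example.
    q: sampling rate; each example joins the minibatch with probability q.
    """
    n, d = per_example_grads.shape
    # Noise source 1 (inherent): which examples land in the minibatch is random.
    mask = rng.random(n) < q
    batch = per_example_grads[mask]
    # Clip each per-example gradient to L2 norm at most clip_norm,
    # which bounds any single example's influence on the sum.
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    batch = batch * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Noise source 2 (additive): Gaussian noise calibrated to the clipping norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=d)
    # Average by dividing by the expected batch size q * n.
    return (batch.sum(axis=0) + noise) / (q * n)

rng = np.random.default_rng(0)
grads = rng.normal(size=(1000, 5))
for q in (0.01, 0.1):
    g = dp_sgd_gradient(grads, q=q, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
    print(q, np.round(g, 3))
```

In this sketch the additive noise enters the averaged gradient with standard deviation `noise_multiplier * clip_norm / (q * n)`, so a larger sampling rate shrinks it for a fixed noise multiplier; how the privacy budget responds to that choice is what the paper analyzes and is not computed here.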

arXiv:2104.07248 [pdf, other]

Quantitative processing of broadband data as implemented in a scientific splitbeam echosounder

Authors: Lars Nonboe Andersen, Dezhang Chu, Harald Heimvoll, Rolf Korneliussen, Gavin J. Macaulay, Egil Ona

Abstract: The use of quantitative broadband echosounders for biological studies and surveys offers considerable advantages over narrowband echosounders. These include improved spectral-based target identification and significantly increased ability to resolve individual targets. Biological studies and surveys typically require accurate measures of backscatter strength and we present here a systematic and comprehensive explanation of how to derive quantitative estimates of target strength and volume backscattering, as a function of frequency from broadband echosounder signals.

Submitted 15 April, 2021; originally announced April 2021.

Comments: 20 pages, 5 figures

arXiv:2102.13088 [pdf, other]

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Authors: Kenneth Borup, Lars N. Andersen

Abstract: Knowledge distillation is classically a procedure where a neural network is trained on the output of another network along with the original targets in order to transfer knowledge between the architectures. The special case of self-distillation, where the network architectures are identical, has been observed to improve generalization accuracy. In this paper, we consider an iterative variant of self-distillation in a kernel regression setting, in which successive steps incorporate both model outputs and the ground-truth targets. This allows us to provide the first theoretical results on the importance of using the weighted ground-truth targets in self-distillation. Our focus is on fitting nonlinear functions to training data with a weighted mean square error objective function suitable for distillation, subject to $\ell_2$ regularization of the model parameters. We show that any such function obtained with self-distillation can be calculated directly as a function of the initial fit, and that infinite distillation steps yield the same optimization problem as the original with amplified regularization. Furthermore, we provide a closed-form solution for the optimal choice of weighting parameter at each step, and show how to efficiently estimate this weighting parameter for deep learning, significantly reducing the computational requirements compared to a grid search.

Submitted 15 October, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

Comments: To be published at NeurIPS 2021; 21 pages, 14 figures
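The iterative scheme in this abstract can be simulated directly for in-sample kernel ridge regression. The NumPy sketch below assumes the objective `alpha * ||y - f||^2 + (1 - alpha) * ||f_prev - f||^2` plus a ridge penalty, an RBF kernel, and predictions at the training inputs only. Because `alpha + (1 - alpha) = 1`, that objective equals ordinary ridge regression on the blended target `alpha * y + (1 - alpha) * f_prev` up to a constant. All names are hypothetical; this illustrates the stated results and is not the paper's code.

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    # Squared-exponential kernel matrix between the rows of X and Z (assumed kernel).
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-0.5 * sq / lengthscale**2)

def krr_smoother(K, lam):
    # In-sample kernel ridge regression is linear in its targets: f = S @ t,
    # with S = K (K + lam I)^{-1}.
    n = K.shape[0]
    return K @ np.linalg.inv(K + lam * np.eye(n))

def self_distill(K, y, lam, alpha, steps):
    S = krr_smoother(K, lam)
    f = S @ y  # step 0: the initial fit to the ground truth
    for _ in range(steps):
        # Refit ridge regression to a blend of ground truth and the previous fit.
        f = S @ (alpha * y + (1.0 - alpha) * f)
    return f

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
K = rbf_kernel(X, X)
lam, alpha = 0.1, 0.5

# Many steps with alpha > 0 approach plain ridge regression with lam / alpha.
f_many = self_distill(K, y, lam, alpha, steps=200)
print(np.allclose(f_many, krr_smoother(K, lam / alpha) @ y))  # True

# With alpha = 0 (no ground truth), t steps equal S^(t + 1) applied to y.
S = krr_smoother(K, lam)
print(np.allclose(self_distill(K, y, lam, 0.0, 3), np.linalg.matrix_power(S, 4) @ y))  # True
```

In the eigenbasis of K with eigenvalues d_k, the smoother scales component k by s_k = d_k / (d_k + lam), and the fixed point of f = S(alpha y + (1 - alpha) f) has components alpha s_k / (1 - (1 - alpha) s_k) = d_k / (d_k + lam / alpha). Under these assumptions, this is ridge regression with the regularization amplified from lam to lam / alpha, which is what the first printed check confirms.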

arXiv:2101.04941 [pdf, other]

Multivariate phase-type theory for the site frequency spectrum

Authors: Asger Hobolth, Mogens Bladt, Lars Nørvang Andersen

Abstract: Linear functions of the site frequency spectrum (SFS) play a major role in understanding and investigating genetic diversity. Estimators of the mutation rate (e.g. based on the total number of segregating sites or the average of the pairwise differences) and tests for neutrality (e.g. Tajima's D) are perhaps the best-known examples. The distribution of linear functions of the SFS is important for constructing confidence intervals for the estimators and for determining significance thresholds for neutrality tests. These distributions are often approximated using simulation procedures. In this paper we use multivariate phase-type theory to specify, characterize and calculate the distribution of linear functions of the site frequency spectrum. In particular, we show that many of the classical estimators of the mutation rate are distributed according to a discrete phase-type distribution. Neutrality tests, however, are generally not discrete phase-type distributed. For neutrality tests we derive the probability generating function using continuous multivariate phase-type theory, and numerically invert the function to obtain the distribution. A main result is an analytically tractable formula for the probability generating function of the SFS. Software implementation of the phase-type methodology is available in the R package phasty, and R code for the reproduction of our results is available as an accompanying vignette.

Submitted 13 January, 2021; originally announced January 2021.

MSC Class: 60J90 (Primary) 60J27; 60J28; 60J95; 92D15 (Secondary)
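As a concrete instance of the linear functions of the SFS mentioned in this abstract, Watterson's estimator (based on the number of segregating sites) and the pairwise-difference estimator are both weighted sums of the SFS entries xi_1, ..., xi_(n-1) for a sample of n sequences. The Python sketch below uses the textbook weights and exact rational arithmetic; it computes only the numerator of Tajima's D (the difference of the two estimators, without its variance normalization) and does not implement the paper's phase-type machinery. The function names and the example SFS are hypothetical.

```python
from fractions import Fraction

def watterson_theta(sfs):
    """Watterson's estimator S / a_n, where sfs[i - 1] = xi_i for i = 1, ..., n - 1."""
    n = len(sfs) + 1
    a_n = sum(Fraction(1, i) for i in range(1, n))
    return Fraction(sum(sfs)) / a_n

def pairwise_theta(sfs):
    """Average number of pairwise differences: sum_i i (n - i) xi_i / C(n, 2)."""
    n = len(sfs) + 1
    weighted = sum(i * (n - i) * xi for i, xi in enumerate(sfs, start=1))
    return Fraction(weighted, n * (n - 1) // 2)

def tajima_numerator(sfs):
    """Numerator of Tajima's D: the difference of the two linear estimators."""
    return pairwise_theta(sfs) - watterson_theta(sfs)

sfs = [3, 1, 1]  # hypothetical SFS for n = 4 sequences
print(watterson_theta(sfs), pairwise_theta(sfs), tajima_numerator(sfs))
# 30/11 8/3 -2/33
```

Both estimators are fixed weight vectors applied to the SFS (weights 1/a_n and i(n - i)/C(n, 2) respectively), so their difference is again a linear function of the SFS.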

arXiv:1609.09725 [pdf, other]

Efficient simulation for dependent rare events with applications to extremes

Authors: Lars Nørvang Andersen, Patrick J. Laub, Leonardo Rojas-Nandayapa

Abstract: We consider the general problem of estimating probabilities which arise as a union of dependent events. We propose a flexible series of estimators for such probabilities, and describe variance reduction schemes applied to the proposed estimators. We derive efficiency results of the estimators in rare-event settings, in particular those associated with extremes. Finally, we examine the performance of our estimators in a numerical example.

Submitted 7 October, 2016; v1 submitted 30 September, 2016; originally announced September 2016.

Comments: Fixed a few minor typographical errors

MSC Class: 65C05; 65C60; 68U20
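For intuition about estimating the probability of a union of dependent events, one well-known conditional sampling estimator picks an event index I with probability P(A_I) / sum_j P(A_j), samples the outcome conditionally on A_I, and returns sum_j P(A_j) divided by the number of events that occur. It is unbiased, and each sample lies between sum_j P(A_j) / d and sum_j P(A_j). The Python sketch below applies it to the extreme-value probability P(max_i X_i > u) for equicorrelated standard normal X_i; it is used here only as an illustration and is not necessarily one of the estimators proposed in the paper. The function name is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def union_tail_estimate(d, u, rho, n_samples, rng):
    """Estimate P(max_i X_i > u) for d equicorrelated N(0, 1) variables, 0 <= rho < 1."""
    p_i = STD_NORMAL.cdf(-u)  # P(A_i) = P(X_i > u), the same for every i here
    p_sum = d * p_i           # sum of the marginal probabilities (union bound)
    total = 0.0
    for _ in range(n_samples):
        # Pick event i with probability P(A_i) / p_sum (uniform, since all P(A_i) agree).
        i = rng.randrange(d)
        # Draw X_i from N(0, 1) conditioned on X_i > u by inverting the tail CDF.
        v = 1.0 - rng.random()  # v in (0, 1]
        x_i = -STD_NORMAL.inv_cdf(v * p_i)
        # Given X_i = x_i, the others are N(rho * x_i, 1 - rho^2) with covariance rho - rho^2.
        shared = rng.gauss(0.0, 1.0)
        n_hit = 1  # A_i occurs by construction
        for j in range(d):
            if j != i:
                x_j = (rho * x_i + (rho - rho**2) ** 0.5 * shared
                       + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0))
                n_hit += int(x_j > u)
        total += p_sum / n_hit
    return total / n_samples

rng = random.Random(1)
d, u = 5, 3.0
exact_indep = 1.0 - STD_NORMAL.cdf(u) ** d  # closed form when rho = 0
print(union_tail_estimate(d, u, 0.0, 20_000, rng), exact_indep)
print(union_tail_estimate(d, u, 0.5, 20_000, rng))
```

Because every sample is bounded by the union bound and by a fixed fraction of it, the relative error stays controlled as u grows, which is the kind of rare-event efficiency property the abstract refers to.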

arXiv:1008.0431 [pdf, other]

An Algebraic Approach to Signaling Cascades with n Layers

Authors: Elisenda Feliu, Michael Knudsen, Lars N. Andersen, Carsten Wiuf

Abstract: Posttranslational modification of proteins is key to the transmission of signals in cells. Many signaling pathways contain several layers of modification cycles that mediate and change the signal through the pathway. Here, we study a simple signaling cascade consisting of n layers of modification cycles, such that the modified protein of one layer acts as modifier in the next layer. Assuming mass-action kinetics and taking the formation of intermediate complexes into account, we show that the steady states are solutions to a polynomial in one variable, and in fact that there is exactly one steady state for any given total amounts of substrates and enzymes. We demonstrate that many steady state concentrations are related through rational functions, which can be found recursively. For example, stimulus-response curves arise as inverse functions to explicit rational functions. We show that the stimulus-response curves of the modified substrates are shifted to the left as we move down the cascade. Further, our approach allows us to study enzyme competition, sequestration and how the steady state changes in response to changes in the total amount of substrates. Our approach is essentially algebraic and follows recent trends in the study of posttranslational modification systems.

Submitted 18 December, 2010; v1 submitted 2 August, 2010; originally announced August 2010.

arXiv:1008.0427

A General Mathematical Framework Suitable for Studying Signaling Cascades

Authors: Elisenda Feliu, Lars N. Andersen, Michael Knudsen, Carsten Wiuf

Abstract: We define a general mathematical framework for studying post-translational modification processes under the assumption of mass action kinetics.

Submitted 13 December, 2011; v1 submitted 2 August, 2010; originally announced August 2010.

Comments: This paper has been withdrawn by the author due to a few inaccuracies in the text. An improved version of the results of this paper can be found in arXiv:1107.3531 [q-bio.MN]

Showing 1–8 of 8 results for author: Andersen, L N