-
On the Growth of Mistakes in Differentially Private Online Learning: A Lower Bound Perspective
Authors:
Daniil Dmitriev,
Kristóf Szabó,
Amartya Sanyal
Abstract:
In this paper, we provide lower bounds for Differentially Private (DP) Online Learning algorithms. Our result shows that, for a broad class of $(\varepsilon,\delta)$-DP online algorithms, for $T$ such that $\log T\leq O(1 / \delta)$, the expected number of mistakes incurred by the algorithm grows as $\Omega(\log \frac{T}{\delta})$. This matches the upper bound obtained by Golowich and Livni (2021) and is in contrast to non-private online learning where the number of mistakes is independent of $T$. To the best of our knowledge, our work is the first result towards settling lower bounds for DP-Online learning and partially addresses the open question in Sanyal and Ramponi (2022).
Submitted 26 February, 2024;
originally announced February 2024.
-
Asymptotics of Learning with Deep Structured (Random) Features
Authors:
Dominik Schröder,
Daniil Dmitriev,
Hugo Cui,
Bruno Loureiro
Abstract:
For a large class of feature maps we provide a tight asymptotic characterization of the test error associated with learning the readout layer, in the high-dimensional limit where the input dimension, hidden layer widths, and number of training samples are proportionally large. This characterization is formulated in terms of the population covariance of the features. Our work is partially motivated by the problem of learning with Gaussian rainbow neural networks, namely deep non-linear fully-connected networks with random but structured weights, whose row-wise covariances are further allowed to depend on the weights of previous layers. For such networks we also derive a closed-form formula for the feature covariance in terms of the weight matrices. We further find that in some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
Submitted 10 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Greedy Heuristics and Linear Relaxations for the Random Hitting Set Problem
Authors:
Gabriel Arpino,
Daniil Dmitriev,
Nicolo Grometto
Abstract:
Consider the Hitting Set problem where, for a given universe $\mathcal{X} = \left\{ 1, \ldots, n \right\}$ and a collection of subsets $\mathcal{S}_1, \ldots, \mathcal{S}_m$, one seeks to identify the smallest subset of $\mathcal{X}$ which has nonempty intersection with every element in the collection. We study a probabilistic formulation of this problem, where the underlying subsets are formed by including each element of the universe with probability $p$, independently of one another. For large enough values of $n$, we rigorously analyse the average case performance of Lovász's celebrated greedy algorithm (Lovász, 1975) with respect to the chosen input distribution. In addition, we study integrality gaps between linear programming and integer programming solutions of the problem.
Submitted 9 May, 2023;
originally announced May 2023.
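The setting in the abstract above can be illustrated with a minimal sketch: sample subsets by including each element independently with probability $p$, then repeatedly pick the element that hits the most subsets not yet hit. The function names, the seed, and the choice to resample empty subsets (which can never be hit) are assumptions made for this illustration and are not taken from the paper.

```python
import random


def random_instance(n, m, p, seed=0):
    """Sample m subsets of {1, ..., n}; each element is included w.p. p."""
    rng = random.Random(seed)
    subsets = []
    while len(subsets) < m:
        s = {x for x in range(1, n + 1) if rng.random() < p}
        if s:  # assumption: resample empty subsets, which cannot be hit
            subsets.append(s)
    return subsets


def greedy_hitting_set(n, subsets):
    """Greedy heuristic: pick the element contained in the most unhit subsets."""
    remaining = list(subsets)
    chosen = []
    while remaining:
        # Ties are broken in favour of the smallest element.
        best = max(range(1, n + 1),
                   key=lambda x: sum(x in s for s in remaining))
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


if __name__ == "__main__":
    instance = random_instance(n=20, m=30, p=0.2)
    print(greedy_hitting_set(20, instance))
```

The greedy output is always a valid hitting set, but it is not guaranteed to be a smallest one; the gap between the two is what an average-case analysis of this kind quantifies.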
-
Deterministic equivalent and error universality of deep random features learning
Authors:
Dominik Schröder,
Hugo Cui,
Daniil Dmitriev,
Bruno Loureiro
Abstract:
This manuscript considers the problem of learning a random Gaussian network function using a fully connected network with frozen intermediate layers and a trainable readout layer. This problem can be seen as a natural generalization of the widely studied random features model to deeper architectures. First, we prove Gaussian universality of the test error in a ridge regression setting where the learner and target networks share the same intermediate layers, and provide a sharp asymptotic formula for it. Establishing this result requires proving a deterministic equivalent for traces of the deep random features sample covariance matrices, which can be of independent interest. Second, we conjecture the asymptotic Gaussian universality of the test error in the more general setting of arbitrary convex losses and generic learner/target architectures. We provide extensive numerical evidence for this conjecture, which requires the derivation of closed-form expressions for the layer-wise post-activation population covariances. In light of our results, we investigate the interplay between architecture design and implicit regularization.
Submitted 1 February, 2023;
originally announced February 2023.
-
Dynamic Model Pruning with Feedback
Authors:
Tao Lin,
Sebastian U. Stich,
Luis Barba,
Daniil Dmitriev,
Martin Jaggi
Abstract:
Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression method that generates a sparse trained model without additional overhead: by (i) allowing dynamic allocation of the sparsity pattern and (ii) incorporating a feedback signal to reactivate prematurely pruned weights, we obtain a performant sparse model in a single training pass (retraining is not needed, but can further improve the performance). We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models. Moreover, their performance surpasses that of models generated by all previously proposed pruning schemes.
Submitted 12 June, 2020;
originally announced June 2020.
-
Efficient randomized algorithms for PageRank problem
Authors:
Alexander Gasnikov,
Denis Dmitriev
Abstract:
In this paper we compare well-known numerical methods for finding the PageRank vector. We propose a Markov Chain Monte Carlo method and obtain a new estimate for it. We also propose a new method for the PageRank problem based on reducing it to a matrix game. We solve this (sparse) matrix game with randomized mirror descent. Note that we use a non-standard randomization (in the KL-projection) that goes back to Grigoriadis and Khachiyan (1995).
Submitted 26 May, 2016; v1 submitted 12 October, 2014;
originally announced October 2014.
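As a rough illustration of the Markov Chain Monte Carlo idea mentioned in the abstract above, the sketch below estimates a PageRank vector from the visit frequencies of a single simulated random walk. It is a generic textbook-style variant, not the paper's method: the damping factor, the uniform teleportation step, the handling of dangling nodes, and all names are assumptions made for this example.

```python
import random


def mc_pagerank(out_links, steps=100_000, damping=0.85, seed=0):
    """Estimate PageRank as visit frequencies of one long random walk.

    out_links maps each node to a list of nodes it links to.
    """
    rng = random.Random(seed)
    nodes = sorted(out_links)
    visits = dict.fromkeys(nodes, 0)
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        targets = out_links[current]
        if targets and rng.random() < damping:
            # Follow a uniformly random outgoing link.
            current = rng.choice(targets)
        else:
            # Teleport uniformly (assumed; also used for dangling nodes).
            current = rng.choice(nodes)
    return {v: c / steps for v, c in visits.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: nodes 1, 2, 3 all link to node 0.
    graph = {0: [1], 1: [0], 2: [0], 3: [0]}
    print(mc_pagerank(graph))
```

The estimate is random and only approaches the stationary vector as `steps` grows; bounding how many steps are needed for a given accuracy is the kind of estimate the abstract refers to.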