-
Tracking solutions of time-varying variational inequalities
Authors:
Hédi Hadiji,
Sarah Sachs,
Cristóbal Guzmán
Abstract:
Tracking the solution of time-varying variational inequalities (VIs) is an important problem with applications in game theory, optimization, and machine learning. Existing work considers time-varying games or time-varying optimization problems. For strongly convex optimization problems or strongly monotone games, these results provide tracking guarantees under the assumption that the variation of the time-varying problem is restricted, that is, that the solution path length is sublinear. Our first main contribution extends existing results in two ways: we provide tracking bounds for (1) variational inequalities with a sublinear solution path length but not necessarily monotone functions, and (2) periodic time-varying variational inequalities that do not necessarily have a sublinear solution path length. Our second main contribution is an extensive study of the convergence behavior and trajectories of the discrete dynamical systems arising from periodic time-varying VIs. We show that these systems can either exhibit provably chaotic behavior or converge to the solution. Finally, we illustrate our theoretical results with experiments.
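As a minimal illustration of what "tracking" means (not the paper's method or setting), the sketch below assumes a one-dimensional, strongly monotone operator $F_t(x) = a\,(x - c_t)$ whose solution $c_t = \sin(2\pi t/P)$ is periodic, and runs the forward step $x_{t+1} = x_t - \eta F_t(x_t)$. The operator, the sinusoidal solution path, and the step size are our own assumptions. With $\eta a = 1/2$ the tracking error obeys $e_{t+1} = (1 - \eta a)\,e_t + (c_t - c_{t+1})$ and therefore stays below $4\pi/P$.

```python
import math


def track_periodic_solution(eta=0.5, a=1.0, period=100, steps=1000):
    """Forward steps x_{t+1} = x_t - eta * F_t(x_t) on the hypothetical
    operator F_t(x) = a * (x - c_t) with periodic solution
    c_t = sin(2 * pi * t / period). Returns the largest tracking error
    |x_t - c_t| observed during the run."""
    x = 0.0
    worst = 0.0
    for t in range(steps):
        c_t = math.sin(2 * math.pi * t / period)
        worst = max(worst, abs(x - c_t))
        # One step on the current operator only; the solution then moves.
        x = x - eta * a * (x - c_t)
    return worst


print(track_periodic_solution())                   # stays below 4 * pi / 100
print(track_periodic_solution(eta=2.5, steps=200))  # eta * a > 2: unstable
```

The second call shows that the same recursion blows up once $\eta a > 2$; the richer behaviors studied in the paper (including chaos) require other operators and are not reproduced here.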
Submitted 20 June, 2024;
originally announced June 2024.
-
Differentially Private Optimization with Sparse Gradients
Authors:
Badih Ghazi,
Cristóbal Guzmán,
Pritish Kamath,
Ravi Kumar,
Pasin Manurangsi
Abstract:
Motivated by applications of large embedding models, we study differentially private (DP) optimization problems under sparsity of individual gradients. We start with new near-optimal bounds for the classic mean estimation problem with sparse data, improving upon existing algorithms particularly in the high-dimensional regime. Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for stochastic convex optimization with sparse gradients; the pure-DP rates are the first nearly dimension-independent rates for this problem. Finally, we study the approximation of stationary points for the empirical loss in approximate-DP optimization and obtain rates that depend on sparsity instead of dimension, modulo polylogarithmic factors.
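For intuition about why sparsity matters, the sketch below is a standard Gaussian-mechanism baseline for the mean of $s$-sparse vectors with entries in $[-1, 1]$; it is not the paper's estimator. Each such vector has $\ell_2$ norm at most $\sqrt{s}$, so replacing one of $n$ vectors moves the mean by at most $2\sqrt{s}/n$, independently of $d$. The noise, however, is still added to all $d$ coordinates, so the total $\ell_2$ error of this baseline grows like $\sigma\sqrt{d}$. The function name and the dense list representation are our own choices.

```python
import math
import random


def private_sparse_mean(vectors, s, eps, delta, seed=0):
    """Gaussian-mechanism baseline (hypothetical helper) for the mean of
    s-sparse vectors with entries in [-1, 1].

    Each vector has L2 norm at most sqrt(s), so replacing one of the n
    vectors changes the mean by at most 2 * sqrt(s) / n in L2 norm."""
    n, d = len(vectors), len(vectors[0])
    sensitivity = 2 * math.sqrt(s) / n
    # Classic (eps, delta) calibration of the Gaussian mechanism, for eps < 1.
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    rng = random.Random(seed)
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    # Noise goes on every coordinate, including those that are zero in all data.
    return [m + rng.gauss(0.0, sigma) for m in mean], sigma


data = [[1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
estimate, sigma = private_sparse_mean(data, s=1, eps=0.5, delta=1e-5)
print(len(estimate), sigma)
```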
Submitted 16 April, 2024;
originally announced April 2024.
-
Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective
Authors:
Jelena Diakonikolas,
Cristóbal Guzmán
Abstract:
We initiate the study of nonsmooth optimization problems under bounded local subgradient variation, which postulates bounded difference between (sub)gradients in small local regions around points, in either average or maximum sense. The resulting class of objective functions encapsulates the classes of objective functions traditionally studied in optimization, which are defined based on either Lipschitz continuity of the objective or Hölder/Lipschitz continuity of its gradient. Further, the defined class contains functions that are neither Lipschitz continuous nor have a Hölder continuous gradient. When restricted to the traditional classes of optimization problems, the parameters defining the studied classes lead to more fine-grained complexity bounds, recovering traditional oracle complexity bounds in the worst case but generally leading to lower oracle complexity for functions that are not "worst case." Highlights of our results are that (i) it is possible to obtain complexity results for both convex and nonconvex problems with the (local or global) Lipschitz constant being replaced by a constant of local subgradient variation, and (ii) the mean width of the subdifferential set around the optima plays a role in the complexity of nonsmooth optimization, particularly in parallel settings. A consequence of (ii) is that for any error parameter $ε> 0$, the parallel oracle complexity of nonsmooth Lipschitz convex optimization is lower than its sequential oracle complexity by a factor $\tilde{Ω}\big(\frac{1}{ε}\big)$ whenever the objective function is piecewise linear with polynomially many pieces in the input size. This is particularly surprising as existing parallel complexity lower bounds are based on such classes of functions. The seeming contradiction is resolved by considering the region in which the algorithm is allowed to query the objective.
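One simple, hypothetical reading of the maximum-sense variant: for a one-dimensional piecewise-linear function $f(x) = \max_i (a_i x + b_i)$, estimate $\sup\{|g(y) - g(z)| : y, z \in [x - r, x + r]\}$ over subgradients $g$ on a grid. This is only an illustration under our own assumptions; the paper's precise definitions (including the average-sense variant) are not reproduced. For $f(x) = |x|$ the Lipschitz constant is 1 everywhere, yet the local variation is 0 in any ball that avoids the kink.

```python
def subgradient_pwl(pieces, x):
    """A subgradient of f(x) = max_i (a_i * x + b_i): the slope of an
    active piece (ties broken by taking the first maximizer)."""
    values = [a * x + b for a, b in pieces]
    return pieces[values.index(max(values))][0]


def local_variation(pieces, x, r, grid=1001):
    """Grid estimate of sup |g(y) - g(z)| over y, z in [x - r, x + r].
    In one dimension this is the largest minus the smallest subgradient."""
    points = [x - r + 2 * r * k / (grid - 1) for k in range(grid)]
    grads = [subgradient_pwl(pieces, p) for p in points]
    return max(grads) - min(grads)


abs_pieces = [(-1.0, 0.0), (1.0, 0.0)]           # f(x) = |x|
print(local_variation(abs_pieces, x=0.5, r=0.1))  # 0.0: ball avoids the kink
print(local_variation(abs_pieces, x=0.0, r=0.1))  # 2.0: ball contains the kink
```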
Submitted 24 March, 2024;
originally announced March 2024.
-
Public-data Assisted Private Stochastic Optimization: Power and Limitations
Authors:
Enayat Ullah,
Michael Menart,
Raef Bassily,
Cristóbal Guzmán,
Raman Arora
Abstract:
We study the limits and capabilities of public-data assisted differentially private (PA-DP) algorithms. Specifically, we focus on the problem of stochastic convex optimization (SCO) with either labeled or unlabeled public data. For complete/labeled public data, we show that any $(ε,δ)$-PA-DP algorithm has excess risk $\tilde{Ω}\big(\min\big\{\frac{1}{\sqrt{n_{\text{pub}}}},\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{nε} \big\} \big)$, where $d$ is the dimension, ${n_{\text{pub}}}$ is the number of public samples, ${n_{\text{priv}}}$ is the number of private samples, and $n={n_{\text{pub}}}+{n_{\text{priv}}}$. These lower bounds are established via our new lower bounds for PA-DP mean estimation, which are of a similar form. Up to constant factors, these lower bounds show that the simple strategy of either treating all data as private or discarding the private data is optimal. We also study PA-DP supervised learning with \textit{unlabeled} public samples. In contrast to our previous result, here we show novel methods for leveraging public data in private supervised learning. For generalized linear models (GLM) with unlabeled public data, we show an efficient algorithm which, given $\tilde{O}({n_{\text{priv}}}ε)$ unlabeled public samples, achieves the dimension-independent rate $\tilde{O}\big(\frac{1}{\sqrt{n_{\text{priv}}}} + \frac{1}{\sqrt{n_{\text{priv}}ε}}\big)$. We develop new lower bounds for this setting which show that this rate cannot be improved with more public samples, and that any fewer public samples leads to a worse rate. Finally, we provide extensions of this result to general hypothesis classes with finite fat-shattering dimension with applications to neural networks and non-Euclidean geometries.
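The two terms inside the minimum correspond to the two simple strategies named above: discarding the private data (rate $1/\sqrt{n_{\text{pub}}}$) or treating all $n$ samples as private (rate $1/\sqrt{n} + \sqrt{d}/(nε)$). The hypothetical helper below evaluates both with constants and logarithmic factors dropped, to show which regime favors which strategy; the example sample sizes and dimensions are our own.

```python
import math


def pa_dp_terms(n_pub, n_priv, d, eps):
    """Evaluate the two terms of the lower bound, ignoring constants and logs.
    Returns (discard_private, all_private)."""
    n = n_pub + n_priv
    discard_private = 1 / math.sqrt(n_pub)                     # public data only
    all_private = 1 / math.sqrt(n) + math.sqrt(d) / (n * eps)  # everything private
    return discard_private, all_private


# Moderate dimension: treating all data as private gives the smaller term.
print(pa_dp_terms(n_pub=100, n_priv=10_000, d=10_000, eps=1.0))
# Very high dimension: discarding the private data gives the smaller term.
print(pa_dp_terms(n_pub=100, n_priv=10_000, d=10**8, eps=1.0))
```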
Submitted 6 March, 2024;
originally announced March 2024.
-
Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems
Authors:
Tomás González,
Cristóbal Guzmán,
Courtney Paquette
Abstract:
We study the problem of differentially-private (DP) stochastic (convex-concave) saddle-points in the polyhedral setting. We propose $(\varepsilon, δ)$-DP algorithms based on stochastic mirror descent that attain nearly dimension-independent convergence rates for the expected duality gap, a type of guarantee that was known before only for bilinear objectives. For convex-concave and first-order-smooth stochastic objectives, our algorithms attain a rate of $\sqrt{\log(d)/n} + (\log(d)^{3/2}/[n\varepsilon])^{1/3}$, where $d$ is the dimension of the problem and $n$ the dataset size. Under an additional second-order-smoothness assumption, we improve the rate on the expected gap to $\sqrt{\log(d)/n} + (\log(d)^{3/2}/[n\varepsilon])^{2/5}$. Under this additional assumption, we also show, by using bias-reduced gradient estimators, that the duality gap is bounded by $\log(d)/\sqrt{n} + \log(d)/[n\varepsilon]^{1/2}$ with constant success probability. This result provides evidence of the near-optimality of the approach. Finally, we show that combining our methods with acceleration techniques from online learning leads to the first algorithm for DP Stochastic Convex Optimization in the polyhedral setting that is not based on Frank-Wolfe methods. For convex and first-order-smooth stochastic objectives, our algorithms attain an excess risk of $\sqrt{\log(d)/n} + \log(d)^{7/10}/[n\varepsilon]^{2/5}$, and when additionally assuming second-order-smoothness, we improve the rate to $\sqrt{\log(d)/n} + \log(d)/\sqrt{n\varepsilon}$. Instrumental to all of these results are various extensions of the classical Maurey Sparsification Lemma, which may be of independent interest.
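Maurey's sparsification lemma, in its classical form, says that a convex combination $x = \sum_i \lambda_i v_i$ can be approximated by the average $\hat{x}$ of $k$ vertices sampled i.i.d. according to the weights $\lambda$, with $\mathbb{E}\|\hat{x} - x\|_2^2 \le \max_i \|v_i\|_2^2 / k$, while using at most $k$ vertices. The sketch below implements only this classical version; the paper's extensions are not reproduced, and the function name is our own.

```python
import random


def maurey_sparsify(weights, vertices, k, seed=0):
    """Approximate x = sum_i weights[i] * vertices[i] by the average of k
    vertices drawn i.i.d. with probabilities `weights`.
    Returns the approximation and the set of vertex indices it uses."""
    rng = random.Random(seed)
    idx = rng.choices(range(len(vertices)), weights=weights, k=k)
    dim = len(vertices[0])
    approx = [sum(vertices[i][j] for i in idx) / k for j in range(dim)]
    return approx, set(idx)


simplex = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
approx, support = maurey_sparsify([0.5, 0.5, 0.0], simplex, k=4)
print(approx, support)  # never uses the zero-weight vertex
```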
Submitted 5 March, 2024;
originally announced March 2024.
-
Multi-level Product Category Prediction through Text Classification
Authors:
Wesley Ferreira Maia,
Angelo Carmignani,
Gabriel Bortoli,
Lucas Maretti,
David Luz,
Daniel Camilo Fuentes Guzman,
Marcos Jardel Henriques,
Francisco Louzada Neto
Abstract:
This article investigates the application of advanced machine learning models, specifically LSTM and BERT, to text classification for predicting multiple category levels in the retail sector. The study demonstrates how data augmentation techniques and the focal loss function can significantly improve accuracy in classifying products into multiple categories, using a robust Brazilian retail dataset. The LSTM model, enriched with Brazilian Portuguese word embeddings, and BERT, known for its effectiveness in understanding complex contexts, were adapted and optimized for this task. The results showed that the BERT model, with macro F1 scores of up to $99\%$ for segments, $96\%$ for categories and subcategories, and $93\%$ for product names, outperformed LSTM in the more detailed categories. However, LSTM also achieved high performance, especially after applying data augmentation and focal loss. These results underscore the effectiveness of NLP techniques in retail and highlight the importance of carefully selecting modelling and preprocessing strategies. This work contributes to the field of NLP in retail, providing insights for future research and practical applications.
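For reference, the standard focal loss down-weights well-classified examples: $\mathrm{FL}(p_t) = -\alpha (1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the predicted probability of the true class. The minimal sketch below computes it for a single example; the values of $\gamma$ and $\alpha$ used in the study are not given here, so the defaults are assumptions.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)**gamma * log(p_t),
    where p_t = probs[target] is the predicted probability of the true class.
    With gamma = 0 and alpha = 1 this is ordinary cross-entropy."""
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


probs = [0.9, 0.05, 0.05]              # a confidently correct prediction
print(focal_loss(probs, 0, gamma=0.0))  # cross-entropy, about 0.105
print(focal_loss(probs, 0))             # about 0.00105, 100x smaller
```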
Submitted 3 March, 2024;
originally announced March 2024.
-
Differentially Private Non-Convex Optimization under the KL Condition with Optimal Rates
Authors:
Michael Menart,
Enayat Ullah,
Raman Arora,
Raef Bassily,
Cristóbal Guzmán
Abstract:
We study the private empirical risk minimization (ERM) problem for losses satisfying the $(γ,κ)$-Kurdyka-Łojasiewicz (KL) condition. The Polyak-Łojasiewicz (PL) condition is a special case of this condition when $κ=2$. Specifically, we study this problem under the constraint of $ρ$-zero-concentrated differential privacy (zCDP). When $κ\in[1,2]$ and the loss function is Lipschitz and smooth over a sufficiently large region, we provide a new algorithm based on variance-reduced gradient descent that achieves the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrtρ}\big)^κ\big)$ on the excess empirical risk, where $n$ is the dataset size and $d$ is the dimension. We further show that this rate is nearly optimal. When $κ\geq 2$ and the loss is instead Lipschitz and weakly convex, we show it is possible to achieve the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrtρ}\big)^κ\big)$ with a private implementation of the proximal point method. When the KL parameters are unknown, we provide a novel modification and analysis of the noisy gradient descent algorithm and show that this algorithm achieves a rate of $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrtρ}\big)^{\frac{2κ}{4-κ}}\big)$ adaptively, which is nearly optimal when $κ= 2$. We further show that, without assuming the KL condition, the same gradient descent algorithm can achieve fast convergence to a stationary point when the gradient stays sufficiently large during the run of the algorithm. Specifically, we show that this algorithm can approximate stationary points of Lipschitz, smooth (and possibly nonconvex) objectives with rate as fast as $\tilde{O}\big(\frac{\sqrt{d}}{n\sqrtρ}\big)$ and never worse than $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrtρ}\big)^{1/2}\big)$. The latter rate matches the best known rate for methods that do not rely on variance reduction.
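Plain noisy gradient descent under zCDP can be sketched as follows, under assumptions of our own: a one-dimensional squared loss, per-example gradients clipped to $[-C, C]$, and a fixed step size. Replacing one of $n$ examples changes the averaged clipped gradient by at most $\Delta = 2C/n$; a Gaussian mechanism with sensitivity $\Delta$ and noise standard deviation $\sigma$ is $\Delta^2/(2\sigma^2)$-zCDP, and zCDP composes additively, so splitting $ρ$ evenly over $T$ steps gives $\sigma = \Delta\sqrt{T/(2ρ)}$. The variance reduction, the KL-adaptive modification, and the proximal point variant from the paper are not reproduced.

```python
import math
import random


def noisy_gd_zcdp(data, rho, steps=50, lr=0.5, clip=1.0, seed=0):
    """Noisy GD on F(w) = mean_i 0.5 * (w - x_i)**2 under rho-zCDP.
    Per-example gradients (w - x_i) are clipped to [-clip, clip]."""
    n = len(data)
    sensitivity = 2 * clip / n                           # replace-one sensitivity
    sigma = sensitivity * math.sqrt(steps / (2 * rho))  # budget rho / steps per step
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        grad = sum(max(-clip, min(clip, w - x)) for x in data) / n
        w -= lr * (grad + rng.gauss(0.0, sigma))
    return w


data = [0.2, 0.4, 0.6]
print(noisy_gd_zcdp(data, rho=1e12))  # negligible noise: close to the mean 0.4
print(noisy_gd_zcdp(data, rho=0.1))   # strong privacy: much noisier
```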
Submitted 3 April, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese
Authors:
Luiz Giordani,
Gilsiley Darú,
Rhenan Queiroz,
Vitor Buzinaro,
Davi Keglevich Neiva,
Daniel Camilo Fuentes Guzmán,
Marcos Jardel Henriques,
Oilson Alberto Gonzatto Junior,
Francisco Louzada
Abstract:
The proliferation of fake news has become a significant concern in recent times due to its potential to spread misinformation and manipulate public opinion. This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese, focusing on journalistic-type news. We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec, to extract features from textual data. We evaluate the performance of various classification algorithms, such as logistic regression, support vector machine, random forest, AdaBoost, and LightGBM, on a dataset containing both true and fake news articles. The proposed approach achieves high accuracy and F1-Score, demonstrating its effectiveness in identifying fake news. Additionally, we developed a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity. Our platform provides real-time analysis, allowing users to assess how likely a news article is to be fake. Through empirical analysis and comparative studies, we demonstrate the potential of our approach to contribute to the fight against the spread of fake news and promote more informed media consumption.
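TF-IDF, one of the feature extractors mentioned above, weights a term by its frequency in a document times the log of the inverse fraction of documents containing it. The plain-Python sketch below uses the unsmoothed form $\mathrm{idf}(t) = \log(N/\mathrm{df}(t))$ and whitespace tokenization, both simplifying assumptions; the preprocessing used for fakenewsbr is not described in the abstract, and library implementations (for example scikit-learn's `TfidfVectorizer`) use a smoothed idf by default.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, with
    tf = count / document length and idf = log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        if not tokens:  # empty document: no features
            weights.append({})
            continue
        counts = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


docs = ["government announces new vaccine", "government denies fake vaccine"]
w = tf_idf(docs)
print(w[0]["government"], w[0]["announces"])  # a term in every document gets 0.0
```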
Submitted 20 September, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Accelerated Rates between Stochastic and Adversarial Online Convex Optimization
Authors:
Sarah Sachs,
Hedi Hadiji,
Tim van Erven,
Cristobal Guzman
Abstract:
Stochastic and adversarial data are two widely studied settings in online learning. But many optimization tasks are neither i.i.d. nor fully adversarial, which makes it of fundamental interest to get a better theoretical understanding of the world between these extremes. In this work we establish novel regret bounds for online convex optimization in a setting that interpolates between stochastic i.i.d. and fully adversarial losses. By exploiting smoothness of the expected losses, these bounds replace a dependence on the maximum gradient length by the variance of the gradients, which was previously known only for linear losses. In addition, they weaken the i.i.d. assumption by allowing, for example, adversarially poisoned rounds, which were previously considered in the related expert and bandit settings. In the fully i.i.d. case, our regret bounds match the rates one would expect from results in stochastic acceleration, and we also recover the optimal stochastically accelerated rates via online-to-batch conversion. In the fully adversarial case our bounds gracefully deteriorate to match the minimax regret. We further provide lower bounds showing that our regret upper bounds are tight for all intermediate regimes in terms of the stochastic variance and the adversarial variation of the loss gradients.
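A common way to exploit predictable or low-variance gradients in online convex optimization is optimistic online gradient descent, which takes an extra step along a hint (here, the previous gradient) before playing. The sketch below is a generic textbook instance chosen by us, not the algorithm or step-size tuning analyzed in the paper: losses $f_t(x) = \tfrac{1}{2}(x - z_t)^2$ on $[-1, 1]$, a fixed step size, and regret measured against the best fixed point in hindsight.

```python
import random


def optimistic_ogd(zs, eta=0.1, radius=1.0):
    """Optimistic OGD on f_t(x) = 0.5 * (x - z_t)**2 over [-radius, radius].
    Returns the regret against the best fixed point in hindsight."""
    def project(v):
        return max(-radius, min(radius, v))

    y, hint, learner_loss = 0.0, 0.0, 0.0
    for z in zs:
        x = project(y - eta * hint)  # optimistic step using the hint
        g = x - z                    # gradient of f_t at the played point
        learner_loss += 0.5 * (x - z) ** 2
        y = project(y - eta * g)     # base iterate update
        hint = g                     # next hint: the most recent gradient
    best = project(sum(zs) / len(zs))  # minimizer of the summed losses
    best_loss = sum(0.5 * (best - z) ** 2 for z in zs)
    return learner_loss - best_loss


rng = random.Random(0)
iid = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
print(optimistic_ogd(iid) / len(iid))  # average regret per round
print(optimistic_ogd([0.5] * 1000))    # constant losses: small total regret
```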
Submitted 6 March, 2023;
originally announced March 2023.
-
Differentially Private Algorithms for the Stochastic Saddle Point Problem with Optimal Rates for the Strong Gap
Authors:
Raef Bassily,
Cristóbal Guzmán,
Michael Menart
Abstract:
We show that convex-concave Lipschitz stochastic saddle point problems (also known as stochastic minimax optimization) can be solved under the constraint of $(ε,δ)$-differential privacy with \emph{strong (primal-dual) gap} rate of $\tilde O\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{nε}\big)$, where $n$ is the dataset size and $d$ is the dimension of the problem. This rate is nearly optimal, based on existing lower bounds in differentially private stochastic optimization. Specifically, we prove a tight upper bound on the strong gap via a novel implementation and analysis of the recursive regularization technique repurposed for saddle point problems. We show that this rate can be attained with $O\big(\min\big\{\frac{n^2ε^{1.5}}{\sqrt{d}}, n^{3/2}\big\}\big)$ gradient complexity, and $\tilde{O}(n)$ gradient complexity if the loss function is smooth. As a byproduct of our method, we develop a general algorithm that, given black-box access to a subroutine satisfying a certain $α$ primal-dual accuracy guarantee with respect to the empirical objective, gives a solution to the stochastic saddle point problem with a strong gap of $\tilde{O}(α+\frac{1}{\sqrt{n}})$. We show that this $α$-accuracy condition is satisfied by standard algorithms for the empirical saddle point problem such as the proximal point method and the stochastic gradient descent ascent algorithm. Further, we show that even for simple problems it is possible for an algorithm to have zero weak gap and suffer from $Ω(1)$ strong gap. We also show that there exists a fundamental tradeoff between stability and accuracy. Specifically, we show that any $Δ$-stable algorithm has empirical gap $Ω\big(\frac{1}{Δn}\big)$, and that this bound is tight. This result also holds, more specifically, for empirical risk minimization problems and may be of independent interest.
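The proximal point method mentioned above can be contrasted with simultaneous gradient descent ascent (GDA) on the classic bilinear toy problem $f(x, y) = xy$, with the strong gap measured over $[-1, 1]^2$; the problem is our own choice, and this is not the paper's recursive regularization algorithm. Here the strong gap is $\max_{y'} f(x, y') - \min_{x'} f(x', y) = |x| + |y|$. Without projection, GDA spirals outward, while the implicit proximal point update, which has a closed form for this objective, contracts toward the saddle point $(0, 0)$.

```python
def strong_gap(x, y):
    """Strong gap of f(x, y) = x * y with the sup/inf taken over [-1, 1]^2:
    max_{y'} x * y' - min_{x'} x' * y = |x| + |y|."""
    return abs(x) + abs(y)


def gda(x, y, eta=1.0, iters=50):
    """Simultaneous gradient descent (in x) ascent (in y), no projection."""
    for _ in range(iters):
        x, y = x - eta * y, y + eta * x
    return x, y


def proximal_point(x, y, eta=1.0, iters=50):
    """Implicit update x+ = x - eta * y+, y+ = y + eta * x+, solved exactly."""
    for _ in range(iters):
        x, y = (x - eta * y) / (1 + eta**2), (y + eta * x) / (1 + eta**2)
    return x, y


print(strong_gap(*gda(1.0, 1.0)))             # grows: norm scales by sqrt(2) per step
print(strong_gap(*proximal_point(1.0, 1.0)))  # shrinks: norm scales by 1/sqrt(2) per step
```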
Submitted 29 June, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
Optimal Algorithms for Stochastic Complementary Composite Minimization
Authors:
Alexandre d'Aspremont,
Cristóbal Guzmán,
Clément Lezane
Abstract:
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization in the stochastic setting. This problem corresponds to the minimization of the sum of a (weakly) smooth function endowed with a stochastic first-order oracle, and a structured uniformly convex (possibly nonsmooth and non-Lipschitz) regularization term. Despite intensive work on closely related settings, prior to our work no complexity bounds for this problem were known. We close this gap by providing novel excess risk bounds, both in expectation and with high probability. Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems. We conclude by providing numerical results comparing our methods to the state of the art.
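A minimal instance of the composite structure described above is stochastic proximal gradient: the smooth part is accessed only through a stochastic gradient, and the regularizer is handled exactly through its proximal map. The sketch assumes the simplest uniformly convex regularizer, $\frac{\lambda}{2}x^2$, whose prox is $v \mapsto v/(1+\eta\lambda)$, together with a squared loss and $1/(t+1)$ step sizes; it is not the nearly optimal algorithm from the paper, which covers weakly smooth losses and more general regularizers.

```python
import random


def stochastic_prox_grad(sample_z, lam=1.0, steps=20_000, seed=0):
    """Minimize E[0.5 * (x - z)**2] + (lam / 2) * x**2 using stochastic
    gradients of the smooth part and the exact prox of the regularizer."""
    rng = random.Random(seed)
    x = 0.0
    for t in range(steps):
        eta = 1.0 / (t + 1)                  # decreasing step size
        g = x - sample_z(rng)                # unbiased gradient of the smooth part
        x = (x - eta * g) / (1 + eta * lam)  # prox of (lam / 2) * x**2
    return x


# With E[z] = 1 and lam = 1, the minimizer is E[z] / (1 + lam) = 0.5.
print(stochastic_prox_grad(lambda rng: 1.0 + rng.uniform(-0.5, 0.5)))
```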
Submitted 23 January, 2024; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Faster Rates of Convergence to Stationary Points in Differentially Private Optimization
Authors:
Raman Arora,
Raef Bassily,
Tomás González,
Cristóbal Guzmán,
Michael Menart,
Enayat Ullah
Abstract:
We study the problem of approximating stationary points of Lipschitz and smooth functions under $(\varepsilon,δ)$-differential privacy (DP) in both the finite-sum and stochastic settings. A point $\widehat{w}$ is called an $α$-stationary point of a function $F:\mathbb{R}^d\rightarrow\mathbb{R}$ if $\|\nabla F(\widehat{w})\|\leq α$. We provide a new efficient algorithm that finds an $\tilde{O}\big(\big[\frac{\sqrt{d}}{n\varepsilon}\big]^{2/3}\big)$-stationary point in the finite-sum setting, where $n$ is the number of samples. This improves on the previous best rate of $\tilde{O}\big(\big[\frac{\sqrt{d}}{n\varepsilon}\big]^{1/2}\big)$. We also give a new construction that improves over the existing rates in the stochastic optimization setting, where the goal is to find approximate stationary points of the population risk. Our construction finds a $\tilde{O}\big(\frac{1}{n^{1/3}} + \big[\frac{\sqrt{d}}{n\varepsilon}\big]^{1/2}\big)$-stationary point of the population risk in time linear in $n$. Furthermore, under the additional assumption of convexity, we completely characterize the sample complexity of finding stationary points of the population risk (up to polylog factors) and show that the optimal rate on population stationarity is $\tilde Θ\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\varepsilon}\big)$. Finally, we show that our methods can be used to provide dimension-independent rates of $O\big(\frac{1}{\sqrt{n}}+\min\big(\big[\frac{\sqrt{rank}}{n\varepsilon}\big]^{2/3},\frac{1}{(n\varepsilon)^{2/5}}\big)\big)$ on population stationarity for Generalized Linear Models (GLM), where $rank$ is the rank of the design matrix, which improves upon the previous best known rate.
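Following the definition above, $α$-stationarity is a check on the gradient norm, and the finite-sum improvement can be seen by plugging numbers into the two rates with constants and logarithmic factors dropped. Writing $b = \sqrt{d}/(n\varepsilon)$, the new rate $b^{2/3}$ is smaller than the previous $b^{1/2}$ whenever $b < 1$. The helper names and the example values of $n$, $d$, and $\varepsilon$ are our own.

```python
import math


def is_alpha_stationary(grad, alpha):
    """True if the Euclidean norm of the gradient is at most alpha."""
    return math.sqrt(sum(g * g for g in grad)) <= alpha


def finite_sum_rates(n, d, eps):
    """(new, previous) finite-sum stationarity rates, ignoring constants/logs."""
    b = math.sqrt(d) / (n * eps)
    return b ** (2 / 3), b ** (1 / 2)


new, old = finite_sum_rates(n=10**6, d=10**4, eps=1.0)
print(new, old)  # roughly 0.00215 vs 0.01
print(is_alpha_stationary([0.001, -0.001], alpha=new))
```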
Submitted 30 May, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Differentially Private Generalized Linear Models Revisited
Authors:
Raman Arora,
Raef Bassily,
Cristóbal Guzmán,
Michael Menart,
Enayat Ullah
Abstract:
We study the problem of $(ε,δ)$-differentially private learning of linear predictors with convex losses. We provide results for two subclasses of loss functions. The first case is when the loss is smooth and non-negative but not necessarily Lipschitz (such as the squared loss). For this case, we establish an upper bound on the excess population risk of $\tilde{O}\left(\frac{\Vert w^*\Vert}{\sqrt{n}} + \min\left\{\frac{\Vert w^* \Vert^2}{(nε)^{2/3}},\frac{\sqrt{d}\Vert w^*\Vert^2}{nε}\right\}\right)$, where $n$ is the number of samples, $d$ is the dimension of the problem, and $w^*$ is the minimizer of the population risk. Apart from the dependence on $\Vert w^\ast\Vert$, our bound is essentially tight in all parameters. In particular, we show a lower bound of $\tilde{Ω}\left(\frac{1}{\sqrt{n}} + {\min\left\{\frac{\Vert w^*\Vert^{4/3}}{(nε)^{2/3}}, \frac{\sqrt{d}\Vert w^*\Vert}{nε}\right\}}\right)$. We also revisit the previously studied case of Lipschitz losses [SSTT20]. For this case, we close the gap in the existing work and show that the optimal rate is (up to log factors) $Θ\left(\frac{\Vert w^*\Vert}{\sqrt{n}} + \min\left\{\frac{\Vert w^*\Vert}{\sqrt{nε}},\frac{\sqrt{\text{rank}}\Vert w^*\Vert}{nε}\right\}\right)$, where $\text{rank}$ is the rank of the design matrix. This improves over existing work in the high-privacy regime. Finally, our algorithms involve a private model selection approach that we develop to enable attaining the stated rates without a priori knowledge of $\Vert w^*\Vert$.
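The abstract does not spell out the private model selection step. One standard building block for choosing among candidate models under DP is the exponential mechanism; the sketch below applies it to hypothetical candidate scores (for example, losses of models trained with different guesses of $\Vert w^*\Vert$), assuming each score has bounded sensitivity. It is an illustration under our own assumptions, not the authors' procedure.

```python
import math
import random


def exponential_mechanism(scores, eps, sensitivity, seed=0):
    """Select an index with probability proportional to
    exp(-eps * score / (2 * sensitivity)); lower scores are better.
    This is eps-DP if each score changes by at most `sensitivity`
    when one data point is replaced."""
    best = min(scores)
    # Shifting by the best score avoids overflow and leaves probabilities unchanged.
    weights = [math.exp(-eps * (s - best) / (2 * sensitivity)) for s in scores]
    rng = random.Random(seed)
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


candidate_losses = [0.30, 0.10, 0.50]  # hypothetical scores of three candidates
print(exponential_mechanism(candidate_losses, eps=1.0, sensitivity=0.01))
```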
Submitted 6 March, 2024; v1 submitted 6 May, 2022;
originally announced May 2022.
-
Stochastic Halpern Iteration with Variance Reduction for Stochastic Monotone Inclusions
Authors:
Xufeng Cai,
Chaobing Song,
Cristóbal Guzmán,
Jelena Diakonikolas
Abstract:
We study stochastic monotone inclusion problems, which appear widely in machine learning applications, including robust regression and adversarial learning. We propose novel variants of stochastic Halpern iteration with recursive variance reduction. In the cocoercive (and, more generally, Lipschitz-monotone) setup, our algorithm finds a point at which the norm of the operator is at most $ε$ using $\mathcal{O}(\frac{1}{ε^3})$ stochastic operator evaluations, a significant improvement over the state-of-the-art $\mathcal{O}(\frac{1}{ε^4})$ stochastic operator evaluations required by existing monotone inclusion solvers applied to the same problem classes. We further show how to couple one of the proposed variants of stochastic Halpern iteration with a scheduled restart scheme to solve stochastic monotone inclusion problems with ${\mathcal{O}}(\frac{\log(1/ε)}{ε^2})$ stochastic operator evaluations under additional sharpness or strong monotonicity assumptions.
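The anchoring idea behind Halpern iteration is easiest to see in its deterministic form: each step averages the anchor $x_0$ with a forward step $T(x) = x - F(x)/L$ using weights $\lambda_k = 1/(k+2)$, and $T$ is nonexpansive when $F$ is $\tfrac{1}{L}$-cocoercive. The sketch below omits the paper's stochastic oracle, recursive variance reduction, and restarts; the linear operator in the example is our own choice. For $A = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$ one has $\langle Av, v\rangle = \|v\|^2 = \|Av\|^2/2$, so $F(x) = Ax - b$ is $\tfrac{1}{2}$-cocoercive ($L = 2$).

```python
import math


def halpern(F, x0, L, iters=1000):
    """Deterministic Halpern iteration x_{k+1} = lam_k * x0 + (1 - lam_k) * T(x_k)
    with lam_k = 1 / (k + 2) and T(x) = x - F(x) / L."""
    x = list(x0)
    for k in range(iters):
        lam = 1.0 / (k + 2)
        tx = [xi - fi / L for xi, fi in zip(x, F(x))]         # forward step
        x = [lam * a + (1 - lam) * t for a, t in zip(x0, tx)]  # pull toward anchor
    return x


def F(x):
    """F(x) = A x - b with A = [[1, 1], [-1, 1]], b = (2, 0); zero at (1, 1)."""
    return [x[0] + x[1] - 2.0, -x[0] + x[1]]


x = halpern(F, [0.0, 0.0], L=2.0)
print(x, math.hypot(*F(x)))  # near (1, 1), with a small residual norm
```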
Submitted 8 January, 2023; v1 submitted 17 March, 2022;
originally announced March 2022.
-
Between Stochastic and Adversarial Online Convex Optimization: Improved Regret Bounds via Smoothness
Authors:
Sarah Sachs,
Hédi Hadiji,
Tim van Erven,
Cristóbal Guzmán
Abstract:
Stochastic and adversarial data are two widely studied settings in online learning. But many optimization tasks are neither i.i.d. nor fully adversarial, which makes it of fundamental interest to get a better theoretical understanding of the world between these extremes. In this work we establish novel regret bounds for online convex optimization in a setting that interpolates between stochastic i.i.d. and fully adversarial losses. By exploiting smoothness of the expected losses, these bounds replace a dependence on the maximum gradient length by the variance of the gradients, which was previously known only for linear losses. In addition, they weaken the i.i.d. assumption by allowing, for example, adversarially poisoned rounds, which were previously considered in the expert and bandit settings. Our results extend this to the online convex optimization framework. In the fully i.i.d. case, our bounds match the rates one would expect from results in stochastic acceleration, and in the fully adversarial case they gracefully deteriorate to match the minimax regret. We further provide lower bounds showing that our regret upper bounds are tight for all intermediate regimes in terms of the stochastic variance and the adversarial variation of the loss gradients.
Submitted 8 June, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Chemistry Across Multiple Phases (CAMP) version 1.0: An integrated multi-phase chemistry model
Authors:
Matthew L. Dawson,
Christian Guzman,
Jeffrey H. Curtis,
Mario Acosta,
Shupeng Zhu,
Donald Dabdub,
Andrew Conley,
Matthew West,
Nicole Riemer,
Oriol Jorba
Abstract:
A flexible treatment for gas- and aerosol-phase chemical processes has been developed for models of diverse scale, from box models up to global models. At the core of this novel framework is an "abstracted aerosol representation" that allows a given chemical mechanism to be solved in atmospheric models with different aerosol representations (e.g., sectional, modal, or particle-resolved). This is accomplished by treating aerosols as a collection of condensed phases that are implemented according to the aerosol representation of the host model. The framework also allows multiple chemical processes (e.g., gas- and aerosol-phase chemical reactions, emissions, deposition, photolysis, and mass-transfer) to be solved simultaneously as a single system. The flexibility of the model is achieved by (1) using an object-oriented design that facilitates extensibility to new types of chemical processes and to new ways of representing aerosol systems; (2) runtime model configuration using JSON input files that permits making changes to any part of the chemical mechanism without recompiling the model; this widely used, human-readable format allows entire gas- and aerosol-phase chemical mechanisms to be described with as much complexity as necessary; and (3) automated comprehensive testing that ensures stability of the code as new functionality is introduced. Together, these design choices enable users to build a customized multiphase mechanism, without having to handle pre-processors, solvers or compilers. This new treatment compiles as a stand-alone library and has been deployed in the particle-resolved PartMC model and in the MONARCH chemical weather prediction system for use at regional and global scales. Results from the initial deployment will be discussed, along with future extension to more complex gas-aerosol systems, and the integration of GPU-based solvers.
Submitted 14 November, 2021;
originally announced November 2021.
-
Differentially Private Stochastic Optimization: New Results in Convex and Non-Convex Settings
Authors:
Raef Bassily,
Cristóbal Guzmán,
Michael Menart
Abstract:
We study differentially private stochastic optimization in convex and non-convex settings. For the convex case, we focus on the family of non-smooth generalized linear losses (GLLs). Our algorithm for the $\ell_2$ setting achieves optimal excess population risk in near-linear time, while the best known differentially private algorithms for general convex losses run in super-linear time. Our algorithm for the $\ell_1$ setting has nearly-optimal excess population risk $\tilde{O}\big(\sqrt{\frac{\log{d}}{n\varepsilon}}\big)$, and circumvents the dimension-dependent lower bound of Asi et al. (2021) for general non-smooth convex losses. In the differentially private non-convex setting, we provide several new algorithms for approximating stationary points of the population risk. For the $\ell_1$-case with smooth losses and a polyhedral constraint, we provide the first nearly dimension-independent rate, $\tilde O\big(\frac{\log^{2/3}{d}}{(n\varepsilon)^{1/3}}\big)$, in linear time. For the constrained $\ell_2$-case with smooth losses, we obtain a linear-time algorithm with rate $\tilde O\big(\frac{1}{n^{1/3}}+\frac{d^{1/5}}{(n\varepsilon)^{2/5}}\big)$. Finally, for the $\ell_2$-case we provide the first method for non-smooth weakly convex stochastic optimization, with rate $\tilde O\big(\frac{1}{n^{1/4}}+\frac{d^{1/6}}{(n\varepsilon)^{1/3}}\big)$, which matches the best existing non-private algorithm when $d= O(\sqrt{n})$. We also extend all of our results above for the non-convex $\ell_2$ setting to the $\ell_p$ setting, where $1 < p \leq 2$, with only polylogarithmic (in the dimension) overhead in the rates.
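In the $\ell_1$ setting with a polyhedral constraint, a common building block in private optimization is a Frank-Wolfe step whose linear-minimization vertex is chosen by the exponential mechanism. The sketch below shows that generic template over the $\ell_1$ ball. It is not claimed to be the authors' algorithm. `epsilon_step` and `sensitivity` are illustrative placeholders with no privacy accounting attached.

```python
import numpy as np


def private_frank_wolfe_l1(grad_fn, dim, steps, epsilon_step, sensitivity,
                           radius=1.0, seed=0):
    """Frank-Wolfe over the l1 ball of the given radius, where the vertex
    returned by the linear-minimization step is sampled with the
    exponential mechanism (implemented with the Gumbel-max trick).

    epsilon_step and sensitivity are illustrative placeholders; no privacy
    accounting is performed here.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    for t in range(steps):
        g = grad_fn(x)
        # Vertices: +radius*e_i (first dim entries), -radius*e_i (last dim entries).
        # Score = -<g, v>, so vertices that decrease the linearization score higher.
        scores = np.concatenate([-radius * g, radius * g])
        # Exponential mechanism: P(j) proportional to exp(eps * score_j / (2 * sens)).
        # Adding i.i.d. Gumbel noise to these log-weights and taking the argmax
        # samples exactly from that distribution.
        logits = scores * epsilon_step / (2.0 * sensitivity)
        j = int(np.argmax(logits + rng.gumbel(size=logits.shape)))
        v = np.zeros(dim)
        v[j % dim] = radius if j < dim else -radius
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * v   # convex combination stays in the ball
    return x
```

Because every iterate is a convex combination of vertices, the output always stays feasible, whatever noise the mechanism injects.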
Submitted 10 November, 2021; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Best-Case Lower Bounds in Online Learning
Authors:
Cristóbal Guzmán,
Nishant A. Mehta,
Ali Mortazavi
Abstract:
Much of the work in online learning focuses on the study of sublinear upper bounds on the regret. In this work, we initiate the study of best-case lower bounds in online convex optimization, wherein we bound the largest improvement an algorithm can obtain relative to the single best action in hindsight. This problem is motivated by the goal of better understanding the adaptivity of a learning algorithm. Another motivation comes from fairness: it is known that best-case lower bounds are instrumental in obtaining algorithms for decision-theoretic online learning (DTOL) that satisfy a notion of group fairness. Our contributions include a general method for deriving best-case lower bounds for Follow The Regularized Leader (FTRL) algorithms with time-varying regularizers, which we use to show that best-case lower bounds are of the same order as existing upper bounds on regret: this includes situations with a fixed learning rate, decreasing learning rates, timeless methods, and adaptive gradient methods. In stark contrast, we show that the linearized version of FTRL can attain negative linear regret. Finally, in DTOL with two experts and binary predictions, we fully characterize the best-case sequences, which provides a finer understanding of the best-case lower bounds.
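Hedge, i.e., FTRL with a negative-entropy regularizer and a fixed learning rate, is the simplest setting in which to see what a best-case bound is about. The quantity computed below is the regret against the best expert in hindsight, and a best-case lower bound limits how negative it can become. This is a minimal sketch for DTOL with losses in $[0,1]$. The time-varying regularizers and the linearized FTRL variant studied in the paper are not implemented.

```python
import math


def hedge_regret(losses, eta):
    """Run Hedge (FTRL with a negative-entropy regularizer, fixed rate eta)
    on losses[t][i] in [0, 1] and return its regret against the best
    single expert in hindsight. Negative values mean Hedge beat that expert."""
    n = len(losses[0])
    cum = [0.0] * n              # cumulative loss of each expert
    alg_loss = 0.0
    for row in losses:
        shift = min(cum)         # subtract the minimum for numerical stability
        w = [math.exp(-eta * (c - shift)) for c in cum]
        z = sum(w)
        p = [wi / z for wi in w]     # FTRL/Hedge distribution over experts
        alg_loss += sum(pi * li for pi, li in zip(p, row))
        cum = [c + li for c, li in zip(cum, row)]
    return alg_loss - min(cum)
```

With `eta = sqrt(8 ln N / T)`, the classical guarantee bounds this regret from above by `ln N / eta + eta T / 8`. The results above concern bounds of the same order from below.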
Submitted 23 June, 2021;
originally announced June 2021.
-
An Optimal Algorithm for Strict Circular Seriation
Authors:
Santiago Armstrong,
Cristóbal Guzmán,
Carlos A. Sing-Long
Abstract:
We study the problem of circular seriation, where we are given a matrix of pairwise dissimilarities between $n$ objects, and the goal is to find a circular order of the objects that is consistent with their dissimilarities. This problem is a generalization of the classical linear seriation problem, where the goal is to find a linear order and for which optimal ${\cal O}(n^2)$ algorithms are known. Our contributions can be summarized as follows. First, we introduce circular Robinson matrices as the natural class of dissimilarity matrices for the circular seriation problem. Second, for the case of strict circular Robinson dissimilarity matrices, we provide an optimal ${\cal O}(n^2)$ algorithm for the circular seriation problem. Finally, we propose a statistical model to analyze the well-posedness of the circular seriation problem for large $n$. In particular, we establish ${\cal O}(\log(n)/n)$ rates on the distance, in the Kendall-tau metric, between any circular ordering found by solving the circular seriation problem and the underlying order of the model.
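To illustrate the matrix class, the check below reads each row circularly, starting at the diagonal, and tests that the dissimilarities first (weakly) increase and then (weakly) decrease. This unimodality reading of "circular Robinson" is an assumption made for the sketch. The paper's precise definition, and the strict variant its ${\cal O}(n^2)$ algorithm needs, may differ in detail. The helper names are hypothetical.

```python
def is_unimodal(seq):
    """True if seq never rises again after it has started to fall."""
    falling = False
    for a, b in zip(seq, seq[1:]):
        if b < a:
            falling = True
        elif b > a and falling:
            return False
    return True


def is_circular_robinson(D):
    """Assumed (non-strict) definition: every row, read circularly from the
    diagonal, is unimodal."""
    n = len(D)
    return all(is_unimodal([D[i][(i + k) % n] for k in range(n)]) for i in range(n))


def circular_distance_matrix(n):
    """Dissimilarities of n equally spaced points on a circle (hop distance)."""
    return [[min(abs(i - j), n - abs(i - j)) for j in range(n)] for i in range(n)]


def permute(D, order):
    """Relabel objects: entry (i, j) becomes D[order[i]][order[j]]."""
    return [[D[a][b] for b in order] for a in order]
```

Every linear Robinson matrix, such as $|i-j|$, also passes this check, which is consistent with circular seriation generalizing linear seriation. Shuffling the rows and columns of a circular matrix can break the property.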
Submitted 10 June, 2021;
originally announced June 2021.
-
Recovering Barabási-Albert Parameters of Graphs through Disentanglement
Authors:
Cristina Guzman,
Daphna Keidar,
Tristan Meynier,
Andreas Opedal,
Niklas Stoehr
Abstract:
Classical graph modeling approaches such as Erdős-Rényi (ER) random graphs or Barabási-Albert (BA) graphs, here referred to as stylized models, aim to reproduce properties of real-world graphs in an interpretable way. While useful, graph generation with stylized models requires domain knowledge and iterative trial-and-error simulation. Previous work by Stoehr et al. (2019) addresses these issues by learning the generation process from graph data, using a disentanglement-focused deep autoencoding framework, more specifically, a $β$-Variational Autoencoder ($β$-VAE). While they successfully recover the generative parameters of ER graphs through the model's latent variables, their model performs poorly on sequentially generated graphs such as BA graphs, due to its oversimplified decoder. We focus on recovering the generative parameters of BA graphs by replacing their $β$-VAE decoder with a sequential one. We first learn the generative BA parameters in a supervised fashion using a Graph Neural Network (GNN) and a Random Forest Regressor, by minimizing the squared loss between the true generative parameters and the latent variables. Next, we train a $β$-VAE model, combining the GNN encoder from the first stage with an LSTM-based decoder and a customized loss.
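For reference, the generative process whose parameters the model tries to recover can be sketched in a few lines of Python. This sketch uses one common Barabási-Albert variant: a complete seed graph on $m+1$ nodes, then $m$ degree-proportional attachments per new node. The choice of seed graph is an assumption, and none of the GNN or $β$-VAE machinery is shown.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a Barabási-Albert graph: start from a complete graph on m + 1
    nodes, then attach each new node to m distinct existing nodes chosen
    with probability proportional to their degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `ends` once per incident edge, so a uniform draw
    # from `ends` is a degree-proportional draw over nodes.
    ends = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # resample until m distinct targets
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges
```

Here `n` and `m` are the two generative parameters an encoder would be asked to recover from the resulting edge list.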
Submitted 4 May, 2021; v1 submitted 3 May, 2021;
originally announced May 2021.
-
Optimal Algorithms for Differentially Private Stochastic Monotone Variational Inequalities and Saddle-Point Problems
Authors:
Digvijay Boob,
Cristóbal Guzmán
Abstract:
In this work, we conduct the first systematic study of stochastic variational inequality (SVI) and stochastic saddle point (SSP) problems under the constraint of differential privacy (DP). We propose two algorithms: Noisy Stochastic Extragradient (NSEG) and Noisy Inexact Stochastic Proximal Point (NISPP). We show that a stochastic approximation variant of these algorithms attains risk bounds vanishing as a function of the dataset size, with respect to the strong gap function, and that a sampling-with-replacement variant achieves optimal risk bounds with respect to a weak gap function. We also show lower bounds of the same order on the weak gap function. Hence, our algorithms are optimal. Key to our analysis is the investigation of algorithmic stability bounds for both algorithms, which are new even in the nonprivate case. The dependence of the running time of the sampling-with-replacement algorithms on the dataset size $n$ is $n^2$ for NSEG and $\tilde{O}(n^{3/2})$ for NISPP.
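A minimal sketch of a noisy projected extragradient iteration, in the spirit of NSEG, on the toy bilinear saddle-point problem $\min_x \max_y xy$ over the box $[-1,1]^2$. The Gaussian perturbation `noise_std` is an illustrative parameter, not the privacy-calibrated noise of the paper. Exact operator evaluations also stand in for stochastic samples from a dataset.

```python
import numpy as np


def project_box(z, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d."""
    return np.clip(z, lo, hi)


def noisy_extragradient(operator, z0, step, iters, noise_std, seed=0):
    """Projected extragradient in which every operator evaluation is
    perturbed by Gaussian noise of standard deviation noise_std
    (illustrative only; not calibrated to a privacy budget)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        g = operator(z) + noise_std * rng.standard_normal(z.shape)
        z_half = project_box(z - step * g)                 # extrapolation step
        g_half = operator(z_half) + noise_std * rng.standard_normal(z.shape)
        z = project_box(z - step * g_half)                 # update from the anchor z
    return z


def bilinear_operator(z):
    """F(x, y) = (df/dx, -df/dy) for f(x, y) = x * y; monotone and 1-Lipschitz."""
    x, y = z
    return np.array([y, -x])
```

With `noise_std = 0` and step 0.1, the iterates spiral into the saddle point $(0, 0)$. With noise, they fluctuate around it at a level set by `noise_std` and the step size.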
Submitted 1 April, 2022; v1 submitted 7 April, 2021;
originally announced April 2021.
-
The Complexity of Nonconvex-Strongly-Concave Minimax Optimization
Authors:
Siqi Zhang,
Junchi Yang,
Cristóbal Guzmán,
Negar Kiyavash,
Niao He
Abstract:
This paper studies the complexity of finding approximate stationary points of nonconvex-strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth finite-sum settings. We establish nontrivial lower complexity bounds of $\Omega(\sqrt{\kappa}\,\Delta L\epsilon^{-2})$ and $\Omega(n+\sqrt{n\kappa}\,\Delta L\epsilon^{-2})$ for the two settings, respectively, where $\kappa$ is the condition number, $L$ is the smoothness constant, and $\Delta$ is the initial gap. Our result reveals substantial gaps between these limits and the best-known upper bounds in the literature. To close these gaps, we introduce a generic acceleration scheme that deploys existing gradient-based methods to solve a sequence of crafted strongly-convex-strongly-concave subproblems. In the general setting, the complexity of our proposed algorithm nearly matches the lower bound; in particular, it removes an additional poly-logarithmic dependence on accuracy present in previous works. In the averaged smooth finite-sum setting, our proposed algorithm improves over previous algorithms by providing a nearly-tight dependence on the condition number.
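The idea of solving a sequence of strongly-convex-strongly-concave subproblems can be sketched on the toy NC-SC function $f(x,y)=\sin x + xy - y^2/2$, which is nonconvex in $x$ and 1-strongly concave in $y$. Adding $(x - x_k)^2$ makes each subproblem strongly convex in $x$, so plain gradient descent-ascent can solve it. This is a non-accelerated inexact proximal-point loop chosen for brevity, not the paper's acceleration scheme. The function, step sizes and iteration counts are assumptions.

```python
import math


def grad_f(x, y):
    """Gradients of the toy NC-SC function f(x, y) = sin(x) + x*y - y**2 / 2."""
    return math.cos(x) + y, x - y        # (df/dx, df/dy)


def solve_subproblem(x_anchor, x, y, reg, step, iters):
    """Simultaneous gradient descent-ascent on f(x, y) + reg * (x - x_anchor)**2,
    which is strongly convex in x because 2 * reg exceeds max |f_xx| = 1."""
    for _ in range(iters):
        gx, gy = grad_f(x, y)
        gx += 2.0 * reg * (x - x_anchor)
        x, y = x - step * gx, y + step * gy
    return x, y


def inexact_proximal_point(x0, reg=1.0, outer=50, inner=200, step=0.1):
    """Outer loop: re-anchor the regularizer at the latest x (warm-started)."""
    x, y = x0, 0.0
    for _ in range(outer):
        x, y = solve_subproblem(x, x, y, reg, step, inner)
    return x, y
```

Here $\max_y f(x,y) = \sin x + x^2/2$, so a stationary point satisfies $\cos x + x = 0$ (about $x \approx -0.739$), and the returned $y$ should equal $x$.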
Submitted 29 March, 2021;
originally announced March 2021.
-
Non-Euclidean Differentially Private Stochastic Convex Optimization: Optimal Rates in Linear Time
Authors:
Raef Bassily,
Cristóbal Guzmán,
Anupama Nandi
Abstract:
Differentially private (DP) stochastic convex optimization (SCO) is a fundamental problem, where the goal is to approximately minimize the population risk with respect to a convex loss function, given a dataset of $n$ i.i.d. samples from a distribution, while satisfying differential privacy with respect to the dataset. Most of the existing works in the literature of private convex optimization focus on the Euclidean (i.e., $\ell_2$) setting, where the loss is assumed to be Lipschitz (and possibly smooth) w.r.t. the $\ell_2$ norm over a constraint set with bounded $\ell_2$ diameter. Algorithms based on noisy stochastic gradient descent (SGD) are known to attain the optimal excess risk in this setting.
In this work, we conduct a systematic study of DP-SCO for $\ell_p$-setups under a standard smoothness assumption on the loss. For $1< p\leq 2$, we give a new, linear-time DP-SCO algorithm with optimal excess risk. Previously known constructions with optimal excess risk for $1< p <2$ run in super-linear time in $n$. For $p=1$, we give an algorithm with nearly optimal excess risk. Our result for the $\ell_1$-setup also extends to general polyhedral norms and feasible sets. Moreover, we show that the excess risk bounds resulting from our algorithms for $1\leq p \leq 2$ are attained with high probability. For $2 < p \leq \infty$, we show that existing linear-time constructions for the Euclidean setup attain a nearly optimal excess risk in the low-dimensional regime. As a consequence, we show that such constructions attain a nearly optimal excess risk for $p=\infty$. Our work draws upon concepts from the geometry of normed spaces, such as the notions of regularity, uniform convexity, and uniform smoothness.
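The Euclidean baseline mentioned above, noisy stochastic gradient descent, can be sketched as one-pass projected SGD on the logistic loss over an $\ell_2$ ball, returning the average iterate. `noise_std` is a placeholder. Calibrating it (and the step size) to a target $(\varepsilon,\delta)$ and to the gradient-norm bound is what makes the method private, and that accounting is omitted here. The paper's new $\ell_p$ algorithms are not shown.

```python
import numpy as np


def project_l2_ball(w, radius):
    """Euclidean projection onto {w : ||w||_2 <= radius}."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)


def noisy_projected_sgd(X, y, radius, step, noise_std, epochs=1, seed=0):
    """Projected SGD on the logistic loss log(1 + exp(-y <w, x>)) with
    Gaussian noise added to every stochastic gradient. noise_std is a
    placeholder and is not derived from an (epsilon, delta) budget."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    avg = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * X[i].dot(w)
            grad = -y[i] * X[i] / (1.0 + np.exp(margin))   # logistic-loss gradient
            grad = grad + noise_std * rng.standard_normal(d)
            w = project_l2_ball(w - step * grad, radius)
            t += 1
            avg += (w - avg) / t                           # running average of iterates
    return avg
```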
Submitted 4 May, 2022; v1 submitted 1 March, 2021;
originally announced March 2021.
-
Complementary Composite Minimization, Small Gradients in General Norms, and Applications
Authors:
Jelena Diakonikolas,
Cristóbal Guzmán
Abstract:
Composite minimization is a powerful framework in large-scale convex optimization, based on decoupling the objective function into terms with structurally different properties, which allows for more flexible algorithmic design. We introduce a new algorithmic framework for complementary composite minimization, where the objective function decouples into a (weakly) smooth and a uniformly convex term. This particular form of decoupling is pervasive in statistics and machine learning, due to its link to regularization. The main contributions of our work are summarized as follows. First, we introduce the problem of complementary composite minimization in general normed spaces; second, we provide a unified accelerated algorithmic framework to address broad classes of complementary composite minimization problems; and third, we prove that the algorithms resulting from our framework are near-optimal in most of the standard optimization settings. Additionally, we show that our algorithmic framework can be used to address the problem of making the gradients small in general normed spaces. As a concrete example, we obtain a nearly-optimal method for the standard $\ell_1$ setup (small gradients in the $\ell_{\infty}$ norm), essentially matching the bound of Nesterov (2012) that was previously known only for the Euclidean setup. Finally, we show that our composite methods are broadly applicable to a number of regression and other classes of optimization problems, where regularization plays a key role. Our methods lead to complexity bounds that are either new or match the best existing ones.
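The simplest instance of a complementary composite objective is ridge regression: a smooth least-squares term plus $\frac{\lambda}{2}\|x\|_2^2$, which is strongly convex, i.e., uniformly convex with exponent 2. The sketch below runs plain, non-accelerated proximal gradient, using the closed-form prox of the regularizer. It is not the paper's accelerated framework and covers only the Euclidean case.

```python
import numpy as np


def prox_gradient_ridge(A, b, lam, iters=2000):
    """Proximal gradient for 0.5*||Ax - b||^2 + (lam/2)*||x||^2: a gradient
    step on the smooth term, then the closed-form prox of the regularizer."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    eta = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        v = x - eta * (A.T @ (A @ x - b))  # gradient step on the least-squares term
        x = v / (1.0 + eta * lam)          # prox of (eta*lam/2)*||x||^2
    return x
```

For this objective the minimizer also has the closed form $(A^\top A + \lambda I)^{-1}A^\top b$, which makes the sketch easy to check.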
Submitted 15 February, 2023; v1 submitted 26 January, 2021;
originally announced January 2021.
-
Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
Authors:
Raef Bassily,
Vitaly Feldman,
Cristóbal Guzmán,
Kunal Talwar
Abstract:
Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the output of the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding the generalization properties of SGD and to several applications to differentially private convex optimization for smooth losses.
Our work is the first to address uniform stability of SGD on {\em nonsmooth} convex losses. Specifically, we provide sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses. Our lower bounds show that, in the nonsmooth case, (S)GD can be inherently less stable than in the smooth case. On the other hand, our upper bounds show that (S)GD is sufficiently stable for deriving new and useful bounds on generalization error. Most notably, we obtain the first dimension-independent generalization bounds for multi-pass SGD in the nonsmooth case. In addition, our bounds allow us to derive a new algorithm for differentially private nonsmooth stochastic convex optimization with optimal excess population risk. Our algorithm is simpler and more efficient than the best known algorithm for the nonsmooth case (Feldman et al., 2020).
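Uniform stability can be probed empirically. Run SGD twice with the same sample order on datasets that differ in one example, then measure the distance between the two outputs. The sketch below does this for multi-pass SGD on the nonsmooth hinge loss. It measures a single replacement, so it gives only a lower estimate of the worst case. The loss, step size and data are illustrative assumptions.

```python
import numpy as np


def sgd_hinge(X, y, step, epochs, order):
    """Multi-pass SGD on the nonsmooth hinge loss max(0, 1 - y <w, x>),
    using a subgradient and a fixed sample order shared across runs."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in order:
            if y[i] * X[i].dot(w) < 1.0:       # inside the margin: subgradient -y x
                w = w + step * y[i] * X[i]
    return w


def empirical_stability(X, y, x_alt, y_alt, idx, step=0.05, epochs=5, seed=0):
    """Replace example idx with (x_alt, y_alt) and return ||w - w'||_2."""
    order = np.random.default_rng(seed).permutation(len(y))
    X2, y2 = X.copy(), y.copy()
    X2[idx], y2[idx] = x_alt, y_alt
    w1 = sgd_hinge(X, y, step, epochs, order)
    w2 = sgd_hinge(X2, y2, step, epochs, order)
    return float(np.linalg.norm(w1 - w2))
```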
Submitted 11 June, 2020;
originally announced June 2020.
-
Lower Bounds for Parallel and Randomized Convex Optimization
Authors:
Jelena Diakonikolas,
Cristóbal Guzmán
Abstract:
We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation. We show that the answer is negative for both deterministic and randomized algorithms applied to essentially any of the interesting geometries and nonsmooth, weakly-smooth, or smooth objective functions. In particular, we show that it is not possible to obtain a polylogarithmic (in the sequential complexity of the problem) number of parallel rounds with a polynomial (in the dimension) number of queries per round. In the majority of these settings and when the dimension of the space is polynomial in the inverse target accuracy, our lower bounds match the oracle complexity of sequential convex optimization, up to at most a logarithmic factor in the dimension, which makes them (nearly) tight. Prior to our work, lower bounds for parallel convex optimization algorithms were only known in a small fraction of the settings considered in this paper, mainly applying to Euclidean ($\ell_2$) and $\ell_\infty$ spaces. Our work provides a more general approach for proving lower bounds in the setting of parallel convex optimization.
Submitted 19 June, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
i2kit: A Tool for Immutable Infrastructure Deployments based on Lightweight Virtual Machines specialized to run Containers
Authors:
Pablo Chico de Guzman,
Felipe Gorostiaga,
Cesar Sanchez
Abstract:
Container technologies, like Docker, are becoming increasingly popular. Containers provide an exceptional developer experience because they offer lightweight isolation and ease of software distribution. Containers are also widely used in production environments, where a different set of challenges arises, such as security, networking, service discovery and load balancing. Container cluster management tools, such as Kubernetes, attempt to solve these problems by introducing a new control layer with the container as the unit of deployment. However, adding a new control layer is an extra configuration step and an additional potential source of runtime errors. The virtual machine technology offered by cloud providers is more mature and proven in terms of security, networking, service discovery and load balancing. However, virtual machines are heavier than containers for local development, are less flexible for resource allocation, and suffer longer boot times. This paper presents an alternative to containers that enjoys the best features of both approaches: (1) the use of mature, proven cloud vendor technology; (2) no need for a new control layer; and (3) being as lightweight as containers. Our solution is i2kit, a deployment tool based on the immutable infrastructure pattern, where the virtual machine is the unit of deployment. The i2kit tool accepts a simplified format of Kubernetes Deployment Manifests in order to reuse Kubernetes' most successful principles, but it creates a lightweight virtual machine for each Pod using Linuxkit. Linuxkit alleviates the size drawback that using virtual machines would otherwise entail, because the footprint of Linuxkit is approximately 60 MB. Finally, the attack surface of the system is reduced, since Linuxkit only installs the minimum set of OS dependencies needed to run containers, and different Pods are isolated by hypervisor technology.
Submitted 28 February, 2018;
originally announced February 2018.
-
Mechanism Design for Demand Response Programs with financial and non-monetary (social) Incentives
Authors:
Mateo Alejandro Cortés Guzmán,
Eduardo Mojica-Nava
Abstract:
Most demand management approaches with non-mandatory policies assume full user cooperation, which may not be the case given users' beliefs, needs and preferences. In this paper we propose a mechanism for demand management that includes incentives both with and without money. The mechanism is validated by means of simulation, modeling the consumers as a finite multiagent system that evolves until it reaches a stable state, and modeling the diffusion of social incentives using opinion dynamics.
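The abstract does not specify the opinion-dynamics model. As an assumption, the sketch below uses DeGroot averaging. Each consumer's opinion (read here, hypothetically, as willingness to shift demand) becomes a weighted average of its neighbours' opinions through a row-stochastic matrix $W$.

```python
import numpy as np


def degroot(W, x0, steps):
    """DeGroot opinion dynamics: x <- W x, where each row of W holds the
    (non-negative) weights an agent puts on its neighbours' opinions."""
    W = np.asarray(W, dtype=float)
    if not np.allclose(W.sum(axis=1), 1.0):
        raise ValueError("W must be row-stochastic")
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = W @ x
    return x
```

For a connected, aperiodic network, the opinions converge to a consensus weighted by the stationary distribution of $W$.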
Submitted 27 October, 2017; v1 submitted 28 September, 2017;
originally announced September 2017.
-
Operations in the era of large distributed telescopes
Authors:
Yan Guillaume Grange,
Kevin Vinsen,
Juan Carlos Guzman,
José Alfredo Parra,
Jan David Mol,
Rosly Renil,
Christoper Schollar
Abstract:
The previous generation of astronomical instruments tended to consist of single receivers in the focal point of one or more physical reflectors. Because of this, most astronomical data sets were small enough that the raw data could easily be downloaded and processed on a single machine.
In the last decade, several large, complex radio astronomy instruments have been built, and the SKA is currently being designed. Many of these instruments have been designed by international teams and, in the case of LOFAR, span an area larger than a single country. Such systems are ICT telescopes and consist mainly of complex software. As a result, the main operational issues relate to the ICT systems rather than to the telescope hardware. However, the operations of the ICT systems must be coordinated with the traditional operational work. Managing the operations of such telescopes therefore requires an approach that differs significantly from classical telescope operations.
The goal of this session is to bring together members of operational teams responsible for such large-scale ICT telescopes, so that those teams can exchange experiences and knowledge. We also consider such a meeting to be very valuable input for future instrumentation, especially the SKA and its regional centres.
Submitted 1 December, 2016;
originally announced December 2016.
-
Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization
Authors:
Vitaly Feldman,
Cristobal Guzman,
Santosh Vempala
Abstract:
Stochastic convex optimization, where the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular first-order iterative methods can be implemented using only statistical queries. For many cases of interest we derive nearly matching upper and lower bounds on the estimation (sample) complexity including linear optimization in the most general setting. We then present several consequences for machine learning, differential privacy and proving concrete lower bounds on the power of convex optimization based methods.
The key ingredients of our work are SQ algorithms and lower bounds for estimating the mean vector of a distribution over vectors supported on a convex body in $\mathbb{R}^d$. This natural problem has not been studied previously, and we show that our solutions can be used to obtain substantially improved SQ versions of Perceptron and other online algorithms for learning halfspaces.
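To make the SQ access model concrete, the sketch below simulates an oracle that answers any $[-1,1]$-valued query up to additive error $\tau$. It then uses that oracle for the naive coordinate-wise mean estimator, which makes one query per coordinate and assumes samples lie in the unit $\ell_\infty$ ball. This baseline is an assumption chosen for illustration and is not the paper's algorithm. Its $\ell_2$ error can be as large as $\tau\sqrt{d}$.

```python
import numpy as np


class SimulatedSQOracle:
    """Simulated statistical-query oracle: for a query phi with values in
    [-1, 1], return E[phi(x)] up to an error of at most tau. Here the
    expectation is an empirical mean and the error is uniformly random."""

    def __init__(self, samples, tau, seed=0):
        self.samples = np.asarray(samples, dtype=float)
        self.tau = tau
        self.rng = np.random.default_rng(seed)

    def query(self, phi):
        values = np.array([phi(x) for x in self.samples])
        if np.any(np.abs(values) > 1.0):
            raise ValueError("query must take values in [-1, 1]")
        return values.mean() + self.rng.uniform(-self.tau, self.tau)


def coordinatewise_mean(oracle, d):
    """Naive SQ mean estimate: query phi_i(x) = x_i for each coordinate i."""
    return np.array([oracle.query(lambda x, i=i: x[i]) for i in range(d)])
```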
Submitted 21 November, 2016; v1 submitted 30 December, 2015;
originally announced December 2015.
-
Lower Bounds on the Oracle Complexity of Nonsmooth Convex Optimization via Information Theory
Authors:
Gábor Braun,
Cristóbal Guzmán,
Sebastian Pokutta
Abstract:
We present an information-theoretic approach to lower bound the oracle complexity of nonsmooth black box convex optimization, unifying previous lower bounding techniques by identifying a combinatorial problem, namely string guessing, as a single source of hardness. As a measure of complexity we use distributional oracle complexity, which subsumes randomized oracle complexity as well as worst-case oracle complexity. We obtain strong lower bounds on distributional oracle complexity for the box $[-1,1]^n$, as well as for the $L^p$-ball for $p \geq 1$ (for both low-scale and large-scale regimes), matching worst-case upper bounds, and hence we close the gap between distributional complexity, and in particular randomized complexity, and worst-case complexity. Furthermore, the bounds remain essentially the same for high-probability and bounded-error oracle complexity, and even for a combination of the two, i.e., bounded-error high-probability oracle complexity. This considerably extends the applicability of known bounds.
Submitted 7 July, 2023; v1 submitted 19 July, 2014;
originally announced July 2014.
-
On Lower Complexity Bounds for Large-Scale Smooth Convex Optimization
Authors:
Cristobal Guzman,
Arkadi Nemirovski
Abstract:
We derive lower bounds on the black-box oracle complexity of large-scale smooth convex minimization problems, with emphasis on minimizing smooth convex functions (with Hölder continuous gradient, for a given exponent and constant) over high-dimensional $\|\cdot\|_p$-balls, $1\le p\le\infty$. Our bounds turn out to be tight (up to factors logarithmic in the design dimension), and can be viewed as a substantial extension of the existing lower complexity bounds for large-scale convex minimization, which cover the nonsmooth case and the 'Euclidean' smooth case (minimization of convex functions with Lipschitz continuous gradients over Euclidean balls). As a byproduct of our results, we demonstrate that the classical Conditional Gradient algorithm is near-optimal, in the sense of Information-Based Complexity Theory, when minimizing smooth convex functions over high-dimensional $\|\cdot\|_\infty$-balls and their matrix analogues, namely spectral norm balls in the spaces of square matrices.
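The Conditional Gradient (Frank-Wolfe) method mentioned above is easy to state over an $\|\cdot\|_\infty$-ball, because its linear minimization oracle has the closed form $-r\,\mathrm{sign}(g)$. Below is a minimal sketch with the standard step size $2/(k+2)$. The toy quadratic objective and the radius used to exercise it are assumptions.

```python
import numpy as np


def frank_wolfe_linf(grad, x0, radius, iters):
    """Conditional gradient (Frank-Wolfe) over the l_inf ball of given radius.
    The linear minimization oracle is closed-form: -radius * sign(grad)."""
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        g = grad(x)
        s = -radius * np.sign(g)           # argmin of <g, s> over ||s||_inf <= radius
        gamma = 2.0 / (k + 2.0)            # standard open-loop step size
        x = x + gamma * (s - x)            # stays in the ball (convex combination)
    return x
```

No projection is ever needed: each iterate is a convex combination of the starting point and vertices of the ball.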
Submitted 27 November, 2018; v1 submitted 18 July, 2013;
originally announced July 2013.
-
Parallel Backtracking with Answer Memoing for Independent And-Parallelism
Authors:
Pablo Chico de Guzmán,
Amadeo Casas,
Manuel Carro,
Manuel V. Hermenegildo
Abstract:
Goal-level independent and-parallelism (IAP) is exploited by scheduling two or more goals that will not interfere with each other at run time for simultaneous execution. This can be done safely even if such goals can produce multiple answers. The most successful IAP implementations to date have used recomputation of answers and sequentially ordered backtracking. While recomputation in principle simplifies the implementation, it can be very inefficient if the granularity of the parallel goals is large enough and they produce several answers, while sequentially ordered backtracking limits parallelism. Moreover, despite the expected simplification, the implementation of the classic schemes has proved to involve complex engineering, with the consequent difficulty for system maintenance and extension, while still frequently running into the well-known trapped goal and garbage slot problems. This work presents an alternative parallel backtracking model for IAP and its implementation. The model features parallel out-of-order (i.e., non-chronological) backtracking and relies on answer memoization to reuse and combine answers. We show that this approach can bring significant performance advantages. It can also simplify the important engineering task of implementing the backtracking mechanism of previous approaches.
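The contrast between recomputation and answer memoization can be sketched with Python generators standing in for logic-programming goals. In the memoized version, two independent goals run in parallel, each answer set is stored once, and the conjunction's answers are their cross product. The baseline re-runs the second goal for every answer of the first, as recomputation-based backtracking does. This eager, collect-everything version is a simplification for illustration and does not model the paper's out-of-order backtracking or its scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_conjunction_with_memo(goal_a, goal_b):
    """Evaluate two independent goals in parallel, memoize all answers of
    each, and combine them as a cross product (each goal runs once)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(lambda: list(goal_a()))
        fut_b = pool.submit(lambda: list(goal_b()))
        answers_a, answers_b = fut_a.result(), fut_b.result()
    return list(product(answers_a, answers_b))


def run_conjunction_with_recomputation(goal_a, goal_b):
    """Sequential baseline: goal_b is re-executed for every answer of goal_a."""
    return [(a, b) for a in goal_a() for b in goal_b()]
```

Both functions return the same answers in the same order. Only the number of times `goal_b` is executed differs.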
Submitted 24 July, 2011;
originally announced July 2011.
-
Network Congestion Control with Markovian Multipath Routing
Authors:
Roberto Cominetti,
Cristobal Guzman
Abstract:
In this paper we consider an integrated model for TCP/IP protocols with multipath routing. The model combines Network Utility Maximization for rate control based on end-to-end queuing delays with a Markovian Traffic Equilibrium for routing based on total expected delays. We prove the existence of a unique equilibrium state, which is characterized as the solution of an unconstrained strictly convex program. A distributed algorithm for solving this optimization problem is proposed, together with a brief discussion of how it can be implemented by adapting current Internet protocols.
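The routing half of such a model, in which each node forwards traffic according to total expected delays to the destination, can be sketched with a logit (Markovian) route choice. This is only an illustration under assumptions: the logit form, the dispersion parameter `beta`, the acyclic graph `GRAPH`, and the function names are hypothetical choices of this sketch, and the rate-control (NUM) half and the paper's distributed algorithm are not shown.

```python
import math

# Hypothetical acyclic network (an assumption, not from the paper):
# node -> list of (next_node, link_delay); "d" is the destination.
GRAPH = {
    "s": [("a", 1.0), ("b", 2.0)],
    "a": [("d", 2.0)],
    "b": [("d", 1.0)],
    "d": [],
}


def expected_delays(graph, dest, beta):
    """Logit expected delay to dest from each node, via the backward recursion
    tau_i = -(1/beta) * log(sum_j exp(-beta * (t_ij + tau_j))).
    Assumes the graph is acyclic and every node other than dest has an arc."""
    tau = {dest: 0.0}

    def visit(i):
        if i not in tau:
            total = sum(math.exp(-beta * (t + visit(j))) for j, t in graph[i])
            tau[i] = -math.log(total) / beta
        return tau[i]

    for node in graph:
        visit(node)
    return tau


def routing_probabilities(graph, tau, beta):
    """Markovian routing: at node i, choose next hop j with probability
    proportional to exp(-beta * (t_ij + tau_j))."""
    probs = {}
    for i, arcs in graph.items():
        if not arcs:
            continue  # destination: no forwarding decision
        weights = [math.exp(-beta * (t + tau[j])) for j, t in arcs]
        z = sum(weights)
        probs[i] = {j: w / z for (j, _), w in zip(arcs, weights)}
    return probs
```

With `beta=1.0`, both routes from `s` have delay 3, so `s` splits its traffic evenly between `a` and `b`, and its expected delay is `3 - log 2`, slightly below the minimum route delay. As `beta` grows, the recursion approaches ordinary shortest-path delays.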
Submitted 2 January, 2014; v1 submitted 14 July, 2011;
originally announced July 2011.
-
Swapping Evaluation: A Memory-Scalable Solution for Answer-On-Demand Tabling
Authors:
Pablo Chico de Guzman,
Manuel Carro,
David S. Warren
Abstract:
One of the differences among the various approaches to suspension-based tabled evaluation is the scheduling strategy. The two most popular strategies are local and batched evaluation.
The former collects all the solutions to a tabled predicate before making any one of them available outside the tabled computation. The latter returns answers one by one before computing them all, which in principle is better if only one answer (or a subset of the answers) is desired.
Batched evaluation is closer to SLD evaluation in that it computes solutions lazily as they are demanded, but it may need arbitrarily more memory than local evaluation, which is able to reclaim memory sooner. Some programs which in practice can be executed under the local strategy quickly run out of memory under batched evaluation. This has led to the general adoption of local evaluation at the expense of the more depth-first batched strategy.
In this paper we study the reasons for the high memory consumption of batched evaluation and propose a new scheduling strategy, which we have termed swapping evaluation. Swapping evaluation also returns answers one by one before completing a tabled call, but its memory usage can be orders of magnitude lower than that of batched evaluation. An experimental implementation in the XSB system shows that swapping evaluation is a feasible memory-scalable strategy that need not compromise execution speed.
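The difference in when answers leave the table under local and batched scheduling can be sketched with Python generators. This is a simplified model under stated assumptions: `answers` is a hypothetical stand-in for a tabled predicate's answer stream, the table is a plain list, and neither the suspension/memory mechanisms discussed above nor swapping evaluation itself are modeled.

```python
# Hypothetical log of answers in the order the producer computes them.
produced = []


def answers():
    """Stand-in for the answer stream of a tabled predicate (an assumption)."""
    for x in range(1, 6):
        produced.append(x)
        yield x


def local_eval(producer, table):
    """Local scheduling: complete the table before any answer leaves it."""
    table.extend(producer())  # all answers are computed up front
    yield from list(table)


def batched_eval(producer, table):
    """Batched scheduling: hand each answer out as soon as it is tabled."""
    for ans in producer():
        table.append(ans)
        yield ans
```

A caller that needs only the first answer, e.g. `next(batched_eval(answers, []))`, triggers one answer computation under the batched sketch, whereas `next(local_eval(answers, []))` triggers all five.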
Submitted 22 July, 2010;
originally announced July 2010.
-
A Program Transformation for Continuation Call-Based Tabled Execution
Authors:
Pablo Chico de Guzman,
Manuel Carro,
Manuel V. Hermenegildo
Abstract:
The advantages of tabled evaluation regarding program termination and reduction of complexity are well known, as are the significant implementation, portability, and maintenance efforts that some proposals (especially those based on suspension) require. Program transformation-based continuation call techniques reduce this implementation effort at some cost in efficiency. However, the traditional formulation of this proposal by Ramesh and Cheng limits the interleaving of tabled and non-tabled predicates and thus cannot be used as-is for arbitrary programs. In this paper we present a complete translation for the continuation call technique which, using the runtime support needed for the traditional proposal, solves these problems and makes it possible to execute arbitrary tabled programs. We present performance results showing that CCall offers a useful tradeoff that can be competitive with state-of-the-art implementations.
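The core idea behind continuation call tabling (a repeated call registers its continuation as a consumer of the call's answer table, and every new answer is pushed to all registered continuations) can be sketched in continuation-passing style. This is a generic illustration, not Ramesh and Cheng's transformation or the CCall translation described above: it handles a single tabled predicate, performs no completion detection, and the names `EDGES`, `Table`, `TABLES`, and `path` are hypothetical.

```python
# Hypothetical cyclic graph; the left-recursive path/2 below would loop
# forever under plain SLD resolution.
EDGES = [("a", "b"), ("b", "c"), ("c", "a")]


class Table:
    """Answer table for one tabled call: answers found so far plus the
    continuations (consumers) waiting for answers."""

    def __init__(self):
        self.answers = []
        self.seen = set()
        self.continuations = []


TABLES = {}


def path(x, k):
    """Tabled call path(x, Y) in continuation-passing style:
    k(y) is invoked once for each distinct answer Y = y.

    path(X, Y) :- edge(X, Y).
    path(X, Y) :- path(X, Z), edge(Z, Y).
    """
    if x in TABLES:
        # Repeated (possibly recursive) call: register k as a consumer and
        # replay the answers already in the table.
        table = TABLES[x]
        table.continuations.append(k)
        for y in list(table.answers):
            k(y)
        return

    table = TABLES[x] = Table()
    table.continuations.append(k)

    def add_answer(y):
        # New answers are stored once and pushed to every registered consumer.
        if y not in table.seen:
            table.seen.add(y)
            table.answers.append(y)
            for cont in list(table.continuations):
                cont(y)

    def extend(z):
        # Continuation of the recursive clause: after path(X, Z), try edge(Z, Y).
        for u, v in EDGES:
            if u == z:
                add_answer(v)

    # Non-recursive clause first, then the left-recursive one.
    for u, v in EDGES:
        if u == x:
            add_answer(v)
    path(x, extend)
```

Calling `path("a", results.append)` with `results = []` terminates with the three reachable nodes `a`, `b`, and `c` in `results`, because the recursive call consumes answers from the table instead of re-entering the clause.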
Submitted 25 January, 2009;
originally announced January 2009.