Search | arXiv e-print repository

Decentralized Collaborative Learning Framework with External Privacy Leakage Analysis

Authors: Tsuyoshi Idé, Dzung T. Phan, Rudy Raymond

Abstract: This paper presents two methodological advancements in decentralized multi-task learning under privacy constraints, aiming to pave the way for future developments in next-generation Blockchain platforms. First, we expand the existing framework for collaborative dictionary learning (CollabDict), which has previously been limited to Gaussian mixture models, by incorporating deep variational autoenco… ▽ More This paper presents two methodological advancements in decentralized multi-task learning under privacy constraints, aiming to pave the way for future developments in next-generation Blockchain platforms. First, we expand the existing framework for collaborative dictionary learning (CollabDict), which has previously been limited to Gaussian mixture models, by incorporating deep variational autoencoders (VAEs) into the framework, with a particular focus on anomaly detection. We demonstrate that the VAE-based anomaly score function shares the same mathematical structure as the non-deep model, and provide comprehensive qualitative comparison. Second, considering the widespread use of "pre-trained models," we provide a mathematical analysis on data privacy leakage when models trained with CollabDict are shared externally. We show that the CollabDict approach, when applied to Gaussian mixtures, adheres to a Renyi differential privacy criterion. Additionally, we propose a practical metric for monitoring internal privacy breaches during the learning process. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: To appear in Proceeding of 2023 International workshop Blockchain Kaigi (BCK 23), JPS Conference Proceedings, 2024

arXiv:2403.03611 [pdf]

Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Authors: Dang Thoai Phan

Abstract: Acoustic recognition is a common task for deep learning in recent researches, with the employment of spectral feature extraction such as Short-time Fourier transform and Wavelet transform. However, not many researches have found that discuss the advantages and drawbacks, as well as performance comparison of them. In this consideration, this paper aims to comparing the attributes of these two trans… ▽ More Acoustic recognition is a common task for deep learning in recent researches, with the employment of spectral feature extraction such as Short-time Fourier transform and Wavelet transform. However, not many researches have found that discuss the advantages and drawbacks, as well as performance comparison of them. In this consideration, this paper aims to comparing the attributes of these two transforms, called spectrogram and scalogram. A Convolutional Neural Networks for acoustic faults recognition is implemented, then the performance of them is recorded for comparison. A latest research on the same audio database is considered for benchmarking to see how good the designed spectrogram and scalogram is. The advantages and limitations of them are also analyzed. By doing so, the results of this paper provide indications for application scenarios of spectrogram and scalogram, as well as potential further research directions. △ Less

Submitted 26 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2208.10671 [pdf, other]

Cardinality-Regularized Hawkes-Granger Model

Authors: Tsuyoshi Idé, Georgios Kollias, Dzung T. Phan, Naoki Abe

Abstract: We propose a new sparse Granger-causal learning framework for temporal event data. We focus on a specific class of point processes called the Hawkes process. We begin by pointing out that most of the existing sparse causal learning algorithms for the Hawkes process suffer from a singularity in maximum likelihood estimation. As a result, their sparse solutions can appear only as numerical artifacts… ▽ More We propose a new sparse Granger-causal learning framework for temporal event data. We focus on a specific class of point processes called the Hawkes process. We begin by pointing out that most of the existing sparse causal learning algorithms for the Hawkes process suffer from a singularity in maximum likelihood estimation. As a result, their sparse solutions can appear only as numerical artifacts. In this paper, we propose a mathematically well-defined sparse causal learning framework based on a cardinality-regularized Hawkes process, which remedies the pathological issues of existing approaches. We leverage the proposed algorithm for the task of instance-wise causal event analysis, where sparsity plays a critical role. We validate the proposed framework with two real use-cases, one from the power grid and the other from the cloud data center management domain. △ Less

Submitted 22 August, 2022; originally announced August 2022.

Comments: 17 pages, 9 figures

arXiv:2202.00865 [pdf, other]

StepDIRECT -- A Derivative-Free Optimization Method for Stepwise Functions

Authors: Dzung T. Phan, Hongsheng Liu, Lam M. Nguyen

Abstract: In this paper, we propose the StepDIRECT algorithm for derivative-free optimization (DFO), in which the black-box objective function has a stepwise landscape. Our framework is based on the well-known DIRECT algorithm. By incorporating the local variability to explore the flatness, we provide a new criterion to select the potentially optimal hyper-rectangles. In addition, we introduce a stochastic… ▽ More In this paper, we propose the StepDIRECT algorithm for derivative-free optimization (DFO), in which the black-box objective function has a stepwise landscape. Our framework is based on the well-known DIRECT algorithm. By incorporating the local variability to explore the flatness, we provide a new criterion to select the potentially optimal hyper-rectangles. In addition, we introduce a stochastic local search algorithm performing on potentially optimal hyper-rectangles to improve the solution quality and convergence speed. Global convergence of the StepDIRECT algorithm is provided. Numerical experiments on optimization for random forest models and hyper-parameter tuning are presented to support the efficacy of our algorithm. The proposed StepDIRECT algorithm shows competitive performance results compared with other state-of-the-art baseline DFO methods including the original DIRECT algorithm. △ Less

Submitted 1 February, 2022; originally announced February 2022.

arXiv:2110.09131 [pdf, other]

Ensembling Graph Predictions for AMR Parsing

Authors: Hoang Thanh Lam, Gabriele Picco, Yufang Hou, Young-Suk Lee, Lam M. Nguyen, Dzung T. Phan, Vanessa López, Ramon Fernandez Astudillo

Abstract: In many machine learning tasks, models are trained to predict structure data such as graphs. For example, in natural language processing, it is very common to parse texts into dependency trees or abstract meaning representation (AMR) graphs. On the other hand, ensemble methods combine predictions from multiple models to create a new one that is more robust and accurate than individual predictions.… ▽ More In many machine learning tasks, models are trained to predict structure data such as graphs. For example, in natural language processing, it is very common to parse texts into dependency trees or abstract meaning representation (AMR) graphs. On the other hand, ensemble methods combine predictions from multiple models to create a new one that is more robust and accurate than individual predictions. In the literature, there are many ensembling techniques proposed for classification or regression problems, however, ensemble graph prediction has not been studied thoroughly. In this work, we formalize this problem as mining the largest graph that is the most supported by a collection of graph predictions. As the problem is NP-Hard, we propose an efficient heuristic algorithm to approximate the optimal solution. To validate our approach, we carried out experiments in AMR parsing problems. The experimental results demonstrate that the proposed approach can combine the strength of state-of-the-art AMR parsers to create new predictions that are more accurate than any individual models in five standard benchmark datasets. △ Less

Submitted 24 January, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

Comments: Published at NeurIPS 2021

arXiv:2104.02278 [pdf, other]

A novel activity pattern generation incorporating deep learning for transport demand models

Authors: Danh T. Phan, Hai L. Vu

Abstract: Activity generation plays an important role in activity-based demand modelling systems. While machine learning, especially deep learning, has been increasingly used for mode choice and traffic flow prediction, much less research exploiting the advantage of deep learning for activity generation tasks. This paper proposes a novel activity pattern generation framework by incorporating deep learning w… ▽ More Activity generation plays an important role in activity-based demand modelling systems. While machine learning, especially deep learning, has been increasingly used for mode choice and traffic flow prediction, much less research exploiting the advantage of deep learning for activity generation tasks. This paper proposes a novel activity pattern generation framework by incorporating deep learning with travel domain knowledge. We model each activity schedule as one primary activity tour and several secondary activity tours. We then develop different deep neural networks with entity embedding and random forest models to classify activity type, as well as to predict activity times. The proposed framework can capture the activity patterns for individuals in both training and validation sets. Results show high accuracy for the start time and end time of work and school activities. The framework also replicates the start time patterns of stop-before and stop-after primary work activity well. This provides a promising direction to deploy advanced machine learning methods to generate more reliable activity-travel patterns for transport demand systems and their applications. △ Less

Submitted 6 April, 2021; originally announced April 2021.

Comments: 21 pages, 12 figures

arXiv:2103.03452 [pdf, other]

FedDR -- Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization

Authors: Quoc Tran-Dinh, Nhan H. Pham, Dzung T. Phan, Lam M. Nguyen

Abstract: We develop two new algorithms, called, FedDR and asyncFedDR, for solving a fundamental nonconvex composite optimization problem in federated learning. Our algorithms rely on a novel combination between a nonconvex Douglas-Rachford splitting method, randomized block-coordinate strategies, and asynchronous implementation. They can also handle convex regularizers. Unlike recent methods in the literat… ▽ More We develop two new algorithms, called, FedDR and asyncFedDR, for solving a fundamental nonconvex composite optimization problem in federated learning. Our algorithms rely on a novel combination between a nonconvex Douglas-Rachford splitting method, randomized block-coordinate strategies, and asynchronous implementation. They can also handle convex regularizers. Unlike recent methods in the literature, e.g., FedSplit and FedPD, our algorithms update only a subset of users at each communication round, and possibly in an asynchronous manner, making them more practical. These new algorithms can handle statistical and system heterogeneity, which are the two main challenges in federated learning, while achieving the best known communication complexity. In fact, our new algorithms match the communication complexity lower bound up to a constant factor under standard assumptions. Our numerical experiments illustrate the advantages of our methods over existing algorithms on synthetic and real datasets. △ Less

Submitted 28 October, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: 39 pages, and 12 figures

Report number: UNC-STOR-June 2021

Journal ref: NeurIPs 2021

arXiv:2011.03375 [pdf, other]

A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

Authors: Haoran Zhu, Pavankumar Murali, Dzung T. Phan, Lam M. Nguyen, Jayant R. Kalagnanam

Abstract: Several recent publications report advances in training optimal decision trees (ODT) using mixed-integer programs (MIP), due to algorithmic advances in integer programming and a growing interest in addressing the inherent suboptimality of heuristic approaches such as CART. In this paper, we propose a novel MIP formulation, based on a 1-norm support vector machine model, to train a multivariate ODT… ▽ More Several recent publications report advances in training optimal decision trees (ODT) using mixed-integer programs (MIP), due to algorithmic advances in integer programming and a growing interest in addressing the inherent suboptimality of heuristic approaches such as CART. In this paper, we propose a novel MIP formulation, based on a 1-norm support vector machine model, to train a multivariate ODT for classification problems. We provide cutting plane techniques that tighten the linear relaxation of the MIP formulation, in order to improve run times to reach optimality. Using 36 data-sets from the University of California Irvine Machine Learning Repository, we demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy across the data-sets. We provide a scalable framework to train multivariate ODT on large data-sets by introducing a novel linear programming (LP) based data selection method to choose a subset of the data for training. Our method is able to routinely handle large data-sets with more than 7,000 sample points and outperform heuristics methods and other MIP based techniques. We present results on data-sets containing up to 245,000 samples. Existing MIP-based methods do not scale well on training data-sets beyond 5,500 samples. △ Less

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2003.00430 [pdf, other]

A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning

Authors: Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk, Quoc Tran-Dinh

Abstract: We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy gradient estimator is shown to be biased, but has variance reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algori… ▽ More We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy gradient estimator is shown to be biased, but has variance reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algorithm (ProxHSPGA) to solve a composite policy optimization problem that allows us to handle constraints or regularizers on the policy parameters. We first propose a single-looped algorithm then introduce a more practical restarting variant. We prove that both algorithms can achieve the best-known trajectory complexity $\mathcal{O}\left(\varepsilon^{-3}\right)$ to attain a first-order stationary point for the composite problem which is better than existing REINFORCE/GPOMDP $\mathcal{O}\left(\varepsilon^{-4}\right)$ and SVRPG $\mathcal{O}\left(\varepsilon^{-10/3}\right)$ in the non-composite setting. We evaluate the performance of our algorithm on several well-known examples in reinforcement learning. Numerical results show that our algorithm outperforms two existing methods on these examples. Moreover, the composite settings indeed have some advantages compared to the non-composite ones on certain problems. △ Less

Submitted 21 September, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

Comments: Accepted for publication at the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020)

Journal ref: Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR 108:374-385, 2020

arXiv:2002.08246 [pdf, other]

A Unified Convergence Analysis for Shuffling-Type Gradient Methods

Authors: Lam M. Nguyen, Quoc Tran-Dinh, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk

Abstract: In this paper, we propose a unified convergence analysis for a class of generic shuffling-type gradient methods for solving finite-sum optimization problems. Our analysis works with any sampling without replacement strategy and covers many known variants such as randomized reshuffling, deterministic or randomized single permutation, and cyclic and incremental gradient schemes. We focus on two diff… ▽ More In this paper, we propose a unified convergence analysis for a class of generic shuffling-type gradient methods for solving finite-sum optimization problems. Our analysis works with any sampling without replacement strategy and covers many known variants such as randomized reshuffling, deterministic or randomized single permutation, and cyclic and incremental gradient schemes. We focus on two different settings: strongly convex and nonconvex problems, but also discuss the non-strongly convex case. Our main contribution consists of new non-asymptotic and asymptotic convergence rates for a wide class of shuffling-type gradient methods in both nonconvex and convex settings. We also study uniformly randomized shuffling variants with different learning rates and model assumptions. While our rate in the nonconvex case is new and significantly improved over existing works under standard assumptions, the rate on the strongly convex one matches the existing best-known rates prior to this paper up to a constant factor without imposing a bounded gradient condition. Finally, we empirically illustrate our theoretical results via two numerical examples: nonconvex logistic regression and neural network training examples. As byproducts, our results suggest some appropriate choices for diminishing learning rates in certain shuffling variants. △ Less

Submitted 19 September, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: Journal of Machine Learning Research, 2021

arXiv:1908.00528 [pdf, other]

Neural Simplex Architecture

Authors: Dung T. Phan, Radu Grosu, Nils Jansen, Nicola Paoletti, Scott A. Smolka, Scott D. Stoller

Abstract: We present the Neural Simplex Architecture (NSA), a new approach to runtime assurance that provides safety guarantees for neural controllers (obtained e.g. using reinforcement learning) of autonomous and other complex systems without unduly sacrificing performance. NSA is inspired by the Simplex control architecture of Sha et al., but with some significant differences. In the traditional approach,… ▽ More We present the Neural Simplex Architecture (NSA), a new approach to runtime assurance that provides safety guarantees for neural controllers (obtained e.g. using reinforcement learning) of autonomous and other complex systems without unduly sacrificing performance. NSA is inspired by the Simplex control architecture of Sha et al., but with some significant differences. In the traditional approach, the advanced controller (AC) is treated as a black box; when the decision module switches control to the baseline controller (BC), the BC remains in control forever. There is relatively little work on switching control back to the AC, and there are no techniques for correcting the AC's behavior after it generates a potentially unsafe control input that causes a failover to the BC. Our NSA addresses both of these limitations. NSA not only provides safety assurances in the presence of a possibly unsafe neural controller, but can also improve the safety of such a controller in an online setting via retraining, without overly degrading its performance. To demonstrate NSA's benefits, we have conducted several significant case studies in the continuous control domain. These include a target-seeking ground rover navigating an obstacle field, and a neural controller for an artificial pancreas system. △ Less

Submitted 24 March, 2020; v1 submitted 1 August, 2019; originally announced August 2019.

Comments: 12th NASA Formal Methods Symposium (NFM 2020)

arXiv:1907.03793 [pdf, other]

A Hybrid Stochastic Optimization Framework for Stochastic Composite Nonconvex Optimization

Authors: Quoc Tran-Dinh, Nhan H. Pham, Dzung T. Phan, Lam M. Nguyen

Abstract: We introduce a new approach to develop stochastic optimization algorithms for a class of stochastic composite and possibly nonconvex optimization problems. The main idea is to combine two stochastic estimators to create a new hybrid one. We first introduce our hybrid estimator and then investigate its fundamental properties to form a foundational theory for algorithmic development. Next, we apply… ▽ More We introduce a new approach to develop stochastic optimization algorithms for a class of stochastic composite and possibly nonconvex optimization problems. The main idea is to combine two stochastic estimators to create a new hybrid one. We first introduce our hybrid estimator and then investigate its fundamental properties to form a foundational theory for algorithmic development. Next, we apply our theory to develop several variants of stochastic gradient methods to solve both expectation and finite-sum composite optimization problems. Our first algorithm can be viewed as a variant of proximal stochastic gradient methods with a single-loop, but can achieve $\mathcal{O}(σ^3\varepsilon^{-1} + σ\varepsilon^{-3})$-oracle complexity bound, matching the best-known ones from state-of-the-art double-loop algorithms in the literature, where $σ> 0$ is the variance and $\varepsilon$ is a desired accuracy. Then, we consider two different variants of our method: adaptive step-size and restarting schemes that have similar theoretical guarantees as in our first algorithm. We also study two mini-batch variants of the proposed methods. In all cases, we achieve the best-known complexity bounds under standard assumptions. We test our methods on several numerical examples with real datasets and compare them with state-of-the-arts. Our numerical experiments show that the new methods are comparable and, in many cases, outperform their competitors. △ Less

Submitted 2 May, 2020; v1 submitted 8 July, 2019; originally announced July 2019.

Comments: 49 pages, 2 tables, 9 figures

Report number: UNC-STOR-2019.07.V1-03

arXiv:1905.05920 [pdf, other]

Hybrid Stochastic Gradient Descent Algorithms for Stochastic Nonconvex Optimization

Authors: Quoc Tran-Dinh, Nhan H. Pham, Dzung T. Phan, Lam M. Nguyen

Abstract: We introduce a hybrid stochastic estimator to design stochastic gradient algorithms for solving stochastic optimization problems. Such a hybrid estimator is a convex combination of two existing biased and unbiased estimators and leads to some useful property on its variance. We limit our consideration to a hybrid SARAH-SGD for nonconvex expectation problems. However, our idea can be extended to ha… ▽ More We introduce a hybrid stochastic estimator to design stochastic gradient algorithms for solving stochastic optimization problems. Such a hybrid estimator is a convex combination of two existing biased and unbiased estimators and leads to some useful property on its variance. We limit our consideration to a hybrid SARAH-SGD for nonconvex expectation problems. However, our idea can be extended to handle a broader class of estimators in both convex and nonconvex settings. We propose a new single-loop stochastic gradient descent algorithm that can achieve $O(\max\{σ^3\varepsilon^{-1},σ\varepsilon^{-3}\})$-complexity bound to obtain an $\varepsilon$-stationary point under smoothness and $σ^2$-bounded variance assumptions. This complexity is better than $O(σ^2\varepsilon^{-4})$ often obtained in state-of-the-art SGDs when $σ< O(\varepsilon^{-3})$. We also consider different extensions of our method, including constant and adaptive step-size with single-loop, double-loop, and mini-batch variants. We compare our algorithms with existing methods on several datasets using two nonconvex models. △ Less

Submitted 14 May, 2019; originally announced May 2019.

Comments: 41 pages and 18 figures

Report number: UNC-STOR-May-2019-03

arXiv:1902.05679 [pdf, other]

ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization

Authors: Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Quoc Tran-Dinh

Abstract: We propose a new stochastic first-order algorithmic framework to solve stochastic composite nonconvex optimization problems that covers both finite-sum and expectation settings. Our algorithms rely on the SARAH estimator introduced in (Nguyen et al, 2017) and consist of two steps: a proximal gradient and an averaging step making them different from existing nonconvex proximal-type algorithms. The… ▽ More We propose a new stochastic first-order algorithmic framework to solve stochastic composite nonconvex optimization problems that covers both finite-sum and expectation settings. Our algorithms rely on the SARAH estimator introduced in (Nguyen et al, 2017) and consist of two steps: a proximal gradient and an averaging step making them different from existing nonconvex proximal-type algorithms. The algorithms only require an average smoothness assumption of the nonconvex objective term and additional bounded variance assumption if applied to expectation problems. They work with both constant and adaptive step-sizes, while allowing single sample and mini-batches. In all these cases, we prove that our algorithms can achieve the best-known complexity bounds. One key step of our methods is new constant and adaptive step-sizes that help to achieve desired complexity bounds while improving practical performance. Our constant step-size is much larger than existing methods including proximal SVRG schemes in the single sample case. We also specify the algorithm to the non-composite case that covers existing state-of-the-arts in terms of complexity bounds. Our update also allows one to trade-off between step-sizes and mini-batch sizes to improve performance. We test the proposed algorithms on two composite nonconvex problems and neural networks using several well-known datasets. △ Less

Submitted 28 March, 2019; v1 submitted 14 February, 2019; originally announced February 2019.

Comments: 45 pages, 8 figures, and 2 table

Report number: STOR-UNC-Feb14.2019

arXiv:1901.07648 [pdf, other]

Finite-Sum Smooth Optimization with SARAH

Authors: Lam M. Nguyen, Marten van Dijk, Dzung T. Phan, Phuong Ha Nguyen, Tsui-Wei Weng, Jayant R. Kalagnanam

Abstract: The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n} \sum_{i=1}^n f_i(w)$ has been proven to be at least $Ω(\sqrt{n}/ε)$ for $n \leq \mathcal{O}(ε^{-2})$ where $ε$ denotes the attained accuracy… ▽ More The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n} \sum_{i=1}^n f_i(w)$ has been proven to be at least $Ω(\sqrt{n}/ε)$ for $n \leq \mathcal{O}(ε^{-2})$ where $ε$ denotes the attained accuracy $\mathbb{E}[ \|\nabla F(\tilde{w})\|^2] \leq ε$ for the outputted approximation $\tilde{w}$ (Fang et al., 2018). In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm (Nguyen et al., 2017a;b) and achieve total complexity that matches the lower-bound worst case complexity in (Fang et al., 2018) up to a constant factor when $n \leq \mathcal{O}(ε^{-2})$ for nonconvex problems. For convex optimization, we propose SARAH++ with sublinear convergence for general convex and linear convergence for strongly convex problems; and we provide a practical version for which numerical experiments on various datasets show an improved performance. △ Less

Submitted 22 April, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

arXiv:1901.07634

DTN: A Learning Rate Scheme with Convergence Rate of $\mathcal{O}(1/t)$ for SGD

Authors: Lam M. Nguyen, Phuong Ha Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Marten van Dijk

Abstract: This paper has some inconsistent results, i.e., we made some failed claims because we did some mistakes for using the test criterion for a series. Precisely, our claims on the convergence rate of $\mathcal{O}(1/t)$ of SGD presented in Theorem 1, Corollary 1, Theorem 2 and Corollary 2 are wrongly derived because they are based on Lemma 5. In Lemma 5, we do not correctly use the test criterion for a… ▽ More This paper has some inconsistent results, i.e., we made some failed claims because we did some mistakes for using the test criterion for a series. Precisely, our claims on the convergence rate of $\mathcal{O}(1/t)$ of SGD presented in Theorem 1, Corollary 1, Theorem 2 and Corollary 2 are wrongly derived because they are based on Lemma 5. In Lemma 5, we do not correctly use the test criterion for a series. Hence, the result of Lemma 5 is not valid. We would like to thank the community for pointing out this mistake! △ Less

Submitted 27 February, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

Comments: This paper has inconsistent results, i.e., we made some failed claims because we did some mistakes for using the test criterion for a series

arXiv:1810.04100 [pdf, other]

Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD

Authors: Marten van Dijk, Lam M. Nguyen, Phuong Ha Nguyen, Dzung T. Phan

Abstract: We study Stochastic Gradient Descent (SGD) with diminishing step sizes for convex objective functions. We introduce a definitional framework and theory that defines and characterizes a core property, called curvature, of convex objective functions. In terms of curvature we can derive a new inequality that can be used to compute an optimal sequence of diminishing step sizes by solving a differentia… ▽ More We study Stochastic Gradient Descent (SGD) with diminishing step sizes for convex objective functions. We introduce a definitional framework and theory that defines and characterizes a core property, called curvature, of convex objective functions. In terms of curvature we can derive a new inequality that can be used to compute an optimal sequence of diminishing step sizes by solving a differential equation. Our exact solutions confirm known results in literature and allows us to fully characterize a new regularizer with its corresponding expected convergence rates. △ Less

Submitted 13 May, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97, 2019

arXiv:1801.06159 [pdf, other]

When Does Stochastic Gradient Algorithm Work Well?

Authors: Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg

Abstract: In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a fixed, large step size and propose a novel assumption on the objective function, under which this method has the improved convergence rates (to a neighborhood o… ▽ More In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a fixed, large step size and propose a novel assumption on the objective function, under which this method has the improved convergence rates (to a neighborhood of the optimal solutions). We then empirically demonstrate that these assumptions hold for logistic regression and standard deep neural networks on classical data sets. Thus our analysis helps to explain when efficient behavior can be expected from the SGD method in training classification models and deep neural networks. △ Less

Submitted 25 December, 2018; v1 submitted 18 January, 2018; originally announced January 2018.

arXiv:1404.4132 [pdf, ps, other]

Projection Algorithms for Non-Convex Minimization with Application to Sparse Principal Component Analysis

Authors: William W. Hager, Dzung T. Phan, Jia-Jie Zhu

Abstract: We consider concave minimization problems over non-convex sets.Optimization problems with this structure arise in sparse principal component analysis. We analyze both a gradient projection algorithm and an approximate Newton algorithm where the Hessian approximation is a multiple of the identity. Convergence results are established. In numerical experiments arising in sparse principal component an… ▽ More We consider concave minimization problems over non-convex sets.Optimization problems with this structure arise in sparse principal component analysis. We analyze both a gradient projection algorithm and an approximate Newton algorithm where the Hessian approximation is a multiple of the identity. Convergence results are established. In numerical experiments arising in sparse principal component analysis, it is seen that the performance of the gradient projection algorithm is very similar to that of the truncated power method and the generalized power method. In some cases, the approximate Newton algorithm with a Barzilai-Borwein (BB) Hessian approximation can be substantially faster than the other algorithms, and can converge to a better solution. △ Less

Submitted 6 April, 2019; v1 submitted 15 April, 2014; originally announced April 2014.

Showing 1–19 of 19 results for author: Phan, D T