-
A non-monotone trust-region method with noisy oracles and additional sampling
Authors:
Natasa Krejic,
Natasa Krklec Jerinkic,
Angeles Martinez,
Mahsa Yousefi
Abstract:
In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies which yield noisy approximations of the finite sum objective function and its gradient. To effectively control the resulting approximation error, we introduce an adaptive sample size strategy based on inexpensive additional sampling. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batch to full sample functions. We provide convergence analysis for all possible scenarios and show that the proposed method achieves almost sure convergence under standard assumptions for the trust-region framework. We report numerical experiments showing that the proposed algorithm outperforms its state-of-the-art counterpart in deep neural network training for image classification and regression tasks while requiring a significantly smaller number of gradient evaluations.
Submitted 17 January, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
SLiSeS: Subsampled Line Search Spectral Gradient Method for Finite Sums
Authors:
Stefania Bellavia,
Nataša Krejić,
Nataša Krklec Jerinkić,
Marcos Raydan
Abstract:
The spectral gradient method is known to be a powerful low-cost tool for solving large-scale optimization problems. In this paper, our goal is to exploit its advantages in the stochastic optimization framework, especially in the case of mini-batch subsampling that is often used in big data settings. To allow the spectral coefficient to properly explore the underlying approximate Hessian spectrum, we keep the same subsample for several iterations before subsampling again. We analyze the required algorithmic features and the conditions for almost sure convergence, and present initial numerical results that show the advantages of the proposed method.
Submitted 19 February, 2024; v1 submitted 12 June, 2023;
originally announced June 2023.
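To illustrate the spectral coefficient mentioned in the SLiSeS abstract above, here is a minimal deterministic sketch of a safeguarded Barzilai-Borwein (BB1) step length inside a plain gradient iteration. It is not the SLiSeS algorithm: there is no subsampling and no line search, and the function names, the safeguard bounds `lo` and `hi`, and the fallback used when the curvature estimate is non-positive are all assumptions made for the example.

```python
def spectral_coefficient(s, y, lo=1e-8, hi=1e8):
    """Safeguarded BB1 step length (s's) / (s'y).

    s is the difference of two consecutive iterates and y the difference
    of the corresponding gradients. The bounds lo and hi are illustrative.
    """
    sty = sum(si * yi for si, yi in zip(s, y))
    sts = sum(si * si for si in s)
    if sty <= 0.0:
        # No positive curvature along s: fall back to the upper bound
        # (one possible convention, assumed here).
        return hi
    return min(hi, max(lo, sts / sty))


def spectral_gradient(grad, x, steps=50, alpha0=1.0):
    """Iterate x <- x - alpha * grad(x) with a BB1 step length."""
    g = grad(x)
    alpha = alpha0
    for _ in range(steps):
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if all(si == 0.0 for si in s):
            break  # the iterate no longer moves
        alpha = spectral_coefficient(s, y)
        x, g = x_new, g_new
    return x


# Usage: minimize 0.5 * (x0^2 + 10 * x1^2), whose gradient is (x0, 10 * x1).
solution = spectral_gradient(lambda x: [x[0], 10.0 * x[1]], [1.0, 1.0])
```

In a subsampled setting, `s` and `y` would be computed from the same subsample function, which is the motivation the abstract gives for keeping a subsample fixed over several iterations.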
-
AN-SPS: Adaptive Sample Size Nonmonotone Line Search Spectral Projected Subgradient Method for Convex Constrained Optimization Problems
Authors:
Nataša Krklec Jerinkić,
Tijana Ostojić
Abstract:
We consider convex optimization problems with a possibly nonsmooth objective function in the form of a mathematical expectation. The proposed framework (AN-SPS) employs Sample Average Approximations (SAA) to approximate the objective function, which is either unavailable or too costly to compute. The sample size is chosen in an adaptive manner, which eventually pushes the SAA error to zero almost surely (a.s.). The search direction is based on a scaled subgradient and a spectral coefficient, both related to the SAA function. The step size is obtained via a nonmonotone line search over a predefined interval, which yields a theoretically sound and practically efficient algorithm. The method retains feasibility by projecting the resulting points onto the feasible set. The a.s. convergence of the AN-SPS method is proved without the assumption of a bounded feasible set or bounded iterates. Preliminary numerical results on hinge loss problems reveal the advantages of the proposed adaptive scheme. In addition, a study of different nonmonotone line search strategies in combination with different spectral coefficients within the AN-SPS framework is conducted, yielding some hints for future work.
Submitted 18 October, 2023; v1 submitted 22 August, 2022;
originally announced August 2022.
-
A Hessian inversion-free exact second order method for distributed consensus optimization
Authors:
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
We consider a standard distributed consensus optimization problem where a set of agents connected over an undirected network minimize the sum of their individual local strongly convex costs. The Alternating Direction Method of Multipliers (ADMM) and the Proximal Method of Multipliers (PMM) have proved to be effective frameworks for the design of exact distributed second order methods involving calculation of local cost Hessians. However, existing methods involve explicit calculation of local Hessian inverses at each iteration, which may be very costly when the dimension of the optimization variable is large. In this paper we develop a novel method, termed INDO (Inexact Newton method for Distributed Optimization), that alleviates the need for Hessian inverse calculation. INDO follows the PMM framework but, unlike existing work, approximates the Newton direction through a generic fixed point method, e.g., Jacobi Overrelaxation, that does not involve Hessian inverses. We prove exact global linear convergence of INDO and provide analytical studies on how the degree of inexactness in the Newton direction calculation affects the overall method's convergence factor. Numerical experiments on several real data sets demonstrate that INDO's speed is on par with or better than that of state-of-the-art methods iteration-wise, hence having a comparable communication cost. At the same time, for sufficiently large optimization problem dimensions $n$ (even at $n$ on the order of a couple of hundred), INDO achieves savings in computational cost by at least an order of magnitude.
Submitted 6 April, 2022;
originally announced April 2022.
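The INDO abstract above names Jacobi Overrelaxation (JOR) as one fixed point method that approximates a Newton direction without forming a Hessian inverse. The following is a minimal centralized sketch of JOR for a small linear system `A d = b`, not the distributed INDO method itself. The function name, the relaxation parameter `omega`, and the iteration count are assumptions chosen for the example.

```python
def jor_solve(A, b, omega=0.8, iters=200):
    """Approximately solve A d = b by Jacobi Overrelaxation.

    Each sweep applies d <- d + omega * D^{-1} (b - A d), where D is the
    diagonal of A. Only divisions by diagonal entries are needed, so no
    matrix inverse is ever formed.
    """
    n = len(b)
    d = [0.0] * n
    for _ in range(iters):
        residual = [b[i] - sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        d = [d[i] + omega * residual[i] / A[i][i] for i in range(n)]
    return d


# Usage: a small symmetric positive definite system standing in for
# "Hessian times direction equals right-hand side".
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
direction = jor_solve(A, b)
```

Stopping after fewer sweeps gives an inexact direction, which is the kind of inexactness whose effect on the convergence factor the abstract says is analyzed.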
-
Group pattern detection of longitudinal data using functional statistics
Authors:
Rongjiao Ji,
Alessandra Micheletti,
Nataša Krklec Jerinkić,
Zoranka Desnica
Abstract:
Estimating and evaluating the main patterns of grouped time series data benefits a large number of applications in various fields. In contrast to classical autocorrelation-based time series analysis and modern neural network techniques, in this paper we propose a more intuitive combination of functional analysis of variance (FANOVA) and permutation tests, suited to a limited sample size. First, FANOVA is applied to separate the common information and to extract the additional categorical influence through paired group comparison. Second, the results are analyzed through permutation tests to identify the time zones where the means of the different groups differ significantly. Normalized kernel functions of different groups are able to reflect remarkable mean characteristics in grouped unities, and are also meaningful for deeper interpretation and group-wise classification. In order to learn whether and when the proposed method of FANOVA and permutation F-test works precisely and efficiently, we compare the estimated kernel results with the ground truth on simulated data. After confirming the model's efficiency in simulation, we also apply it to the RAVDESS facial dataset to extract human emotional behaviors based on facial muscle contractions (technically called action units (AU) in computer graphics), by comparing the neutral performances with emotional ones.
Submitted 27 March, 2022;
originally announced March 2022.
-
Spectral Projected Subgradient Method for Nonsmooth Convex Optimization Problems
Authors:
Natasa Krejic,
Natasa Krklec Jerinkic,
Tijana Ostojic
Abstract:
We consider constrained optimization problems with a nonsmooth objective function in the form of a mathematical expectation. The Sample Average Approximation (SAA) is used to estimate the objective function, and a variable sample size strategy is employed. The proposed algorithm combines an SAA subgradient with the spectral coefficient in order to provide a suitable direction, which improves the performance of the first order method, as shown by numerical results. The step sizes are chosen from a predefined interval, and the almost sure convergence of the method is proved under the standard assumptions in a stochastic environment. To enhance the performance of the proposed algorithm, we further specify the choice of the step size by introducing an Armijo-like procedure adapted to this framework. Considering the computational cost on machine learning problems, we conclude that the line search improves the performance significantly. Numerical experiments conducted on finite sum problems also reveal that the variable sample strategy outperforms the full sample approach.
Submitted 8 August, 2022; v1 submitted 23 March, 2022;
originally announced March 2022.
-
An inexact restoration-nonsmooth algorithm with variable accuracy for stochastic nonsmooth convex optimization problems in machine learning and stochastic linear complementarity problems
Authors:
Natasa Krejic,
Natasa Krklec Jerinkic,
Tijana Ostojic
Abstract:
We study unconstrained optimization problems with a nonsmooth and convex objective function in the form of a mathematical expectation. The proposed method approximates the expected objective function with a sample average function, with the sample size chosen adaptively based on Inexact Restoration. The algorithm uses line search and assumes descent directions with respect to the current approximate function. We prove a.s. convergence under standard assumptions. Numerical results for two types of problems, machine learning loss functions for training classifiers and stochastic linear complementarity problems, demonstrate the efficiency of the proposed scheme.
Submitted 2 November, 2022; v1 submitted 25 March, 2021;
originally announced March 2021.
-
Emotion pattern detection on facial videos using functional statistics
Authors:
Rongjiao Ji,
Alessandra Micheletti,
Natasa Krklec Jerinkic,
Zoranka Desnica
Abstract:
There is an increasing scientific interest in automatically analysing and understanding human behavior, with particular reference to the evolution of facial expressions and the recognition of the corresponding emotions. In this paper we propose a technique based on Functional ANOVA to extract significant patterns of facial muscle movements, in order to identify the emotions expressed by actors in recorded videos. We determine whether there are time-related differences in expressions among emotional groups by using a functional F-test. These results are a first step towards the construction of a reliable automatic emotion recognition system.
Submitted 1 March, 2021;
originally announced March 2021.
-
EFIX: Exact Fixed Point Methods for Distributed Optimization
Authors:
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
We consider strongly convex distributed consensus optimization over connected networks. EFIX, the proposed method, is derived using a quadratic penalty approach. In more detail, we use the standard reformulation (transforming the original problem into a constrained problem in a higher dimensional space) to define a sequence of suitable quadratic penalty subproblems with increasing penalty parameters. For quadratic objectives, the corresponding sequence consists of quadratic penalty subproblems. For the generic strongly convex case, the objective function is approximated with a quadratic model and hence the sequence of the resulting penalty subproblems is again quadratic. EFIX is then derived by solving each of the quadratic penalty subproblems via a fixed point (R)-linear solver, e.g., the Jacobi Over-Relaxation method. Exact convergence is proved, as well as a worst case complexity of order $O(\epsilon^{-1})$ for the quadratic case. In the case of strongly convex generic functions, a standard result for penalty methods is obtained. Numerical results indicate that the method is highly competitive with state-of-the-art exact first order methods, requires smaller computational and communication effort, and is robust to the choice of algorithm parameters.
Submitted 10 December, 2020;
originally announced December 2020.
-
LSOS: Line-search Second-Order Stochastic optimization methods for nonconvex finite sums
Authors:
Daniela di Serafino,
Nataša Krejić,
Nataša Krklec Jerinkić,
Marco Viola
Abstract:
We develop a line-search second-order algorithmic framework for minimizing finite sums. We do not make any convexity assumptions, but require the terms of the sum to be continuously differentiable and have Lipschitz-continuous gradients. The methods fitting into this framework combine line searches and suitably decaying step lengths. A key issue is a two-step sampling at each iteration, which allows us to control the error present in the line-search procedure. Stationarity of limit points is proved in the almost-sure sense, while almost-sure convergence of the sequence of approximations to the solution holds with the additional hypothesis that the functions are strongly convex. Numerical experiments, including comparisons with state-of-the-art stochastic optimization methods, show the efficiency of our approach.
Submitted 27 June, 2022; v1 submitted 31 July, 2020;
originally announced July 2020.
-
Distributed Fixed Point Method for Solving Systems of Linear Algebraic Equations
Authors:
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic,
Greta Malaspina,
Alessandra Micheletti
Abstract:
We present a class of iterative fully distributed fixed point methods to solve a system of linear equations, such that each agent in the network holds one of the equations of the system. Under a generic directed, strongly connected network, we prove a convergence result analogous to the one for fixed point methods in the classical, centralized, framework: the proposed method converges to the solution of the system of linear equations at a linear rate. We further explicitly quantify the rate in terms of the linear system and the network parameters. Next, we show that the algorithm provably works under time-varying directed networks provided that the underlying graph is connected over bounded iteration intervals, and we establish a linear convergence rate for this setting as well. A set of numerical results is presented, demonstrating practical benefits of the method over existing alternatives.
Submitted 12 January, 2020;
originally announced January 2020.
-
Exact Spectral-Like Gradient Method for Distributed Optimization
Authors:
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
Since the initial proposal in the late 80s, spectral gradient methods continue to receive significant attention, especially due to their excellent numerical performance on various large scale applications. However, to date, they have not been sufficiently explored in the context of distributed optimization. In this paper, we consider unconstrained distributed optimization problems where $n$ nodes constitute an arbitrary connected network and collaboratively minimize the sum of their local convex cost functions. In this setting, building from existing exact distributed gradient methods, we propose a novel exact distributed gradient method wherein nodes' step-sizes are designed according to the novel rules akin to those in spectral gradient methods. We refer to the proposed method as Distributed Spectral Gradient method (DSG).
The method exhibits R-linear convergence under standard assumptions for the nodes' local costs and safeguarding on the algorithm step-sizes. We illustrate the method's performance through simulation examples.
Submitted 17 January, 2019;
originally announced January 2019.
-
Subsampled Nonmonotone Spectral Gradient Methods
Authors:
Stefania Bellavia,
Nataša Krklec Jerinkić,
Greta Malaspina
Abstract:
This paper deals with subsampled spectral gradient methods for minimizing finite sums. Subsampled function and gradient approximations are employed in order to reduce the overall computational cost of the classical spectral gradient methods. Global convergence is enforced by a nonmonotone line search procedure and is proved when functions and gradients are approximated with increasing accuracy. R-linear convergence and worst-case iteration complexity are investigated in the case of a strongly convex objective function. Numerical results on well-known binary classification problems are given to show the effectiveness of this framework and to analyze the effect of different spectral coefficient approximations arising from the variable sample nature of this procedure.
Submitted 1 November, 2019; v1 submitted 17 December, 2018;
originally announced December 2018.
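The abstract above relies on a nonmonotone line search without stating which rule is used. As a rough illustration of the general idea, the sketch below implements a max-type nonmonotone Armijo backtracking rule, in which a trial point is compared against the largest of several recent function values instead of the current one. This specific rule, the function name, and the parameters `c`, `beta`, and `max_backtracks` are assumptions for the example and may differ from the condition analyzed in the paper.

```python
def nonmonotone_backtracking(f, x, g, d, history, c=1e-4, beta=0.5, t=1.0,
                             max_backtracks=50):
    """Backtrack until f(x + t d) <= max(history) + c * t * g'd.

    history holds recent function values (including the current one), g is
    the gradient at x and d a descent direction. With a single-entry
    history this reduces to the classical monotone Armijo rule.
    """
    reference = max(history)
    slope = sum(gi * di for gi, di in zip(g, d))
    for _ in range(max_backtracks):
        trial = [xi + t * di for xi, di in zip(x, d)]
        if f(trial) <= reference + c * t * slope:
            return t
        t *= beta
    return t  # smallest step tried; the caller decides how to proceed


# Usage on f(x) = x0^2 at x = 1 with direction d = -gradient = -2.
f = lambda x: x[0] ** 2
monotone_step = nonmonotone_backtracking(f, [1.0], [2.0], [-2.0], history=[1.0])
relaxed_step = nonmonotone_backtracking(f, [1.0], [2.0], [-2.0], history=[1.0, 2.0])
```

With the larger reference value in `history`, the full step is accepted where the monotone rule has to halve it. Allowing longer steps like this is one reason nonmonotone rules are often paired with spectral step lengths.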
-
Subsampled Inexact Newton methods for minimizing large sums of convex functions
Authors:
Stefania Bellavia,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
This paper deals with the minimization of large sums of convex functions by Inexact Newton (IN) methods employing subsampled functions, gradients and Hessian approximations. The Conjugate Gradient method is used to compute the inexact Newton step and global convergence is enforced by a nonmonotone line search procedure. The aim is to obtain methods with affordable costs and fast convergence. Assuming strongly convex functions, R-linear convergence and worst-case iteration complexity of the procedure are investigated when functions and gradients are approximated with increasing accuracy. A set of rules for the forcing parameters and subsample Hessian sizes is derived that ensures local q-linear/superlinear convergence of the proposed method.
The random choice of the Hessian subsample is also considered and convergence in the mean square, both for finite and infinite sums of functions, is proved. Finally, global convergence with asymptotic R-linear rate of IN methods is extended to the case of a sum of convex functions with a strongly convex objective function. Numerical results on well-known binary classification problems are also given. Adaptive strategies for selecting forcing terms and Hessian subsample size, stemming from the theoretical analysis, are employed, and the numerical results show that they yield effective IN methods.
Submitted 14 November, 2018;
originally announced November 2018.
-
Distributed second order methods with increasing number of working nodes
Authors:
Natasa Krklec Jerinkic,
Dusan Jakovetic,
Natasa Krejic,
Dragana Bajovic
Abstract:
Recently, an idling mechanism has been introduced in the context of distributed \emph{first order} methods for minimization of a sum of nodes' local convex costs over a generic, connected network. With the idling mechanism, each node $i$, at each iteration $k$, is active -- updates its solution estimate and exchanges messages with its network neighborhood -- with probability $p_k$, and it stays idle with probability $1-p_k$, while the activations are independent both across nodes and across iterations. In this paper, we demonstrate that the idling mechanism can be successfully incorporated in \emph{distributed second order methods} also. Specifically, we apply the idling mechanism to the recently proposed Distributed Quasi Newton method (DQN). We first show theoretically that, when $p_k$ grows to one across iterations in a controlled manner, DQN with idling exhibits very similar theoretical convergence and convergence rates properties as the standard DQN method, thus achieving the same order of convergence rate (R-linear) as the standard DQN, but with significantly cheaper updates. Simulation examples confirm the benefits of incorporating the idling mechanism, demonstrate the method's flexibility with respect to the choice of the $p_k$'s, and compare the proposed idling method with related algorithms from the literature.
Submitted 20 September, 2018; v1 submitted 5 September, 2017;
originally announced September 2017.
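The idling mechanism described in the abstract above can be pictured with a short simulation: at iteration $k$ each node is active with probability $p_k$, independently of the other nodes and of earlier iterations, and $p_k$ grows to one. The sketch below only samples the active sets and does not implement DQN. The schedule $p_k = 1 - \sigma^{k+1}$ and the function names are hypothetical choices made for illustration, not taken from the paper.

```python
import random


def activation_probability(k, sigma=0.5):
    """Hypothetical schedule p_k = 1 - sigma^(k+1), increasing to one."""
    return 1.0 - sigma ** (k + 1)


def sample_active_nodes(num_nodes, k, rng, sigma=0.5):
    """Return the indices of the nodes that are active at iteration k.

    Each node is drawn independently with probability p_k; the remaining
    nodes stay idle and skip both computation and communication.
    """
    p_k = activation_probability(k, sigma)
    return [i for i in range(num_nodes) if rng.random() < p_k]


# Usage: early iterations activate only some nodes, later ones nearly all.
rng = random.Random(0)
early = sample_active_nodes(10, 0, rng)
late = sample_active_nodes(10, 60, rng)
```

Idle nodes do no work in the early iterations, which is where the cheaper updates mentioned in the abstract come from.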
-
Newton-like method with diagonal correction for distributed optimization
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Natasa Krejic,
Natasa Krklec Jerinkic
Abstract:
We consider distributed optimization problems where networked nodes cooperatively minimize the sum of their locally known convex costs. A popular class of methods to solve these problems are the distributed gradient methods, which are attractive due to their inexpensive iterations, but have a drawback of slow convergence rates. This motivates the incorporation of second-order information in the distributed methods, but this task is challenging: although the Hessians which arise in the algorithm design respect the sparsity of the network, their inverses are dense, hence rendering distributed implementations difficult. We overcome this challenge and propose a class of distributed Newton-like methods, which we refer to as Distributed Quasi Newton (DQN). The DQN family approximates the Hessian inverse by: 1) splitting the Hessian into its diagonal and off-diagonal part, 2) inverting the diagonal part, and 3) approximating the inverse of the off-diagonal part through a weighted linear function. The approximation is parameterized by the tuning variables which correspond to different splittings of the Hessian and by different weightings of the off-diagonal Hessian part. Specific choices of the tuning variables give rise to different variants of the proposed general DQN method -- dubbed DQN-0, DQN-1 and DQN-2 -- which mutually trade-off communication and computational costs for convergence. Simulations demonstrate the effectiveness of the proposed DQN methods.
Submitted 20 February, 2017; v1 submitted 5 September, 2015;
originally announced September 2015.