Search | arXiv e-print repository

A Hessian inversion-free exact second order method for distributed consensus optimization

Authors: Dusan Jakovetic, Natasa Krejic, Natasa Krklec Jerinkic

Abstract: We consider a standard distributed consensus optimization problem where a set of agents connected over an undirected network minimize the sum of their individual local strongly convex costs. The Alternating Direction Method of Multipliers (ADMM) and the Proximal Method of Multipliers (PMM) have been proved to be effective frameworks for the design of exact distributed second order methods involving calculation of local cost Hessians. However, existing methods involve explicit calculation of local Hessian inverses at each iteration, which may be very costly when the dimension of the optimization variable is large. In this paper we develop a novel method, termed INDO (Inexact Newton method for Distributed Optimization), that alleviates the need for Hessian inverse calculation. INDO follows the PMM framework but, unlike existing work, approximates the Newton direction through a generic fixed point method, e.g., Jacobi Overrelaxation, that does not involve Hessian inverses. We prove exact global linear convergence of INDO and provide analytical studies on how the degree of inexactness in the Newton direction calculation affects the overall method's convergence factor. Numerical experiments on several real data sets demonstrate that INDO's speed is on par with or better than that of state-of-the-art methods iteration-wise, hence it has a comparable communication cost. At the same time, for sufficiently large optimization problem dimensions n (even at n on the order of a couple of hundred), INDO achieves savings in computational cost by at least an order of magnitude.

Submitted 6 April, 2022; originally announced April 2022.

Comments: Submitted for publication Feb 10, 2022
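
The abstract above mentions approximating a Newton direction with a fixed point method such as Jacobi Overrelaxation (JOR) instead of inverting the Hessian. The following is a minimal sketch of that general idea only, not the authors' INDO method: it solves H d = -g by JOR on a small, hypothetical Hessian `H` and gradient `g` (both made up for illustration), using only matrix-vector products and scalar divisions by diagonal entries. The names `jor_solve`, `omega`, and `iters` are assumptions of this sketch.

```python
def jor_solve(A, b, omega=1.0, iters=50):
    """Approximate the solution of A x = b with Jacobi Overrelaxation (JOR).

    Update rule: x <- x + omega * D^{-1} (b - A x), where D = diag(A).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = b - A x, computed with a matrix-vector product only.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Scale by the inverse diagonal: n scalar divisions, no matrix inverse.
        x = [x[i] + omega * r[i] / A[i][i] for i in range(n)]
    return x


# Hypothetical Hessian (symmetric, strictly diagonally dominant) and gradient.
H = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
g = [1.0, -2.0, 3.0]

# Inexact Newton direction: d approximately solves H d = -g.
d = jor_solve(H, [-gi for gi in g], omega=1.0, iters=50)

# Largest entry of |H d + g| measures how inexact the direction still is.
residual = max(abs(sum(H[i][j] * d[j] for j in range(3)) + g[i]) for i in range(3))
```

In this sketch the number of inner iterations `iters` controls the degree of inexactness: fewer iterations give a cheaper but less accurate direction. With `omega=1.0` the update reduces to the plain Jacobi iteration, which converges here because `H` is strictly diagonally dominant.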

arXiv:2103.01033

Tax Evasion Risk Management Using a Hybrid Unsupervised Outlier Detection Method

Authors: Miloš Savić, Jasna Atanasijević, Dušan Jakovetić, Nataša Krejić

Abstract: Big data methods are becoming an important tool for tax fraud detection around the world. The unsupervised learning approach is the dominant framework due to the lack of labels and ground truth in the corresponding data sets, although these methods suffer from low interpretability. HUNOD, a novel hybrid unsupervised outlier detection method for tax evasion risk management, is presented in this paper. In contrast to previous methods proposed in the literature, the HUNOD method combines two outlier detection approaches based on two different machine learning designs (i.e., clustering and representational learning) to detect and internally validate outliers in a given tax dataset. The HUNOD method allows its users to incorporate relevant domain knowledge into both constituent outlier detection approaches in order to detect outliers relevant for a given economic context. The interpretability of the obtained outliers is achieved by training explainable-by-design surrogate models over the results of the unsupervised outlier detection methods. The experimental evaluation of the HUNOD method is conducted on two datasets derived from the database of individual personal income tax declarations collected by the Tax Administration of Serbia. The obtained results show that the method indicates between 90% and 98% internally validated outliers, depending on the clustering configuration and the regularization mechanisms employed for representational learning.

Submitted 24 August, 2021; v1 submitted 25 February, 2021; originally announced March 2021.

Comments: Submitted to Elsevier journal for possible publication

arXiv:2009.11397

Detection of Iterative Adversarial Attacks via Counter Attack

Authors: Matthias Rottmann, Kira Maag, Mathis Peyron, Natasa Krejic, Hanno Gottschalk

Abstract: Deep neural networks (DNNs) have proven to be powerful tools for processing unstructured data. However, for high-dimensional data, like images, they are inherently vulnerable to adversarial attacks. Small, almost invisible perturbations added to the input can be used to fool DNNs. Various attacks, hardening methods and detection methods have been introduced in recent years. Notoriously, Carlini-Wagner (CW) type attacks computed by iterative minimization belong to those that are most difficult to detect. In this work we outline a mathematical proof that the CW attack can be used as a detector itself. That is, under certain assumptions and in the limit of attack iterations, this detector provides asymptotically optimal separation of original and attacked images. In numerical experiments, we validate this statement and furthermore obtain AUROC values up to 99.73% on CIFAR10 and ImageNet. This is in the upper part of the spectrum of current state-of-the-art detection rates for CW attacks.

Submitted 23 March, 2021; v1 submitted 23 September, 2020; originally announced September 2020.

MSC Class: 68T45; 62-07

arXiv:1709.01307

Distributed second order methods with increasing number of working nodes

Authors: Natasa Krklec Jerinkic, Dusan Jakovetic, Natasa Krejic, Dragana Bajovic

Abstract: Recently, an idling mechanism has been introduced in the context of distributed \emph{first order} methods for minimization of a sum of nodes' local convex costs over a generic, connected network. With the idling mechanism, each node $i$, at each iteration $k$, is active -- updates its solution estimate and exchanges messages with its network neighborhood -- with probability $p_k$, and it stays idle with probability $1-p_k$, while the activations are independent both across nodes and across iterations. In this paper, we demonstrate that the idling mechanism can also be successfully incorporated in \emph{distributed second order methods}. Specifically, we apply the idling mechanism to the recently proposed Distributed Quasi Newton method (DQN). We first show theoretically that, when $p_k$ grows to one across iterations in a controlled manner, DQN with idling exhibits convergence and convergence rate properties very similar to those of the standard DQN method, thus achieving the same order of convergence rate (R-linear) as the standard DQN, but with significantly cheaper updates. Simulation examples confirm the benefits of incorporating the idling mechanism, demonstrate the method's flexibility with respect to the choice of the $p_k$'s, and compare the proposed idling method with related algorithms from the literature.

Submitted 20 September, 2018; v1 submitted 5 September, 2017; originally announced September 2017.

arXiv:1509.01703

Newton-like method with diagonal correction for distributed optimization

Authors: Dragana Bajovic, Dusan Jakovetic, Natasa Krejic, Natasa Krklec Jerinkic

Abstract: We consider distributed optimization problems where networked nodes cooperatively minimize the sum of their locally known convex costs. A popular class of methods for solving these problems is the distributed gradient methods, which are attractive due to their inexpensive iterations, but have the drawback of slow convergence rates. This motivates the incorporation of second-order information in the distributed methods, but this task is challenging: although the Hessians which arise in the algorithm design respect the sparsity of the network, their inverses are dense, hence rendering distributed implementations difficult. We overcome this challenge and propose a class of distributed Newton-like methods, which we refer to as Distributed Quasi Newton (DQN). The DQN family approximates the Hessian inverse by: 1) splitting the Hessian into its diagonal and off-diagonal part, 2) inverting the diagonal part, and 3) approximating the inverse of the off-diagonal part through a weighted linear function. The approximation is parameterized by the tuning variables, which correspond to different splittings of the Hessian and to different weightings of the off-diagonal Hessian part. Specific choices of the tuning variables give rise to different variants of the proposed general DQN method -- dubbed DQN-0, DQN-1 and DQN-2 -- which trade off communication and computational costs for convergence. Simulations demonstrate the effectiveness of the proposed DQN methods.

Submitted 20 February, 2017; v1 submitted 5 September, 2015; originally announced September 2015.

Comments: authors' order is alphabetical; last revision of the paper on Feb 7, 2017

arXiv:1504.04049

doi: 10.1109/TSP.2016.2560133

Distributed Gradient Methods with Variable Number of Working Nodes

Authors: Dusan Jakovetic, Dragana Bajovic, Natasa Krejic, Natasa Krklec-Jerinkic

Abstract: We consider distributed optimization where $N$ nodes in a connected network minimize the sum of their local costs subject to a common constraint set. We propose a distributed projected gradient method where each node, at each iteration $k$, performs an update (is active) with probability $p_k$, and stays idle (is inactive) with probability $1-p_k$. Whenever active, each node performs an update by weight-averaging its solution estimate with the estimates of its active neighbors, taking a negative gradient step with respect to its local cost, and performing a projection onto the constraint set; inactive nodes perform no updates. Assuming that nodes' local costs are strongly convex, with Lipschitz continuous gradients, we show that, as long as activation probability $p_k$ grows to one asymptotically, our algorithm converges in the mean square sense (MSS) to the same solution as the standard distributed gradient method, i.e., as if all the nodes were active at all iterations. Moreover, when $p_k$ grows to one linearly, with an appropriately set convergence factor, the algorithm has a linear MSS convergence, with practically the same factor as the standard distributed gradient method. Simulations on both synthetic and real world data sets demonstrate that, when compared with the standard distributed gradient method, the proposed algorithm significantly reduces the overall number of per-node communications and per-node gradient evaluations (computational cost) for the same required accuracy.

Submitted 10 March, 2016; v1 submitted 15 April, 2015; originally announced April 2015.

Comments: submitted to a journal on April 15, 2015; revised on September 23, 2015, and March 10, 2016
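
The update described in the abstract above (random activation, weight-averaging with active neighbors, gradient step, projection) can be illustrated with a toy simulation. This is a hedged sketch under assumptions that are not taken from the paper: a ring of five nodes, scalar quadratic local costs f_i(x) = 0.5 (x - a_i)^2, an interval constraint [lo, hi], equal averaging weights of 1/3, a constant step size `alpha`, and the activation schedule p_k = 1 - sigma^(k+1). All function and parameter names are hypothetical.

```python
import random


def project(x, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def idling_projected_gradient(a, lo, hi, alpha=0.1, sigma=0.5, iters=400, seed=0):
    """Toy distributed projected gradient method with randomly idle nodes."""
    rng = random.Random(seed)
    n = len(a)
    x = [0.0] * n
    w = 1.0 / 3.0  # weight on each ring neighbor (assumed choice)
    for k in range(iters):
        # Assumed activation schedule: p_k grows to one geometrically.
        p_k = 1.0 - sigma ** (k + 1)
        active = [rng.random() < p_k for _ in range(n)]
        new_x = list(x)
        for i in range(n):
            if not active[i]:
                continue  # idle nodes perform no update
            avg = x[i]
            for j in ((i - 1) % n, (i + 1) % n):
                if active[j]:
                    # Average only with active neighbors; the weight of an
                    # idle neighbor stays on the node's own estimate.
                    avg += w * (x[j] - x[i])
            grad = x[i] - a[i]  # gradient of f_i(x) = 0.5 * (x - a_i)^2
            new_x[i] = project(avg - alpha * grad, lo, hi)
        x = new_x
    return x


# Hypothetical local data; the constrained minimizer of the summed cost is 3.0.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
estimates = idling_projected_gradient(targets, lo=0.0, hi=10.0)
mean_estimate = sum(estimates) / len(estimates)
```

Because this sketch uses a constant step size, the individual node estimates settle near, rather than exactly at, the minimizer; the network-wide average of the estimates is what matches it in this toy setup. Early iterations are cheap because many nodes are idle, which is the source of the communication and computation savings the abstract reports.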

Showing 1–6 of 6 results for author: Krejic, N