-
An autoencoder for compressing angle-resolved photoemission spectroscopy data
Authors:
Steinn Ymir Agustsson,
Mohammad Ahsanul Haque,
Thi Tam Truong,
Marco Bianchi,
Nikita Klyuchnikov,
Davide Mottin,
Panagiotis Karras,
Philip Hofmann
Abstract:
Angle-resolved photoemission spectroscopy (ARPES) is a powerful experimental technique to determine the electronic structure of solids. Advances in light sources for ARPES experiments are currently leading to a vast increase of data acquisition rates and data quantity. On the other hand, access time to the most advanced ARPES instruments remains strictly limited, calling for fast, effective, and on-the-fly data analysis tools to exploit this time. In response to this need, we introduce ARPESNet, a versatile autoencoder network that efficiently summarises and compresses ARPES datasets. We train ARPESNet on a large and varied dataset of 2-dimensional ARPES data extracted by cutting standard 3-dimensional ARPES datasets along random directions in $\mathbf{k}$. To test the data representation capacity of ARPESNet, we compare $k$-means clustering quality between data compressed by ARPESNet, data compressed by discrete cosine transform, and raw data, at different noise levels. ARPESNet data excels in clustering quality despite its high compression ratio.
Submitted 5 July, 2024;
originally announced July 2024.
-
Backtracking New Q-Newton's method, Newton's flow, Voronoi's diagram and Stochastic root finding
Authors:
John Erik Fornaess,
Mi Hu,
Tuyen Trung Truong,
Takayuki Watanabe
Abstract:
A new variant of Newton's method - named Backtracking New Q-Newton's method (BNQN) - which has strong theoretical guarantee, is easy to implement, and has good experimental performance, was recently introduced by the third author.
Experiments performed previously showed some remarkable properties of the basins of attraction for finding roots of polynomials and meromorphic functions with BNQN. In general, they look smoother than those of Newton's method.
In this paper, we continue to experimentally explore in depth this remarkable phenomenon, and connect BNQN to Newton's flow and Voronoi's diagram. This link poses a couple of challenging puzzles to be explained. Experiments also indicate that BNQN is more robust against random perturbations than Newton's method and Random Relaxed Newton's method.
Submitted 8 January, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Data-driven prediction of tool wear using Bayesian-regularized artificial neural networks
Authors:
Tam T. Truong,
Jay Airao,
Panagiotis Karras,
Faramarz Hojati,
Bahman Azarhoushang,
Ramin Aghababaei
Abstract:
The prediction of tool wear helps minimize costs and enhance product quality in manufacturing. While existing data-driven models using machine learning and deep learning have contributed to the accurate prediction of tool wear, they often lack generality and require substantial training data for high accuracy. In this paper, we propose a new data-driven model that uses Bayesian Regularized Artificial Neural Networks (BRANNs) to precisely predict milling tool wear. BRANNs combine the strengths of artificial neural networks (ANNs) and Bayesian regularization, whereby ANNs learn complex patterns and Bayesian regularization handles uncertainty and prevents overfitting, resulting in a more generalized model. We treat both process parameters and monitoring sensor signals as BRANN input parameters. We conducted an extensive experimental study featuring four different experimental data sets: the NASA Ames milling dataset, the 2010 PHM Data Challenge dataset, the NUAA Ideahouse tool wear dataset, and an in-house end-milling dataset for Ti6Al4V. We inspect the impact of input features, training data size, hidden units, training algorithms, and transfer functions on the performance of the proposed BRANN model and demonstrate that it outperforms existing state-of-the-art models in terms of accuracy and reliability.
Submitted 30 November, 2023;
originally announced November 2023.
-
Creating walls to avoid unwanted points in root finding and optimization
Authors:
Tuyen Trung Truong
Abstract:
In root finding and optimization, there are many cases where there is a closed set $A$ to which one would like the sequence constructed by one's favourite method not to converge (here, we do not assume extra properties on $A$ such as being convex or connected). For example, if one wants to find roots, and one chooses initial points in the basin of attraction for 1 root $z^*$ (a fact which one may not know beforehand), then one will always end up in that root. In this case, one would like to have a mechanism to avoid this point $z^*$ in the next runs of one's algorithm.
Assume that one already has a method IM for unconstrained optimization (and root finding). We provide a simple modification IM1 of the method to treat the situation discussed in the previous paragraph. If the method IM has strong theoretical guarantees, then so does IM1. As applications, we prove two theoretical results: one concerns finding roots of a meromorphic function in an open subset of a Riemann surface, and the other concerns finding local minima of a function in an open subset of a Euclidean space inside which the function has at most countably many critical points.
Along the way, we compare with the main existing relevant methods in the current literature. We provide several examples in various settings to illustrate the usefulness of the new approach.
Submitted 10 January, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Deep Learning and Computer Vision Techniques for Microcirculation Analysis: A Review
Authors:
Maged Abdalla Helmy Mohamed Abdou,
Trung Tuyen Truong,
Eric Jul,
Paulo Ferreira
Abstract:
The analysis of microcirculation images has the potential to reveal early signs of life-threatening diseases like sepsis. Quantifying the capillary density and the capillary distribution in microcirculation images can be used as a biological marker to assist critically ill patients. The quantification of these biological markers is labor-intensive, time-consuming, and subject to interobserver variability. Several computer vision techniques with varying performance can be used to automate the analysis of these microcirculation images in light of the stated challenges. In this paper, we present a survey of over 50 research papers and present the most relevant and promising computer vision algorithms to automate the analysis of microcirculation images. Furthermore, we present a survey of the methods currently used by other researchers to automate the analysis of microcirculation images. This survey is of high clinical relevance because it acts as a guidebook of techniques for other researchers to develop their microcirculation analysis systems and algorithms.
Submitted 11 May, 2022;
originally announced May 2022.
-
CapillaryX: A Software Design Pattern for Analyzing Medical Images in Real-time using Deep Learning
Authors:
Maged Abdalla Helmy Abdou,
Paulo Ferreira,
Eric Jul,
Tuyen Trung Truong
Abstract:
Recent advances in digital imaging, e.g., increased number of pixels captured, have meant that the volume of data to be processed and analyzed from these images has also increased. Deep learning algorithms are state-of-the-art for analyzing such images, given their high accuracy when trained with a large volume of data. Nevertheless, such analysis requires considerable computational power, making such algorithms time- and resource-demanding. Such high demands can be met by using third-party cloud service providers. However, analyzing medical images using such services raises several legal and privacy challenges and does not necessarily provide real-time results. This paper provides a computing architecture that locally and in parallel can analyze medical images in real-time using deep learning, thus avoiding the legal and privacy challenges stemming from uploading data to a third-party cloud provider. To make local image processing efficient on modern multi-core processors, we utilize parallel execution to offset the resource-intensive demands of deep neural networks. We focus on a specific medical-industrial case study, namely the quantification of blood vessels in microcirculation images, for which we have developed a working system. It is currently used in an industrial, clinical research setting as part of an e-health application. Our results show that our system is approximately 78% faster than its serial system counterpart and 12% faster than a master-slave parallel system architecture.
Submitted 13 April, 2022;
originally announced April 2022.
-
Generalisations and improvements of New Q-Newton's method Backtracking
Authors:
Tuyen Trung Truong
Abstract:
In this paper, we propose a general framework for the algorithm New Q-Newton's method Backtracking, developed in the author's previous work. For a symmetric, square real matrix $A$, we define $minsp(A):=\min _{||e||=1} ||Ae||$. Given a $C^2$ cost function $f:\mathbb{R}^m\rightarrow \mathbb{R}$ and a real number $0<τ$, as well as $m+1$ fixed real numbers $δ_0,\ldots ,δ_m$, we define for each $x\in \mathbb{R}^m$ with $\nabla f(x)\not= 0$ the following quantities:
$κ:=\min _{i\not= j}|δ_i-δ_j|$;
$A(x):=\nabla ^2f(x)+δ||\nabla f(x)||^τId$, where $δ$ is the first element in the sequence $\{δ_0,\ldots ,δ_m\}$ for which $minsp(A(x))\geq κ||\nabla f(x)||^τ$;
$e_1(x),\ldots ,e_m(x)$ are an orthonormal basis of $\mathbb{R}^m$, chosen appropriately;
$w(x)=$ the step direction, given by the formula: $$w(x)=\sum _{i=1}^m\frac{<\nabla f(x),e_i(x)>}{||A(x)e_i(x)||}e_i(x);$$ (we can also normalise by $w(x)/\max \{1,||w(x)||\}$ when needed)
$γ(x)>0$ learning rate chosen by Backtracking line search so that Armijo's condition is satisfied: $$f(x-γ(x)w(x))-f(x)\leq -\frac{1}{3}γ(x)<\nabla f(x),w(x)>.$$
The update rule for our algorithm is $x\mapsto H(x)=x-γ(x)w(x)$.
In New Q-Newton's method Backtracking, the choices are $τ=1+α>1$ and $e_1(x),\ldots ,e_m(x)$'s are eigenvectors of $\nabla ^2f(x)$. In this paper, we allow more flexibility and generality, for example $τ$ can be chosen to be $<1$ or $e_1(x),\ldots ,e_m(x)$'s are not necessarily eigenvectors of $\nabla ^2f(x)$.
New Q-Newton's method Backtracking (as well as Backtracking gradient descent) is a special case, and some versions have flavours of quasi-Newton's methods. Several versions allow good theoretical guarantees. An application to solving systems of polynomial equations is given.
Submitted 23 September, 2021;
originally announced September 2021.
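The quantities defined in the abstract above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the author's implementation: the function name `bnqn_step`, the default $τ=1.5$, the candidate list $δ_i=i$, the halving factor in the line search, and the fallback used when no candidate passes the $minsp$ test are all hypothetical choices. The basis $e_i(x)$ is taken to be the eigenvectors of $\nabla^2 f(x)$, which the abstract names as the choice made in New Q-Newton's method Backtracking.

```python
import numpy as np

def bnqn_step(f, grad, hess, x, tau=1.5, deltas=None):
    """One update x -> x - gamma(x) w(x), following the abstract's definitions."""
    g = grad(x)
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return x  # already at a critical point
    m = x.size
    if deltas is None:
        deltas = [float(i) for i in range(m + 1)]  # assumed candidates delta_0..delta_m
    # kappa := min_{i != j} |delta_i - delta_j|
    kappa = min(abs(a - b) for i, a in enumerate(deltas) for b in deltas[i + 1:])
    lam, E = np.linalg.eigh(hess(x))  # columns of E are the e_i(x)
    shift = gnorm ** tau
    best = None
    for d in deltas:
        a_eigs = lam + d * shift  # eigenvalues of A(x) = Hessian + d * ||grad||^tau * Id
        msp = np.min(np.abs(a_eigs))  # minsp(A(x)) for a symmetric matrix
        if msp >= kappa * shift:
            best = a_eigs
            break
        # Fallback (an assumption of this sketch): remember the best-conditioned candidate.
        if best is None or msp > np.min(np.abs(best)):
            best = a_eigs
    # w(x) = sum_i <grad f, e_i> / ||A e_i|| e_i, where ||A e_i|| = |eigenvalue of A|
    w = E @ ((E.T @ g) / np.abs(best))
    # Backtracking line search for Armijo's condition with constant 1/3
    gamma, fx, gw = 1.0, f(x), float(g @ w)
    while gamma > 1e-16 and f(x - gamma * w) - fx > -gamma * gw / 3.0:
        gamma *= 0.5
    return x - gamma * w

# Toy cost with a saddle at (0, 0) and minima at (0, 1) and (0, -1).
f = lambda p: float(p[0] ** 2 + p[1] ** 4 / 4.0 - p[1] ** 2 / 2.0)
grad = lambda p: np.array([2.0 * p[0], p[1] ** 3 - p[1]])
hess = lambda p: np.array([[2.0, 0.0], [0.0, 3.0 * p[1] ** 2 - 1.0]])

x_bnqn = np.array([0.5, 0.3])
for _ in range(100):
    x_bnqn = bnqn_step(f, grad, hess, x_bnqn)
```

Because each component of $w(x)$ has the same sign as the corresponding component of $\nabla f(x)$, the step is a descent direction, and the line search terminates.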
-
New Q-Newton's method meets Backtracking line search: good convergence guarantee, saddle points avoidance, quadratic rate of convergence, and easy implementation
Authors:
Tuyen Trung Truong
Abstract:
In a recent joint work, the author has developed a modification of Newton's method, named New Q-Newton's method, which can avoid saddle points and has quadratic rate of convergence. While good theoretical convergence guarantee has not been established for this method, experiments on small scale problems show that the method works very competitively against other well known modifications of Newton's method such as Adaptive Cubic Regularization and BFGS, as well as first order methods such as Unbounded Two-way Backtracking Gradient Descent.
In this paper, we resolve the convergence guarantee issue by proposing a modification of New Q-Newton's method, named New Q-Newton's method Backtracking, which incorporates a more sophisticated use of hyperparameters and a Backtracking line search. This new method has very good theoretical guarantees, which for a {\bf Morse function} yields the following (which is unknown for New Q-Newton's method):
{\bf Theorem.} Let $f:\mathbb{R}^m\rightarrow \mathbb{R}$ be a Morse function, that is all its critical points have invertible Hessian. Then for a sequence $\{x_n\}$ constructed by New Q-Newton's method Backtracking from a random initial point $x_0$, we have the following two alternatives:
i) $\lim_{n\rightarrow\infty}||x_n||=\infty$,
or
ii) $\{x_n\}$ converges to a point $x_{\infty}$ which is a {\bf local minimum} of $f$, and the rate of convergence is {\bf quadratic}.
Moreover, if $f$ has compact sublevels, then only case ii) happens.
As far as we know, for Morse functions, this is the best theoretical guarantee for iterative optimization algorithms so far in the literature. We have run small-scale experiments with some further simplified versions of New Q-Newton's method Backtracking, and found that the new method significantly improves on New Q-Newton's method.
Submitted 23 August, 2021;
originally announced August 2021.
-
CapillaryNet: An Automated System to Quantify Skin Capillary Density and Red Blood Cell Velocity from Handheld Vital Microscopy
Authors:
Maged Helmy,
Anastasiya Dykyy,
Tuyen Trung Truong,
Paulo Ferreira,
Eric Jul
Abstract:
Capillaries are the smallest vessels in the body responsible for delivering oxygen and nutrients to surrounding cells. Various life-threatening diseases are known to alter the density of healthy capillaries and the flow velocity of erythrocytes within the capillaries. In previous studies, capillary density and flow velocity were manually assessed by trained specialists. However, manual analysis of a standard 20-second microvascular video requires 20 minutes on average and necessitates extensive training. Thus, manual analysis has been reported to hinder the application of microvascular microscopy in a clinical environment. To address this problem, this paper presents a fully automated state-of-the-art system to quantify skin nutritive capillary density and red blood cell velocity captured by handheld-based microscopy videos. The proposed method combines the speed of traditional computer vision algorithms with the accuracy of convolutional neural networks to enable clinical capillary analysis. The results show that the proposed system fully automates capillary detection with an accuracy exceeding that of trained analysts and measures several novel microvascular parameters that had eluded quantification thus far, namely, capillary hematocrit and intracapillary flow velocity heterogeneity. The proposed end-to-end system, named CapillaryNet, can detect capillaries at $\sim$0.9 seconds per frame with $\sim$93\% accuracy. The system is currently being used as a clinical research product in a larger e-health application to analyse capillary data captured from patients suffering from COVID-19, pancreatitis, and acute heart diseases. CapillaryNet narrows the gap between the analysis of microcirculation images in a clinical environment and state-of-the-art systems.
Submitted 20 January, 2022; v1 submitted 23 April, 2021;
originally announced April 2021.
-
Unconstrained optimisation on Riemannian manifolds
Authors:
Tuyen Trung Truong
Abstract:
In this paper, we give explicit descriptions of versions of (Local-) Backtracking Gradient Descent and New Q-Newton's method in the Riemannian setting. Here are some easy to state consequences of results in this paper, where $X$ is a general Riemannian manifold of finite dimension and $f:X\rightarrow \mathbb{R}$ a $C^2$ function which is Morse (that is, all its critical points are non-degenerate).
{\bf Theorem.} For random choices of the hyperparameters in the Riemannian Local Backtracking Gradient Descent algorithm and for random choices of the initial point $x_0$, the sequence $\{x_n\}$ constructed by the algorithm either (i) converges to a local minimum of $f$ or (ii) eventually leaves every compact subset of $X$ (in other words, diverges to infinity on $X$). If $f$ has compact sublevels, then only the former alternative happens. The convergence rate is the same as in the classical paper by Armijo.
{\bf Theorem.} Assume that $f$ is $C^3$. For random choices of the hyperparameters in the Riemannian New Q-Newton's method, if the sequence constructed by the algorithm converges, then the limit is a critical point of $f$. We have a local Stable-Center manifold theorem, near saddle points of $f$, for the dynamical system associated to the algorithm. If the limit point is a non-degenerate minimum point, then the rate of convergence is quadratic. If moreover $X$ is an open subset of a Lie group and the initial point $x_0$ is chosen randomly, then we can globally avoid saddle points.
As an application, we propose a general method using Riemannian Backtracking GD to find the minimum of a function on a bounded ball in a Euclidean space, and carry out explicit calculations for the smallest eigenvalue of a symmetric square matrix.
Submitted 31 August, 2020; v1 submitted 25 August, 2020;
originally announced August 2020.
-
Asymptotic behaviour of learning rates in Armijo's condition
Authors:
Tuyen Trung Truong,
Tuan Hang Nguyen
Abstract:
Fix a constant $0<α<1$. For a $C^1$ function $f:\mathbb{R}^k\rightarrow \mathbb{R}$, a point $x$ and a positive number $δ>0$, we say that Armijo's condition is satisfied if $f(x-δ\nabla f(x))-f(x)\leq -αδ||\nabla f(x)||^2$. It is a basis for the well known Backtracking Gradient Descent (Backtracking GD) algorithm.
Consider a sequence $\{x_n\}$ defined by $x_{n+1}=x_n-δ_n\nabla f(x_n)$, for positive numbers $δ_n$ for which Armijo's condition is satisfied. We show that if $\{x_n\}$ converges to a non-degenerate critical point, then $\{δ_n\}$ must be bounded. Moreover this boundedness can be quantified in terms of the norms of the Hessian $\nabla ^2f$ and its inverse at the limit point. This complements the first author's results on Unbounded Backtracking GD, and shows that in case of convergence to a non-degenerate critical point the behaviour of Unbounded Backtracking GD is not too different from that of usual Backtracking GD. On the other hand, in case of convergence to a degenerate critical point the behaviours can be very much different. We run some experiments to illustrate that both scenarios can really happen.
In another part of the paper, we argue that Backtracking GD has the correct unit (according to a definition by Zeiler in his Adadelta's paper). The main point is that since learning rate in Backtracking GD is bound by Armijo's condition, it is not unitless.
Submitted 7 July, 2020;
originally announced July 2020.
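As a concrete reading of Armijo's condition in the abstract above, the following minimal sketch runs Backtracking GD on a toy quadratic. The function name `backtracking_gd`, the shrink factor `beta`, the initial trial rate `delta0`, the stopping tolerance, and the test function are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def backtracking_gd(f, grad, x0, alpha=0.5, beta=0.5, delta0=1.0,
                    max_steps=2000, tol=1e-12):
    """Gradient descent where each learning rate satisfies Armijo's condition."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        g = grad(x)
        g2 = float(g @ g)  # ||grad f(x)||^2
        if g2 < tol:
            break
        delta = delta0
        # Shrink delta until f(x - delta*g) - f(x) <= -alpha * delta * ||g||^2
        while f(x - delta * g) - f(x) > -alpha * delta * g2:
            delta *= beta
        x = x - delta * g
    return x

# Toy cost: an ill-conditioned quadratic with its minimum at the origin.
f_quad = lambda p: float(p[0] ** 2 + 10.0 * p[1] ** 2)
grad_quad = lambda p: np.array([2.0 * p[0], 20.0 * p[1]])

x_min = backtracking_gd(f_quad, grad_quad, [3.0, -2.0])
```

Starting every search from the same `delta0` keeps the accepted learning rates bounded by construction, which is the "usual" Backtracking GD setting the abstract compares against.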
-
A fast and simple modification of Newton's method helping to avoid saddle points
Authors:
Tuyen Trung Truong,
Tat Dat To,
Tuan Hang Nguyen,
Thu Hang Nguyen,
Hoang Phuong Nguyen,
Maged Helmy
Abstract:
We propose in this paper New Q-Newton's method. The update rule is very simple conceptually, for example $x_{n+1}=x_n-w_n$ where $w_n=pr_{A_n,+}(v_n)-pr_{A_n,-}(v_n)$, with $A_n=\nabla ^2f(x_n)+δ_n||\nabla f(x_n)||^2.Id$ and $v_n=A_n^{-1}.\nabla f(x_n)$. Here $δ_n$ is an appropriate real number so that $A_n$ is invertible, and $pr_{A_n,\pm}$ are projections to the vector subspaces generated by eigenvectors of positive (correspondingly negative) eigenvalues of $A_n$.
The main result of this paper roughly says that if $f$ is $C^3$ (can be unbounded from below) and a sequence $\{x_n\}$, constructed by the New Q-Newton's method from a random initial point $x_0$, {\bf converges}, then the limit point is a critical point and is not a saddle point, and the convergence rate is the same as that of Newton's method. The first author has recently succeeded in incorporating Backtracking line search into New Q-Newton's method, thus resolving the convergence guarantee issue observed for some (non-smooth) cost functions. An application to quickly finding zeros of a univariate meromorphic function will be discussed. Various experiments, comparing against well known algorithms such as BFGS and Adaptive Cubic Regularization, are presented.
Submitted 9 September, 2021; v1 submitted 2 June, 2020;
originally announced June 2020.
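The update rule stated in the abstract above can be sketched as follows. The abstract only says that $δ_n$ is "an appropriate real number so that $A_n$ is invertible", so the candidate list `deltas` and the invertibility tolerance `tol` below are assumptions of this sketch, as is the function name `new_q_newton_step`.

```python
import numpy as np

def new_q_newton_step(grad, hess, x, deltas=(0.0, 1.0, 2.0), tol=1e-10):
    """One step x_{n+1} = x_n - w_n with w_n = pr_+(v_n) - pr_-(v_n)."""
    g = grad(x)
    H = hess(x)
    g2 = float(g @ g)
    for d in deltas:
        A = H + d * g2 * np.eye(x.size)  # A_n = Hessian + delta_n * ||grad||^2 * Id
        lam, Q = np.linalg.eigh(A)
        if np.min(np.abs(lam)) > tol:  # assumed numerical invertibility test
            break
    else:
        raise ValueError("no candidate delta makes A_n invertible")
    v = np.linalg.solve(A, g)  # v_n = A_n^{-1} grad f(x_n)
    # Keep the components of v_n along positive eigen-directions and flip the
    # components along negative ones: this equals pr_+(v_n) - pr_-(v_n).
    w = Q @ (np.sign(lam) * (Q.T @ v))
    return x - w

# Toy cost x^2 + y^4/4 - y^2/2: saddle at (0, 0), minima at (0, 1) and (0, -1).
grad_s = lambda p: np.array([2.0 * p[0], p[1] ** 3 - p[1]])
hess_s = lambda p: np.array([[2.0, 0.0], [0.0, 3.0 * p[1] ** 2 - 1.0]])

x_nq = np.array([0.5, 0.3])
for _ in range(30):
    x_nq = new_q_newton_step(grad_s, hess_s, x_nq)
```

From the same starting point, a plain Newton step would move the second coordinate from 0.3 towards the saddle at 0, whereas the sign flip on the negative eigen-direction moves it away from the saddle.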
-
Coordinate-wise Armijo's condition: General case
Authors:
Tuyen Trung Truong
Abstract:
Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow \mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its gradient. Fix $0<α<1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$, a number $δ>0$ satisfies Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} f(x-δ\partial _xf,y-δ\partial _yf)-f(x,y)\leq -αδ(||\partial _xf||^2+||\partial _yf||^2). \end{eqnarray*}
In a previous paper, we proposed the following {\bf coordinate-wise} Armijo's condition. Fix again $0<α<1$. A pair of positive numbers $δ_1,δ_2>0$ satisfies the coordinate-wise variant of Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} [f(x-δ_1\partial _xf(x,y), y-δ_2\partial _y f(x,y))]-[f(x,y)]\leq -α(δ_1||\partial _xf(x,y)||^2+δ_2||\partial _yf(x,y)||^2). \end{eqnarray*} Previously we applied this condition to functions of the form $f(x,y)=f(x)+g(y)$, and proved various convergence results for them. For a general function, it is crucial - for being able to do real computations - to have a systematic algorithm for obtaining $δ_1$ and $δ_2$ satisfying the coordinate-wise version of Armijo's condition, much like Backtracking for the usual Armijo's condition. In this paper we propose such an algorithm, and prove the corresponding convergence results.
We then analyse and present experimental results for some functions such as $f(x,y)=a|x|+y$ (given by Asl and Overton in connection to Wolfe's method), $f(x,y)=x^3 \sin (1/x) + y^3 \sin(1/y)$ and Rosenbrock's function.
Submitted 11 March, 2020;
originally announced March 2020.
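The coordinate-wise inequality in the abstract above is easy to check numerically. The sketch below only tests whether a given pair $(δ_1,δ_2)$ satisfies the condition; it does not reproduce the paper's algorithm for finding such a pair, which the abstract does not describe. The function name `coordinate_wise_armijo` and the toy cost are hypothetical.

```python
import numpy as np

def coordinate_wise_armijo(f, grad_x, grad_y, x, y, delta1, delta2, alpha=0.5):
    """Return True if (delta1, delta2) satisfies the coordinate-wise condition at (x, y)."""
    gx = grad_x(x, y)
    gy = grad_y(x, y)
    lhs = f(x - delta1 * gx, y - delta2 * gy) - f(x, y)
    rhs = -alpha * (delta1 * float(gx @ gx) + delta2 * float(gy @ gy))
    return bool(lhs <= rhs)

# Toy cost with very different curvature in the two blocks of variables.
f_xy = lambda x, y: float(x @ x + 100.0 * (y @ y))
grad_x = lambda x, y: 2.0 * x
grad_y = lambda x, y: 200.0 * y

x0 = np.array([1.0])
y0 = np.array([1.0])

# A large rate in x combined with a small rate in y passes the test ...
separate_ok = coordinate_wise_armijo(f_xy, grad_x, grad_y, x0, y0, 0.4, 0.004)
# ... while using the large rate in both blocks (usual Armijo with delta = 0.4) fails.
shared_ok = coordinate_wise_armijo(f_xy, grad_x, grad_y, x0, y0, 0.4, 0.4)
```

Setting `delta1 == delta2` recovers the usual Armijo's condition, so in this example the usual condition would force the small rate on both blocks.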
-
Some convergent results for Backtracking Gradient Descent method on Banach spaces
Authors:
Tuyen Trung Truong
Abstract:
Our main result concerns the following condition:
{\bf Condition C.} Let $X$ be a Banach space. A $C^1$ function $f:X\rightarrow \mathbb{R}$ satisfies Condition C if whenever $\{x_n\}$ weakly converges to $x$ and $\lim _{n\rightarrow\infty}||\nabla f(x_n)||=0$, then $\nabla f(x)=0$.
We assume that there is given a canonical isomorphism between $X$ and its dual $X^*$, for example when $X$ is a Hilbert space.
{\bf Theorem.} Let $X$ be a reflexive, complete Banach space and $f:X\rightarrow \mathbb{R}$ be a $C^2$ function which satisfies Condition C. Moreover, we assume that for every bounded set $S\subset X$, then $\sup _{x\in S}||\nabla ^2f(x)||<\infty$. We choose a random point $x_0\in X$ and construct by the Local Backtracking GD procedure (which depends on $3$ hyper-parameters $α,β,δ_0$, see later for details) the sequence $x_{n+1}=x_n-δ(x_n)\nabla f(x_n)$. Then we have:
1) Every cluster point of $\{x_n\}$, in the {\bf weak} topology, is a critical point of $f$.
2) Either $\lim _{n\rightarrow\infty}f(x_n)=-\infty$ or $\lim _{n\rightarrow\infty}||x_{n+1}-x_n||=0$.
3) Here we work with the weak topology. Let $\mathcal{C}$ be the set of critical points of $f$. Assume that $\mathcal{C}$ has a bounded component $A$. Let $\mathcal{B}$ be the set of cluster points of $\{x_n\}$. If $\mathcal{B}\cap A\not= \emptyset$, then $\mathcal{B}\subset A$ and $\mathcal{B}$ is connected.
4) Assume that $X$ is separable. Then for generic choices of $α,β,δ_0$ and the initial point $x_0$, if the sequence $\{x_n\}$ converges - in the {\bf weak} topology, then the limit point cannot be a saddle point.
Submitted 22 January, 2020; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Backtracking Gradient Descent allowing unbounded learning rates
Authors:
Tuyen Trung Truong
Abstract:
In unconstrained optimisation on a Euclidean space, to prove convergence in Gradient Descent processes (GD) $x_{n+1}=x_n-δ_n \nabla f(x_n)$ it is usually required that the learning rates $δ_n$ are bounded: $δ_n\leq δ$ for some positive $δ$. Under this assumption, if the sequence $x_n$ converges to a critical point $z$, then with large values of $n$ the update will be small because $||x_{n+1}-x_n||\lesssim ||\nabla f(x_n)||$. This may also force the sequence to converge to a bad minimum. If we can allow, at least theoretically, that the learning rates $δ_n$ are not bounded, then we may have better convergence to better minima.
A previous joint paper by the author showed convergence for the usual version of Backtracking GD under very general assumptions on the cost function $f$. In this paper, we allow the learning rates $δ_n$ to be unbounded, in the sense that there is a function $h:(0,\infty)\rightarrow (0,\infty )$ such that $\lim _{t\rightarrow 0}th(t)=0$ and $δ_n\lesssim \max \{h(x_n),δ\}$ satisfies Armijo's condition for all $n$, and prove convergence under the same assumptions as in the mentioned paper. It will be shown that this growth rate of $h$ is best possible if one wants convergence of the sequence $\{x_n\}$.
A specific way for choosing $δ_n$ in a discrete way connects to Two-way Backtracking GD defined in the mentioned paper. We provide some results which either improve or are implicitly contained in those in the mentioned paper and another recent paper on avoidance of saddle points.
Submitted 8 January, 2020; v1 submitted 7 January, 2020;
originally announced January 2020.
-
Coordinate-wise Armijo's condition
Authors:
Tuyen Trung Truong
Abstract:
Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow \mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its gradient. Fix $0<α<1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$, a number $δ>0$ satisfies Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} f(x-δ\partial _xf,y-δ\partial _yf)-f(x,y)\leq -αδ(||\partial _xf||^2+||\partial _yf||^2). \end{eqnarray*}
When $f(x,y)=f_1(x)+f_2(y)$ is a coordinate-wise sum map, we propose the following {\bf coordinate-wise} Armijo's condition. Fix again $0<α<1$. A pair of positive numbers $δ_1,δ_2>0$ satisfies the coordinate-wise variant of Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} [f_1(x-δ_1\nabla f_1(x))+f_2(y-δ_2\nabla f_2(y))]-[f_1(x)+f_2(y)]\leq -α(δ_1||\nabla f_1(x)||^2+δ_2||\nabla f_2(y)||^2). \end{eqnarray*}
We then extend our recent results on Backtracking Gradient Descent and some of its variants to this setting. We show by an example the advantage of using the coordinate-wise Armijo's condition over the usual Armijo's condition.
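The difference between the two conditions can be sketched numerically. The following is a minimal illustration, not the paper's own example: it assumes $m_1=m_2=1$, a halving backtracking rule (the factor `beta`, the starting value `delta0`, and all function names are hypothetical), and the test function $f(x,y)=x^2+100y^2$. Backtracking each coordinate separately yields a pair $(δ_1,δ_2)$ that satisfies the coordinate-wise condition, because the two one-dimensional inequalities add up to it.

```python
def backtrack_1d(f, grad, x, alpha=0.5, beta=0.5, delta0=1.0):
    """Largest delta in {delta0 * beta**k} with f(x - delta*g) - f(x) <= -alpha*delta*g**2."""
    g = grad(x)
    delta = delta0
    while f(x - delta * g) - f(x) > -alpha * delta * g * g:
        delta *= beta
    return delta


def usual_armijo(f1, g1, f2, g2, x, y, alpha=0.5, beta=0.5, delta0=1.0):
    """One shared delta for both coordinates (usual Armijo's condition)."""
    gx, gy = g1(x), g2(y)
    base = f1(x) + f2(y)
    delta = delta0
    while (f1(x - delta * gx) + f2(y - delta * gy)) - base > -alpha * delta * (gx * gx + gy * gy):
        delta *= beta
    return delta


def coordinatewise_armijo(f1, g1, f2, g2, x, y, alpha=0.5, beta=0.5, delta0=1.0):
    """A pair (delta1, delta2); summing the two 1-D inequalities gives the coordinate-wise condition."""
    return (backtrack_1d(f1, g1, x, alpha, beta, delta0),
            backtrack_1d(f2, g2, y, alpha, beta, delta0))


# Assumed test function: f(x, y) = x^2 + 100*y^2, evaluated at (1, 1).
f1, g1 = (lambda x: x * x), (lambda x: 2.0 * x)
f2, g2 = (lambda y: 100.0 * y * y), (lambda y: 200.0 * y)

shared = usual_armijo(f1, g1, f2, g2, 1.0, 1.0)
delta1, delta2 = coordinatewise_armijo(f1, g1, f2, g2, 1.0, 1.0)
print(shared, delta1, delta2)
```

In this sketch the steep $y$ direction forces the shared step down to $2^{-8}$, while the coordinate-wise variant keeps $δ_1=0.5$ for the flat $x$ direction.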
Submitted 18 November, 2019;
originally announced November 2019.
-
Convergence to minima for the continuous version of Backtracking Gradient Descent
Authors:
Tuyen Trung Truong
Abstract:
The main result of this paper is:
{\bf Theorem.} Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^{1}$ function, so that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $δ_0>0$ and $0<α<1$. Then there is a smooth function $h:\mathbb{R}^k\rightarrow (0,δ_0]$ so that the map $H:\mathbb{R}^k\rightarrow \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following property:
(i) For all $x\in \mathbb{R}^k$, we have $f(H(x))-f(x)\leq -αh(x)||\nabla f(x)||^2$.
(ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ either satisfies $\lim_{n\rightarrow\infty}||x_{n+1}-x_n||=0$ or $ \lim_{n\rightarrow\infty}||x_n||=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$.
(iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, {\bf if it converges}, cannot converge to a {\bf generalised} saddle point.
(iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_2$, any cluster point of the sequence $x_{n+1}=H(x_n)$ is not a saddle point, and more generally cannot be an isolated generalised saddle point.
Some other results are proven.
Submitted 13 November, 2019; v1 submitted 11 November, 2019;
originally announced November 2019.
-
Towards Secure and Decentralized Sharing of IoT Data
Authors:
Hien Thi Thu Truong,
Miguel Almeida,
Ghassan Karame,
Claudio Soriente
Abstract:
The Internet of Things (IoT) bears unprecedented security and scalability challenges due to the magnitude of data produced and exchanged by IoT devices and platforms. Some of those challenges are currently being addressed by coupling IoT applications with blockchains. However, current blockchain-backed IoT systems simply use the blockchain to store access control policies, thereby underutilizing the power of blockchain technology. In this paper, we propose a new framework named Sash that couples IoT platforms with a blockchain and provides a number of advantages over the state of the art. In Sash, the blockchain is used to store access control policies and make access control decisions. Therefore, both changes to policies and access requests are correctly enforced and publicly auditable. Further, we devise a ``data marketplace'' by leveraging the ability of blockchains to handle financial transactions and providing ``by design'' remuneration to data producers. Finally, we exploit a special flavor of identity-based encryption to cater for cryptography-enforced access control while minimizing the overhead to distribute decryption keys. We prototype Sash by using the FIWARE open source IoT platform and the Hyperledger Fabric framework as the blockchain back-end. We also evaluate the performance of our prototype and show that it incurs tolerable overhead in realistic deployment settings.
Submitted 23 August, 2019;
originally announced August 2019.
-
On the Security of Randomized Defenses Against Adversarial Samples
Authors:
Kumar Sharad,
Giorgia Azzurra Marson,
Hien Thi Thu Truong,
Ghassan Karame
Abstract:
Deep Learning has been shown to be particularly vulnerable to adversarial samples. To combat adversarial strategies, numerous defensive techniques have been proposed. Among these, a promising approach is to use randomness in order to make the classification process unpredictable and presumably harder for the adversary to control. In this paper, we study the effectiveness of randomized defenses against adversarial samples. To this end, we categorize existing state-of-the-art adversarial strategies into three attacker models of increasing strength, namely blackbox, graybox, and whitebox (a.k.a.~adaptive) attackers. We also devise a lightweight randomization strategy for image classification based on feature squeezing, that consists of pre-processing the classifier input by embedding randomness within each feature, before applying feature squeezing. We evaluate the proposed defense and compare it to other randomized techniques in the literature via thorough experiments. Our results indeed show that careful integration of randomness can be effective against both graybox and blackbox attacks without significantly degrading the accuracy of the underlying classifier. However, our experimental results offer strong evidence that in the present form such randomization techniques cannot deter a whitebox adversary that has access to all classifier parameters and has full knowledge of the defense. Our work thoroughly and empirically analyzes the impact of randomization techniques against all classes of adversarial strategies.
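The pre-processing idea described above (randomness embedded in each feature, followed by feature squeezing) can be illustrated with a short sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: pixels are floats in $[0,1]$, the randomness is uniform noise of half-width `noise`, the squeezer is bit-depth reduction to `bits` bits, and the function names are hypothetical.

```python
import random


def squeeze_bit_depth(pixels, bits):
    """Feature squeezing by bit-depth reduction: snap each value to a (2**bits - 1)-step grid."""
    levels = 2 ** bits - 1
    return [round(p * levels) / levels for p in pixels]


def randomized_squeeze(pixels, bits=4, noise=0.05, rng=None):
    """Perturb each feature with uniform noise (an assumed noise model), clip, then squeeze."""
    rng = rng or random.Random()
    noisy = [min(1.0, max(0.0, p + rng.uniform(-noise, noise))) for p in pixels]
    return squeeze_bit_depth(noisy, bits)


# Usage: the seeded generator makes the run repeatable; a deployed defense would not fix the seed.
squeezed = randomized_squeeze([0.0, 0.26, 0.5, 0.74, 1.0], bits=4, noise=0.05, rng=random.Random(0))
print(squeezed)
```

Values that fall near a quantisation boundary may land on either neighbouring level from one call to the next, which is the source of unpredictability in this sketch.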
Submitted 16 March, 2020; v1 submitted 11 December, 2018;
originally announced December 2018.
-
Backtracking gradient descent method for general $C^1$ functions, with applications to Deep Learning
Authors:
Tuyen Trung Truong,
Tuan Hang Nguyen
Abstract:
While Standard gradient descent is a very popular optimisation method, its convergence cannot be proven beyond the class of functions whose gradient is globally Lipschitz continuous. As such, it is not actually applicable to realistic applications such as Deep Neural Networks. In this paper, we prove that its backtracking variant behaves very nicely; in particular, convergence can be shown for all Morse functions. The main theoretical result of this paper is as follows.
Theorem. Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^1$ function, and $\{z_n\}$ a sequence constructed from the Backtracking gradient descent algorithm. (1) Either $\lim _{n\rightarrow\infty}||z_n||=\infty$ or $\lim _{n\rightarrow\infty}||z_{n+1}-z_n||=0$. (2) Assume that $f$ has at most countably many critical points. Then either $\lim _{n\rightarrow\infty}||z_n||=\infty$ or $\{z_n\}$ converges to a critical point of $f$. (3) More generally, assume that all connected components of the set of critical points of $f$ are compact. Then either $\lim _{n\rightarrow\infty}||z_n||=\infty$ or $\{z_n\}$ is bounded. Moreover, in the latter case the set of cluster points of $\{z_n\}$ is connected.
Some generalised versions of this result, including an inexact version, are included. Another result in this paper concerns the problem of saddle points. We then present a heuristic argument to explain why Standard gradient descent method works so well, and modifications of the backtracking versions of GD, MMT and NAG. Experiments with datasets CIFAR10 and CIFAR100 on various popular architectures verify the heuristic argument also for the mini-batch practice and show that our new algorithms, while automatically fine tuning learning rates, perform better than current state-of-the-art methods such as MMT, NAG, Adagrad, Adadelta, RMSProp, Adam and Adamax.
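A minimal sketch of Backtracking gradient descent with Armijo's condition follows. It is an illustration under stated assumptions rather than the authors' implementation: the parameter names `delta0`, `alpha`, `beta`, the stopping tolerance, the cap on backtracking steps, and the test function $f(x,y)=x^4/4-x^2/2+y^2$ (whose gradient is not globally Lipschitz) are all choices made here.

```python
import math


def backtracking_gd(f, grad, z0, delta0=1.0, alpha=0.5, beta=0.5,
                    tol=1e-6, max_iter=10_000, max_backtracks=60):
    """Gradient descent where each learning rate is found by Armijo backtracking."""
    z = list(z0)
    for _ in range(max_iter):
        g = grad(z)
        g2 = sum(gi * gi for gi in g)
        if math.sqrt(g2) < tol:          # gradient is (numerically) zero: stop
            break
        fz = f(z)
        delta = delta0
        for _ in range(max_backtracks):  # safety cap; the last trial point is taken if it is hit
            z_new = [zi - delta * gi for zi, gi in zip(z, g)]
            if f(z_new) - fz <= -alpha * delta * g2:   # Armijo's condition holds
                break
            delta *= beta                # otherwise shrink the learning rate
        z = z_new
    return z


# Assumed test function with critical points (0, 0), (1, 0) and (-1, 0).
def f(z):
    x, y = z
    return x ** 4 / 4 - x ** 2 / 2 + y ** 2


def grad_f(z):
    x, y = z
    return [x ** 3 - x, 2 * y]


z_min = backtracking_gd(f, grad_f, [2.0, 1.0])
print(z_min)
```

Starting from $(2,1)$, the iterates in this sketch settle near the critical point $(1,0)$ without any manual tuning of the learning rate.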
Submitted 4 April, 2019; v1 submitted 15 August, 2018;
originally announced August 2018.
-
DoubleEcho: Mitigating Context-Manipulation Attacks in Copresence Verification
Authors:
Hien Thi Thu Truong,
Juhani Toivonen,
Thien Duc Nguyen,
Claudio Soriente,
Sasu Tarkoma,
N. Asokan
Abstract:
Copresence verification based on context can improve usability and strengthen security of many authentication and access control systems. By sensing and comparing their surroundings, two or more devices can tell whether they are copresent and use this information to make access control decisions. To the best of our knowledge, all context-based copresence verification mechanisms to date are susceptible to context-manipulation attacks. In such attacks, a distributed adversary replicates the same context at the (different) locations of the victim devices, and induces them to believe that they are copresent. In this paper we propose DoubleEcho, a context-based copresence verification technique that leverages acoustic Room Impulse Response (RIR) to mitigate context-manipulation attacks. In DoubleEcho, one device emits a wide-band audible chirp and all participating devices record reflections of the chirp from the surrounding environment. Since RIR is, by its very nature, dependent on the physical surroundings, it constitutes a unique location signature that is hard for an adversary to replicate. We evaluate DoubleEcho by collecting RIR data with various mobile devices and in a range of different locations. We show that DoubleEcho mitigates context-manipulation attacks whereas all other approaches to date are entirely vulnerable to such attacks. DoubleEcho detects copresence (or lack thereof) in roughly 2 seconds and works on commodity devices.
Submitted 18 February, 2019; v1 submitted 19 March, 2018;
originally announced March 2018.
-
Sensor-based Proximity Detection in the Face of Active Adversaries
Authors:
Babins Shrestha,
Nitesh Saxena,
Hien Thi Thu Truong,
N. Asokan
Abstract:
Contextual proximity detection (or, co-presence detection) is a promising approach to defend against relay attacks in many mobile authentication systems. We present a systematic assessment of co-presence detection in the presence of a context-manipulating attacker. First, we show that it is feasible to manipulate, consistently control and stabilize the readings of different acoustic and physical environment sensors (and even multiple sensors simultaneously) using low-cost, off-the-shelf equipment. Second, based on these capabilities, we show that an attacker who can manipulate the context gains a significant advantage in defeating context-based co-presence detection. For systems that use multiple sensors, we investigate two sensor fusion approaches based on machine learning techniques: features-fusion and decisions-fusion, and show that both are vulnerable to contextual attacks but the latter approach can be more resistant in some cases.
Submitted 4 April, 2021; v1 submitted 3 November, 2015;
originally announced November 2015.
-
The Company You Keep: Mobile Malware Infection Rates and Inexpensive Risk Indicators
Authors:
Hien Thi Thu Truong,
Eemil Lagerspetz,
Petteri Nurmi,
Adam J. Oliner,
Sasu Tarkoma,
N. Asokan,
Sourav Bhattacharya
Abstract:
There is little information from independent sources in the public domain about mobile malware infection rates. The only previous independent estimate (0.0009%) [12], was based on indirect measurements obtained from domain name resolution traces. In this paper, we present the first independent study of malware infection rates and associated risk factors using data collected directly from over 55,000 Android devices. We find that the malware infection rates in Android devices estimated using two malware datasets (0.28% and 0.26%), though small, are significantly higher than the previous independent estimate. Using our datasets, we investigate how indicators extracted inexpensively from the devices correlate with malware infection. Based on the hypothesis that some application stores have a greater density of malicious applications and that advertising within applications and cross-promotional deals may act as infection vectors, we investigate whether the set of applications used on a device can serve as an indicator for infection of that device. Our analysis indicates that this alone is not an accurate indicator for pinpointing infection. However, it is a very inexpensive but surprisingly useful way for significantly narrowing down the pool of devices on which expensive monitoring and analysis mechanisms must be deployed. Using our two malware datasets we show that this indicator performs 4.8 and 4.6 times (respectively) better at identifying infected devices than the baseline of random checks. Such indicators can be used, for example, in the search for new or previously undetected malware. It is therefore a technique that can complement standard malware scanning by anti-malware tools. Our analysis also demonstrates a marginally significant difference in battery use between infected and clean devices.
Submitted 27 February, 2014; v1 submitted 11 December, 2013;
originally announced December 2013.
-
A Log Auditing Approach for Trust Management in Peer-to-Peer Collaboration
Authors:
Hien Thi Thu Truong,
Claudia-Lavinia Ignat
Abstract:
Nowadays we are faced with an increasing popularity of social software including wikis, blogs, micro-blogs and online social networks such as Facebook and MySpace. Unfortunately, the most widely used social services are centralized and personal information is stored at a single vendor. This results in potential privacy problems as users do not have much control over how their private data is disseminated. To overcome this limitation, some recent approaches envisioned replacing the single authority centralization of services by a peer-to-peer trust-based approach where users can decide with whom they want to share their private data. In this peer-to-peer collaboration it is very difficult to ensure that after data is shared with other peers, these peers will not misbehave and violate data privacy. In this paper we propose a mechanism that addresses the issue of data privacy violation due to data disclosure to malicious peers. In our approach trust values between users are adjusted according to their previous activities on the shared data. Users share their private data by specifying some obligations the receivers must follow. We log modifications made by users on the shared data as well as the obligations that must be followed when data is shared. By a log-auditing mechanism we detect users that misbehaved and we adjust their associated trust values by using any existing decentralized trust model.
Submitted 6 December, 2010;
originally announced December 2010.