-
On the support of solutions to nonlinear stochastic heat equations
Authors:
Beom-Seok Han,
Kunwoo Kim,
Jaeyun Yi
Abstract:
We investigate the strict positivity and the compact support property of solutions to the one-dimensional nonlinear stochastic heat equation: $$\partial_t u(t,x) = \frac{1}{2}\partial^2_x u(t,x) + σ(u(t,x))\dot{W}(t,x), \quad (t,x)\in \mathbf{R}_+\times\mathbf{R},$$ with nonnegative and compactly supported initial data $u_0$, where $\dot{W}$ is the space-time white noise and $σ:\mathbf{R} \to \mathbf{R} $ is a continuous function with $σ(0)=0$. We prove that (i) if $v/ σ(v)$ is sufficiently large near $v=0$, then the solution $u(t,\cdot)$ is strictly positive for all $t>0$, and (ii) if $v/σ(v)$ is sufficiently small near $v= 0$, then the solution $u(t,\cdot)$ has compact support for all $t>0$. These findings extend previous results concerning the strict positivity and the compact support property, which were analyzed only for the case $σ(u)\approx u^γ$ for $γ>0$. Additionally, we establish the uniqueness of a solution and the weak comparison principle in case (i).
Submitted 9 July, 2024;
originally announced July 2024.
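As a rough illustration of the two regimes in the abstract above, consider the power case $σ(v)=v^γ$, which the abstract names as the previously studied setting. This is only a sketch of how the ratio $v/σ(v)$ behaves near zero. The quantitative meaning of "sufficiently large" and "sufficiently small" is that of the paper and is not reproduced here, so the labels below indicate which regime each case points toward rather than a verified hypothesis check.

```latex
\[
\frac{v}{\sigma(v)} = v^{1-\gamma} \quad (v>0), \qquad
\lim_{v\downarrow 0} v^{1-\gamma} =
\begin{cases}
0, & 0<\gamma<1 \quad \text{(ratio small near $0$: points toward the compact support case (ii))},\\
1, & \gamma=1,\\
\infty, & \gamma>1 \quad \text{(ratio large near $0$: points toward the strict positivity case (i))}.
\end{cases}
\]
```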
-
Lieb-Schultz-Mattis theorems and generalizations in long-range interacting systems
Authors:
Ruizhi Liu,
**min Yi,
Shiyu Zhou,
Liujun Zou
Abstract:
In a unified fashion, we establish Lieb-Schultz-Mattis (LSM) theorems and their generalizations in systems with long-range interactions. We show that, for a quantum spin chain, if the interactions decay fast enough as their ranges increase and the Hamiltonian has an anomalous symmetry, the Hamiltonian cannot have a unique gapped symmetric ground state. If the Hamiltonian contains only 2-spin interactions, these theorems hold when the interactions decay faster than $1/r^2$, with $r$ the distance between the two interacting spins. Moreover, any pure state with an anomalous symmetry, which may not be a ground state of any natural Hamiltonian, must be long-range entangled. The symmetries we consider include on-site internal symmetries combined with lattice translation symmetries, and they can also extend to purely internal but non-on-site symmetries. Moreover, these internal symmetries can be discrete or continuous. We explore the applications of the theorems through various examples.
Submitted 23 May, 2024;
originally announced May 2024.
-
A Physics-informed Machine Learning-based Control Method for Nonlinear Dynamic Systems with Highly Noisy Measurements
Authors:
Mason Ma,
Jiajie Wu,
Chase Post,
Tony Shi,
**gang Yi,
Tony Schmitz,
Hong Wang
Abstract:
This study presents a physics-informed machine learning-based control method for nonlinear dynamic systems with highly noisy measurements. Existing data-driven control methods that use machine learning for system identification cannot effectively cope with highly noisy measurements, resulting in unstable control performance. To address this challenge, the present study extends current physics-informed machine learning capabilities for modeling nonlinear dynamics with control and integrates them into a model predictive control framework. To demonstrate the capability of the proposed method, we test and validate it on two noisy nonlinear dynamic systems: the chaotic Lorenz 3 system and a turning machine tool. Analysis of the results illustrates that the proposed method outperforms state-of-the-art benchmarks as measured by both modeling accuracy and control performance for nonlinear dynamic systems under high-noise conditions.
Submitted 11 November, 2023;
originally announced November 2023.
-
$L_p$-regularity theory for the stochastic reaction-diffusion equation with super-linear multiplicative noise and strong dissipativity
Authors:
Beom-Seok Han,
Jaeyun Yi
Abstract:
We study the existence, uniqueness, and regularity of the solution to the stochastic reaction-diffusion equation (SRDE) with colored noise $\dot{F}$: $$ \partial_t u = a^{ij}u_{x^ix^j} + b^i u_{x^i} + cu - \bar{b} u^{1+β} + ξu^{1+γ}\dot F,\quad (t,x)\in \mathbb{R}_+\times\mathbb{R}^d; \quad u(0,\cdot) = u_0, $$ where $a^{ij},b^i,c, \bar{b}$ and $ξ$ are $C^2$ or $L_\infty$ bounded random coefficients. Here $β>0$ denotes the degree of the strong dissipativity and $γ>0$ represents the degree of stochastic force. Under the reinforced Dalang's condition on $\dot{F}$, we show the well-posedness of the SRDE provided $γ< \frac{κ(β+1)}{d+2}$ where $κ>0$ is the constant related to $\dot F$. Our result assures that strong dissipativity prevents the solution from blowing up. Moreover, we provide the maximal Hölder regularity of the solution in time and space.
Submitted 24 April, 2023;
originally announced April 2023.
-
Fractal geometry of the PAM in 2D and 3D with white noise potential
Authors:
Promit Ghosal,
Jaeyun Yi
Abstract:
We study the parabolic Anderson model (PAM) \begin{equation}
{\partial \over \partial t}u(t,x) =\frac{1}{2}Δu(t,x) + u(t,x)ξ(x), \quad t>0, x\in \mathbb{R}^d, \quad \text{and} \quad
u(0,x) \equiv 1, \quad \forall x\in \mathbb{R}^d,
\end{equation} where $ξ$ is spatial white noise on $\mathbb{R}^d$ with $d \in\{2,3\}$. We show that the peaks of the PAM are macroscopically multifractal. More precisely, we prove that the spatial peaks of the PAM have infinitely many distinct values and we compute the macroscopic Hausdorff dimension (introduced by Barlow and Taylor) of those peaks. As a byproduct, we obtain the exact spatial asymptotics of the solution of the PAM. We also study the spatio-temporal peaks of the PAM and show their macroscopic multifractality. Some of the major tools used in our proof techniques include paracontrolled calculus and tail probabilities of the largest point in the spectrum of the Anderson Hamiltonian.
Submitted 28 March, 2023;
originally announced March 2023.
-
Accelerating the Convergence Rate of Consensus for Second-Order Multi-Agent Systems by Memory Information
Authors:
Jiahao Dai,
**g-Wen Yi,
Li Chai
Abstract:
This paper utilizes the agent's memory in accelerated consensus for second-order multi-agent systems (MASs). In the case of one-tap memory, explicit formulas for the optimal consensus convergence rate and control parameters are derived by applying the Jury stability criterion. It is proved that the optimal consensus convergence rate with one-tap memory is faster than that without memory. In the case of M-tap memory, an iterative algorithm is given to derive the control parameters to accelerate the convergence rate. Moreover, the accelerated consensus with one-tap memory is extended to the formation control, and the control parameters to achieve the fastest formation are obtained. Numerical examples further illustrate the theoretical results.
Submitted 24 March, 2023;
originally announced March 2023.
-
On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits
Authors:
Jialin Yi,
Milan Vojnović
Abstract:
We consider the nonstochastic multi-agent multi-armed bandit problem with agents collaborating via a communication network with delays. We show a lower bound for individual regret of all agents. We show that with suitable regularizers and communication protocols, a collaborative multi-agent \emph{follow-the-regularized-leader} (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor when the number of arms is large enough relative to degrees of agents in the communication graph. We also show that an FTRL algorithm with a suitable regularizer is regret optimal with respect to the scaling with the edge-delay parameter. We present numerical experiments validating our theoretical results and demonstrate cases when our algorithms outperform previously proposed algorithms.
Submitted 21 October, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Cooperative data-driven modeling
Authors:
Aleksandr Dekhovich,
O. Taylan Turan,
Jiaxiang Yi,
Miguel A. Bessa
Abstract:
Data-driven modeling in mechanics is evolving rapidly based on recent machine learning advances, especially on artificial neural networks. As the field matures, new data and models created by different groups become available, opening possibilities for cooperative modeling. However, artificial neural networks suffer from catastrophic forgetting, i.e. they forget how to perform an old task when trained on a new one. This hinders cooperation because adapting an existing model for a new task affects the performance on a previous task trained by someone else. The authors developed a continual learning method that addresses this issue, applying it here for the first time to solid mechanics. In particular, the method is applied to recurrent neural networks to predict history-dependent plasticity behavior, although it can be used on any other architecture (feedforward, convolutional, etc.) and to predict other phenomena. This work intends to spawn future developments on continual learning that will foster cooperative strategies among the mechanics community to solve increasingly challenging problems. We show that the chosen continual learning strategy can sequentially learn several constitutive laws without forgetting them, using less data to achieve the same error as standard (non-cooperative) training of one law per model.
Submitted 8 March, 2024; v1 submitted 23 November, 2022;
originally announced November 2022.
-
Mutual Information Learned Regressor: an Information-theoretic Viewpoint of Training Regression Systems
Authors:
Jirong Yi,
Qiaosheng Zhang,
Zhen Chen,
Qiao Liu,
Wei Shao,
Yusen He,
Yaohua Wang
Abstract:
As one of the central tasks in machine learning, regression finds many applications in different fields. A common existing practice for solving regression problems is the mean square error (MSE) minimization approach or its regularized variants, which require prior knowledge about the models. Recently, Yi et al. proposed a mutual information based supervised learning framework in which they introduced a label entropy regularization that does not require any prior knowledge. When applied to classification tasks and solved via a stochastic gradient descent (SGD) optimization algorithm, their approach achieved significant improvement over the commonly used cross entropy loss and its variants. However, they did not provide a theoretical convergence analysis of the SGD algorithm for the proposed formulation. Besides, applying the framework to regression tasks is nontrivial due to the potentially infinite support set of the label. In this paper, we investigate regression under the mutual information based supervised learning framework. We first argue that the MSE minimization approach is equivalent to a conditional entropy learning problem, and then propose a mutual information learning formulation for solving regression problems by using a reparameterization technique. For the proposed formulation, we give the convergence analysis of the SGD algorithm for solving it in practice. Finally, we consider a multi-output regression data model where we derive the generalization performance lower bound in terms of the mutual information associated with the underlying data distribution. The result shows that high dimensionality can be a blessing instead of a curse, which is controlled by a threshold. We hope our work will serve as a good starting point for further research on mutual information based regression.
Submitted 22 November, 2022;
originally announced November 2022.
-
P-adic incomplete gamma functions and Artin-Hasse-type series
Authors:
Xiaojian Li,
Jay Reiter,
Shiang Tang,
Napoleon Wang,
** Yi
Abstract:
We define and study a $p$-adic analogue of the incomplete gamma function related to Morita's $p$-adic gamma function. We also discuss a combinatorial identity related to the Artin-Hasse series, which is a special case of the exponential principle in combinatorics. From this we deduce a curious $p$-adic property of $|\mathrm{Hom} (G,S_n)|$ for a topologically finitely generated group $G$, using a characterization of $p$-adic continuity for certain functions $f \colon \mathbb Z_{>0} \to \mathbb Q_p$ due to O'Desky-Richman. In the end, we give an exposition of some standard properties of the Artin-Hasse series.
Submitted 28 November, 2022; v1 submitted 24 July, 2022;
originally announced July 2022.
-
Fast consensus of high-order multi-agent systems
Authors:
Jiahao Dai,
**g-Wen Yi,
Li Chai
Abstract:
In this paper, the fast consensus problem of high-order multi-agent systems under undirected topologies is considered. The direct link between the consensus convergence rate and the control gains is established. An accelerated consensus algorithm based on gradient descent is proposed to optimize the convergence rate. By applying the Routh-Hurwitz stability criterion, the lower bound on the convergence rate is derived, and explicit control gains are derived as the necessary condition to achieve the optimal convergence rate. Moreover, a protocol with time-varying control gains is designed to achieve the finite-time consensus. Explicit formulas for the time-varying control gains and the final consensus state are given. Numerical examples and simulation results are presented to illustrate the obtained theoretical results.
Submitted 16 May, 2022;
originally announced May 2022.
-
Smoothed Online Convex Optimization Based on Discounted-Normal-Predictor
Authors:
Lijun Zhang,
Wei Jiang,
**feng Yi,
Tianbao Yang
Abstract:
In this paper, we investigate an online prediction strategy named Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO), in which the learner needs to minimize not only the hitting cost but also the switching cost. In the setting of learning with expert advice, Daniely and Mansour (2019) demonstrate that Discounted-Normal-Predictor can be utilized to yield nearly optimal regret bounds over any interval, even in the presence of switching costs. Inspired by their results, we develop a simple algorithm for SOCO: combining online gradient descent (OGD) with different step sizes sequentially by Discounted-Normal-Predictor. Despite its simplicity, we prove that it is able to minimize the adaptive regret with switching cost, i.e., attaining nearly optimal regret with switching cost on every interval. By exploiting the theoretical guarantee of OGD for dynamic regret, we further show that the proposed algorithm can minimize the dynamic regret with switching cost in every interval.
Submitted 2 May, 2022;
originally announced May 2022.
-
The compact support property for solutions to the stochastic partial differential equations with colored noise
Authors:
Beom-Seok Han,
Kunwoo Kim,
Jaeyun Yi
Abstract:
We study the compact support property for solutions of the following stochastic partial differential equations: $$\partial_t u = a^{ij}u_{x^ix^j}(t,x)+b^{i}u_{x^i}(t,x)+cu+h(t,x,u(t,x))\dot{F}(t,x),\quad (t,x)\in (0,\infty)\times{\bf{R}}^d,$$ where $\dot{F}$ is a spatially homogeneous Gaussian noise that is white in time and colored in space, and $h(t, x, u)$ satisfies $K^{-1}|u|^λ\leq h(t, x, u)\leq K(1+|u|)$ for $λ\in(0,1)$ and $K\geq 1$. We show that if the initial data $u_0\geq 0$ has compact support, then, under the reinforced Dalang's condition on $\dot{F}$ (which guarantees the existence and the Hölder continuity of a weak solution), all nonnegative weak solutions $u(t, \cdot)$ have compact support for all $t>0$ with probability 1. Our results extend the works by Mueller-Perkins [Probab. Theory Relat. Fields, 93(3):325--358, 1992] and Krylov [Probab. Theory Relat. Fields, 108(4):543--557, 1997], in which they show the compact support property only for the one-dimensional SPDEs driven by space-time white noise on $(0, \infty)\times \bf{R}$.
Submitted 6 March, 2023; v1 submitted 13 January, 2022;
originally announced January 2022.
-
Optimal Memory Scheme for Accelerated Consensus Over Multi-Agent Networks
Authors:
Jiahao Dai,
**g-Wen Yi,
Li Chai
Abstract:
Consensus over multi-agent networks can be accelerated by introducing agents' memory into the control protocol. In this paper, a more general protocol with the node memory and the state deviation memory is designed. We aim to provide the optimal memory scheme to accelerate consensus. The contributions of this paper are threefold: (i) For the one-tap memory scheme, we demonstrate that the state deviation memory is useless for the optimal convergence. (ii) In the worst case, we prove that it is in vain to add any tap of the state deviation memory, and the one-tap node memory is sufficient to achieve the optimal convergence. (iii) We show that the two-tap state deviation memory is effective on some special networks, such as star networks. Numerical examples are given to illustrate the validity and correctness of the obtained results.
Submitted 13 December, 2021;
originally announced December 2021.
-
Convergence Rate of Accelerated Average Consensus with Local Node Memory: Optimization and Analytic Solutions
Authors:
**g-Wen Yi,
Li Chai,
**gxin Zhang
Abstract:
Previous research has shown that adding local memory can accelerate consensus. It is natural to ask what is the fastest rate achievable by the $M$-tap memory acceleration, and what are the corresponding control parameters. This paper introduces a set of effective and previously unused techniques to analyze the convergence rate of accelerated consensus with $M$-tap memory of local nodes and to design the control protocols. These techniques, including the Kharitonov stability theorem, the Routh stability criterion and the robust stability margin, have led to the following new results: 1) the direct link between the convergence rate and the control parameters; 2) explicit formulas of the optimal convergence rate and the corresponding optimal control parameters for $M \leq 2$ on a given graph; 3) the optimal worst-case convergence rate and the corresponding optimal control parameters for the memory $M \geq 1$ on a set of uncertain graphs. We show that the acceleration with the memory $M = 1$ provides the optimal convergence rate in the sense of worst-case performance. Several numerical examples are given to demonstrate the validity and performance of the theoretical results.
Submitted 10 December, 2021; v1 submitted 18 October, 2021;
originally announced October 2021.
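To make the idea of memory-accelerated consensus in the abstract above concrete, here is a minimal sketch that compares plain average consensus with a one-tap memory variant on a 5-node path graph. Everything specific here is an assumption for illustration: the heavy-ball style update $x_{k+1} = x_k - \epsilon L x_k + \beta (x_k - x_{k-1})$, the step size `eps`, the memory gain `beta`, and the graph are not taken from the paper, and the gains are not the optimal ones the paper derives.

```python
# Illustrative sketch only: heavy-ball style one-tap node memory on a path graph.
# The update rule, eps, beta and the graph are assumptions, not the paper's protocol.

def laplacian_path(n):
    """Graph Laplacian of an undirected path with n nodes, as a list of rows."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L

def consensus(x0, L, eps, beta, steps):
    """Iterate x_{k+1} = x_k - eps * L x_k + beta * (x_k - x_{k-1}), with x_{-1} = x_0."""
    prev, cur = list(x0), list(x0)
    n = len(x0)
    for _ in range(steps):
        Lx = [sum(L[i][j] * cur[j] for j in range(n)) for i in range(n)]
        nxt = [cur[i] - eps * Lx[i] + beta * (cur[i] - prev[i]) for i in range(n)]
        prev, cur = cur, nxt
    return cur

def max_error(x, target):
    """Largest deviation of any node state from the target value."""
    return max(abs(v - target) for v in x)

x0 = [1.0, 2.0, 3.0, 4.0, 5.0]
mean = sum(x0) / len(x0)
L = laplacian_path(len(x0))

plain = consensus(x0, L, eps=0.4, beta=0.0, steps=50)   # no memory
memory = consensus(x0, L, eps=0.4, beta=0.2, steps=50)  # one-tap memory

print("error without memory:", max_error(plain, mean))
print("error with memory:   ", max_error(memory, mean))
```

Because the Laplacian rows sum to zero and the iteration starts with $x_{-1} = x_0$, both variants preserve the average of the initial states. For these assumed values the memory variant should end closer to the average after the same number of steps, which is the qualitative effect the abstract describes.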
-
Fractal Geometry of the Valleys of the Parabolic Anderson Equation
Authors:
Promit Ghosal,
Jaeyun Yi
Abstract:
We study the macroscopic fractal properties of the deep valleys of the solution of the $(1+1)$-dimensional parabolic Anderson equation $${\partial \over \partial t}u(t,x) =\frac{1}{2} {\partial^2 \over \partial x^2} u(t,x) + u(t,x)\dot{W}(t,x),t>0, x\in {\bf R},\quad
u(0,x) \equiv u_0(x),x\in {\bf R}, $$ where $\dot{W}$ is the time-space white noise and $0<\inf_{x\in {\bf R}} u_0(x)\leq \sup_{x\in {\bf R}} u_0(x)<\infty.$ Unlike the macroscopic multifractality of the tall peaks, we show that valleys of the parabolic Anderson equation are macroscopically monofractal. In fact, the macroscopic Hausdorff dimension (introduced by Barlow and Taylor [J. Phys. A 22 (1989) 2621--2628; Proc. Lond. Math. Soc. (3) 64 (1992) 125--152]) of the valleys undergoes a phase transition at a point which does not depend on the initial data. The key tool of our proof is a lower bound to the lower tail probability of the parabolic Anderson equation. Such a lower bound is obtained for the first time in this paper and will be derived by utilizing the connection between the parabolic Anderson equation and the Kardar-Parisi-Zhang equation. Our techniques of proving this lower bound can be extended to other models in the KPZ universality class including the KPZ fixed point.
Submitted 10 August, 2021; v1 submitted 9 August, 2021;
originally announced August 2021.
-
A Simple yet Universal Strategy for Online Convex Optimization
Authors:
Lijun Zhang,
Guanghui Wang,
**feng Yi,
Tianbao Yang
Abstract:
Recently, several universal methods have been proposed for online convex optimization, which attain minimax rates for multiple types of convex functions simultaneously. However, they need to design and optimize one surrogate loss for each type of function, which makes it difficult to exploit the structure of the problem and utilize the vast amount of existing algorithms. In this paper, we propose a simple strategy for universal online convex optimization, which avoids these limitations. The key idea is to construct a set of experts to process the original online functions, and deploy a meta-algorithm over the \emph{linearized} losses to aggregate predictions from experts. Specifically, we choose Adapt-ML-Prod to track the best expert, because it has a second-order bound and can be used to leverage strong convexity and exponential concavity. In this way, we can plug in off-the-shelf online solvers as black-box experts to deliver problem-dependent regret bounds. Furthermore, our strategy inherits the theoretical guarantee of any expert designed for strongly convex functions and exponentially concave functions, up to a double logarithmic factor. For general convex functions, it maintains the minimax optimality and also achieves a small-loss bound.
Submitted 14 May, 2021; v1 submitted 8 May, 2021;
originally announced May 2021.
-
Solving Large Scale Quadratic Constrained Basis Pursuit
Authors:
Jirong Yi
Abstract:
Inspired by the alternating direction method of multipliers and the idea of operator splitting, we propose an efficient algorithm for solving large-scale quadratically constrained basis pursuit. Experimental results show that the proposed algorithm can achieve a 50 to 100 times speedup when compared with the baseline interior point algorithm implemented in CVX.
Submitted 2 April, 2021;
originally announced April 2021.
-
Macroscopic multi-fractality of Gaussian random fields and linear SPDEs with colored noise
Authors:
Jaeyun Yi
Abstract:
We consider the linear stochastic heat and wave equations with generalized Gaussian noise that is white in time and spatially correlated. Under the assumption that the homogeneous spatial correlation $f$ satisfies some mild conditions, we show that the solutions to the linear stochastic heat and wave equations exhibit tall peaks in macroscopic scales, which means they are macroscopically multi-fractal. We compute the macroscopic Hausdorff dimension of the peaks for Gaussian random fields with vanishing correlation and then apply this result to the solution of the linear stochastic heat and wave equations. We also study the spatio-temporal multi-fractality of the linear stochastic heat and wave equations. Our result is an extension of Khoshnevisan, Kim, and Xiao \cite{KKX,KKX2} and Kim \cite{K} to a more general class of the linear stochastic partial differential equations and Gaussian random fields.
Submitted 23 January, 2021;
originally announced January 2021.
-
Gaussian Process (GP)-based Learning Control of Selective Laser Melting Process
Authors:
Farshid Asadi,
Alaa A. Olleak,
**gang Yi,
Yuebin Guo
Abstract:
Selective laser melting (SLM) is one of the emerging processes for effective metal additive manufacturing. Due to complex heat exchange and material phase changes, it is challenging to accurately model the SLM dynamics and design robust control of the SLM process. In this paper, we first present a data-driven Gaussian process based dynamic model for the SLM process and then design a model predictive controller to regulate the melt pool size. Physical and process constraints are considered in the controller design. The learning model and control design are tested and validated with high-fidelity finite element simulation. Comparison with other control designs demonstrates the efficacy of the proposed control design.
Submitted 24 March, 2021; v1 submitted 9 October, 2020;
originally announced October 2020.
-
Limit theorems for time-dependent averages of nonlinear stochastic heat equations
Authors:
Kunwoo Kim,
Jaeyun Yi
Abstract:
We study limit theorems for time-dependent averages of the form $X_t:=\frac{1}{2L(t)}\int_{-L(t)}^{L(t)} u(t, x) \, dx$, as $t\to \infty$, where $L(t)=\exp(λt)$ and $u(t, x)$ is the solution to a stochastic heat equation on $\mathbb{R}_+\times \mathbb{R}$ driven by space-time white noise with $u_0(x)=1$ for all $x\in \mathbb{R}$. We show that for $X_t$
(i) the weak law of large numbers holds when $λ>λ_1$,
(ii) the strong law of large numbers holds when $λ>λ_2$,
(iii) the central limit theorem holds when $λ>λ_3$, but fails when $λ<λ_4\leq λ_3$,
(iv) the quantitative central limit theorem holds when $λ>λ_5$,
where $λ_i$'s are positive constants depending on the moment Lyapunov exponents of $u(t, x)$.
Submitted 10 December, 2020; v1 submitted 21 September, 2020;
originally announced September 2020.
-
Do Deep Minds Think Alike? Selective Adversarial Attacks for Fine-Grained Manipulation of Multiple Deep Neural Networks
Authors:
Zain Khan,
Jirong Yi,
Raghu Mudumbai,
Xiaodong Wu,
Weiyu Xu
Abstract:
Recent works have demonstrated the existence of {\it adversarial examples} targeting a single machine learning system. In this paper we ask a simple but fundamental question of "selective fooling": given {\it multiple} machine learning systems assigned to solve the same classification problem and taking the same input signal, is it possible to construct a perturbation to the input signal that manipulates the outputs of these {\it multiple} machine learning systems {\it simultaneously} in arbitrary pre-defined ways? For example, is it possible to selectively fool a set of "enemy" machine learning systems but not fool the other "friend" machine learning systems? The answer to this question depends on the extent to which these different machine learning systems "think alike". We formulate the problem of "selective fooling" as a novel optimization problem, and report on a series of experiments on the MNIST dataset. Our preliminary findings from these experiments show that it is in fact very easy to selectively manipulate multiple MNIST classifiers simultaneously, even when the classifiers are identical in their architectures, training algorithms and training datasets except for random initialization during training. This suggests that two nominally equivalent machine learning systems do not in fact "think alike" at all, and opens the possibility for many novel applications and a deeper understanding of the working principles of deep neural networks.
Submitted 26 March, 2020;
originally announced March 2020.
-
On coloring numbers of graph powers
Authors:
H. A. Kierstead,
Daqing Yang,
Junjun Yi
Abstract:
The weak $r$-coloring numbers $wcol_r(G)$ of a graph $G$ were introduced by the first two authors as a generalization of the usual coloring number $col(G)$, and have since found interesting theoretical and algorithmic applications. This has motivated researchers to establish strong bounds on these parameters for various classes of graphs.
Let $G^p$ denote the $p$-th power of $G$. We show that all integers $p > 0$ and $Δ\ge 3$ and all graphs $G$ with $Δ(G) \leq Δ$ satisfy $col(G^p) \in O(p \cdot wcol_{\lceil p/2\rceil}(G)(Δ-1)^{\lfloor p/2\rfloor})$; for fixed tree width or fixed genus the ratio between this upper bound and worst case lower bounds is polynomial in $p$. For the square of a graph $G$, we also show that if the maximum average degree satisfies $2k-2 < mad(G) \leq 2k$, then $col(G^2) \leq (2k-1)Δ(G)+2k+1$.
Submitted 20 October, 2019; v1 submitted 25 July, 2019;
originally announced July 2019.
-
Outlier Detection using Generative Models with Theoretical Performance Guarantees
Authors:
Jirong Yi,
Anh Duc Le,
Tianming Wang,
Xiaodong Wu,
Weiyu Xu
Abstract:
This paper considers the problem of recovering signals from compressed measurements contaminated with sparse outliers, which has arisen in many applications. In this paper, we propose a generative model neural network approach for reconstructing the ground truth signals under sparse outliers. We propose an iterative alternating direction method of multipliers (ADMM) algorithm for solving the outlier detection problem via $\ell_1$ norm minimization, and a gradient descent algorithm for solving the outlier detection problem via squared $\ell_1$ norm minimization. We establish the recovery guarantees for reconstruction of signals using generative models in the presence of outliers, and give an upper bound on the number of outliers allowed for recovery. Our results are applicable to both the linear generator neural network and the nonlinear generator neural network with an arbitrary number of layers. We conduct extensive experiments using variational auto-encoder and deep convolutional generative adversarial networks, and the experimental results show that the signals can be successfully reconstructed under outliers using our approach. Our approach outperforms the traditional Lasso and $\ell_2$ minimization approach.
Submitted 26 October, 2018;
originally announced October 2018.
-
Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
Authors:
Zaiyi Chen,
Zhuoning Yuan,
Jinfeng Yi,
Bowen Zhou,
Enhong Chen,
Tianbao Yang
Abstract:
Although the stochastic gradient descent (SGD) method and its variants (e.g., stochastic momentum methods, AdaGrad) are the algorithms of choice for solving non-convex problems (especially deep learning), there still remain big gaps between the theory and the practice with many questions unresolved. For example, there is still a lack of theories of convergence for SGD and its variants that use stagewise step size and return an averaged solution in practice. In addition, theoretical insights into why the adaptive step size of AdaGrad could improve on the non-adaptive step size of SGD are still missing for non-convex optimization. This paper aims to address these questions and fill the gap between theory and practice. We propose a universal stagewise optimization framework for a broad family of {\bf non-smooth non-convex} (namely weakly convex) problems with the following key features: (i) at each stage any suitable stochastic convex optimization algorithm (e.g., SGD or AdaGrad) that returns an averaged solution can be employed for minimizing a regularized convex problem; (ii) the step size is decreased in a stagewise manner; (iii) an averaged solution is returned as the final solution that is selected from all stagewise averaged solutions with sampling probabilities {\it increasing} with the stage number. Our theoretical results for stagewise AdaGrad exhibit its adaptive convergence, and therefore shed light on its faster convergence for problems with sparse stochastic gradients than stagewise SGD. To the best of our knowledge, these new results are the first of their kind for addressing the unresolved issues of existing theories mentioned earlier. Besides theoretical contributions, our empirical studies show that our stagewise SGD and AdaGrad improve the generalization performance of existing variants/implementations of SGD and AdaGrad.
Submitted 5 March, 2019; v1 submitted 19 August, 2018;
originally announced August 2018.
-
Necessary and Sufficient Null Space Condition for Nuclear Norm Minimization in Low-Rank Matrix Recovery
Authors:
Jirong Yi,
Weiyu Xu
Abstract:
Low-rank matrix recovery has found many applications in science and engineering such as machine learning, signal processing, collaborative filtering, system identification, and Euclidean embedding. But the low-rank matrix recovery problem is an NP hard problem and thus challenging. A commonly used heuristic approach is the nuclear norm minimization. In [12,14,15], the authors established the necessary and sufficient null space conditions for nuclear norm minimization to recover every possible low-rank matrix with rank at most r (the strong null space condition). In addition, in [12], Oymak et al. established a null space condition for successful recovery of a given low-rank matrix (the weak null space condition) using nuclear norm minimization, and derived the phase transition for the nuclear norm minimization. In this paper, we show that the weak null space condition in [12] is only a sufficient condition for successful matrix recovery using nuclear norm minimization, and is not a necessary condition as claimed in [12]. In this paper, we further give a weak null space condition for low-rank matrix recovery, which is both necessary and sufficient for the success of nuclear norm minimization. At the core of our derivation are an inequality for characterizing the nuclear norms of block matrices, and the conditions for equality to hold in that inequality.
Submitted 14 February, 2018;
originally announced February 2018.
-
Separation-Free Super-Resolution from Compressed Measurements is Possible: an Orthonormal Atomic Norm Minimization Approach
Authors:
Weiyu Xu,
Jirong Yi,
Soura Dasgupta,
Jian-Feng Cai,
Mathews Jacob,
Myung Cho
Abstract:
We consider the problem of recovering the superposition of $R$ distinct complex exponential functions from compressed non-uniform time-domain samples. Total Variation (TV) minimization or atomic norm minimization was proposed in the literature to recover the $R$ frequencies or the missing data. However, it is known that in order for TV minimization and atomic norm minimization to recover the missing data or the frequencies, the underlying $R$ frequencies are required to be well-separated, even when the measurements are noiseless. This paper shows that the Hankel matrix recovery approach can super-resolve the $R$ complex exponentials and their frequencies from compressed non-uniform measurements, regardless of how close their frequencies are to each other. We propose a new concept of orthonormal atomic norm minimization (OANM), and demonstrate that the success of Hankel matrix recovery in separation-free super-resolution comes from the fact that the nuclear norm of a Hankel matrix is an orthonormal atomic norm. More specifically, we show that, in traditional atomic norm minimization, the underlying parameter values $\textbf{must}$ be well separated to achieve successful signal recovery, if the atoms are changing continuously with respect to the continuously-valued parameter. In contrast, for the OANM, it is possible the OANM is successful even though the original atoms can be arbitrarily close.
As a byproduct of this research, we provide one matrix-theoretic inequality of nuclear norm, and give its proof from the theory of compressed sensing.
Submitted 4 November, 2017;
originally announced November 2017.
-
Fast dose optimization for rotating shield brachytherapy
Authors:
Myung Cho,
Xiaodong Wu,
Hossein Dakhah,
Jirong Yi,
Ryan T. Flynn,
Yusung Kim,
Weiyu Xu
Abstract:
Purpose: To provide a fast computational method, based on the proximal graph solver (POGS) - a convex optimization solver using the alternating direction method of multipliers (ADMM), for calculating an optimal treatment plan in rotating shield brachytherapy (RSBT). RSBT treatment planning has more degrees of freedom than conventional high-dose-rate brachytherapy (HDR-BT) due to the addition of emission direction, and this necessitates a fast optimization technique to enable clinical usage. // Methods: The multi-helix RSBT (H-RSBT) delivery technique was considered with five representative cervical cancer patients. Treatment plans were generated for all patients using the POGS method and the previously considered commercial solver IBM CPLEX. The rectum, bladder, sigmoid, high-risk clinical target volume (HR-CTV), and HR-CTV boundary were the structures considered in our optimization problem, called the asymmetric dose-volume optimization with smoothness control. Dose calculation resolution was 1x1x3 mm^3 for all cases. The H-RSBT applicator has 6 helices, with 33.3 mm of translation along the applicator per helical rotation and 1.7 mm spacing between dwell positions, yielding 17.5 degree emission angle spacing per 5 mm along the applicator.// Results: For each patient, HR-CTV D90, HR-CTV D100, rectum D2cc, sigmoid D2cc, and bladder D2cc matched within 1% for CPLEX and POGS. Also, we obtained similar EQD2 figures between CPLEX and POGS. POGS was around 18 times faster than CPLEX. Over all patients, total optimization times were 32.1-65.4 seconds for CPLEX and 2.1-3.9 seconds for POGS. // Conclusions: POGS substantially reduced treatment plan optimization time around 18 times for RSBT with similar HR-CTV D90, OAR D2cc values, and EQD2 figure relative to CPLEX, which is significant progress toward clinical translation of RSBT. POGS is also applicable to conventional HDR-BT.
Submitted 19 April, 2017;
originally announced April 2017.
-
Tracking Slowly Moving Clairvoyant: Optimal Dynamic Regret of Online Learning with True and Noisy Gradient
Authors:
Tianbao Yang,
Lijun Zhang,
Rong Jin,
Jinfeng Yi
Abstract:
This work focuses on dynamic regret of online convex optimization that compares the performance of online learning to a clairvoyant who knows the sequence of loss functions in advance and hence selects the minimizer of the loss function at each step. By assuming that the clairvoyant moves slowly (i.e., the minimizers change slowly), we present several improved variation-based upper bounds of the dynamic regret under the true and noisy gradient feedback, which are {\it optimal} in light of the presented lower bounds. The key to our analysis is to explore a regularity metric that measures the temporal changes in the clairvoyant's minimizers, to which we refer as {\it path variation}. Firstly, we present a general lower bound in terms of the path variation, and then show that under full information or gradient feedback we are able to achieve an optimal dynamic regret. Secondly, we present a lower bound with noisy gradient feedback and then show that we can achieve optimal dynamic regrets under a stochastic gradient feedback and two-point bandit feedback. Moreover, for a sequence of smooth loss functions that admit a small variation in the gradients, our dynamic regret under the two-point bandit feedback matches what is achieved with full information.
Submitted 15 May, 2016;
originally announced May 2016.