-
Solving Moving Sofa Problem Using Calculus of Variations
Authors:
Zhipeng Deng
Abstract:
In 1966, Leo Moser introduced the "moving sofa problem," which seeks the largest area of a shape that can be maneuvered through a 90-degree hallway of unit width. This problem remains open. In this paper, we employ the calculus of variations to solve it. Assuming the trajectories and envelopes are convex, the sofa's area is formulated as an integral functional on a set of parametric equations for curves. The final shape is determined by solving the Euler-Lagrange equations. Using numerical methods, we obtain the non-trivial area of 2.2195316, consistent with Gerver's constant, known since 1992. We prove that both Gerver's sofa and Romik's car satisfy the Euler-Lagrange equations, which are a necessary condition for maximal area. We also explore additional cases and asymmetric conditions, and discuss other variant problems.
Submitted 2 July, 2024;
originally announced July 2024.
-
Seymour and Woodall's conjecture holds for graphs with independence number two
Authors:
Rong Chen,
Zijian Deng
Abstract:
Woodall (and Seymour independently) in 2001 proposed a conjecture that every graph $G$ contains every complete bipartite graph on $χ(G)$ vertices as a minor, where $χ(G)$ is the chromatic number of $G$. In this paper, we prove that for each positive integer $\ell$ with $2\ell \leq χ(G)$, each graph $G$ with independence number two contains a $K^{\ell}_{\ell,χ(G)-\ell}$-minor, implying that Seymour and Woodall's conjecture holds for graphs with independence number two, where $K^{\ell}_{\ell,χ(G)-\ell}$ is the graph obtained from $K_{\ell,χ(G)-\ell}$ by making every pair of vertices on the side of the bipartition of size $\ell$ adjacent.
Submitted 4 June, 2024;
originally announced June 2024.
-
HAMLET: Graph Transformer Neural Operator for Partial Differential Equations
Authors:
Andrey Bryutkin,
Jiahao Huang,
Zhongying Deng,
Guang Yang,
Carola-Bibiane Schönlieb,
Angelica Aviles-Rivero
Abstract:
We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to PDEs of arbitrary geometries and varied input formats. Notably, HAMLET scales effectively with increasing data complexity and noise, showcasing its robustness. HAMLET is not just tailored to a single type of physical simulation, but can be applied across various domains. Moreover, it boosts model resilience and performance, especially in scenarios with limited data. We demonstrate, through extensive experiments, that our framework is capable of outperforming current techniques for PDEs.
Submitted 5 February, 2024;
originally announced February 2024.
-
Boosting Gradient Ascent for Continuous DR-submodular Maximization
Authors:
Qixin Zhang,
Zongqi Wan,
Zengde Deng,
Zaiyi Chen,
Xiaoming Sun,
Jialin Zhang,
Yu Yang
Abstract:
Projected Gradient Ascent (PGA) is the most commonly used optimization scheme in machine learning and operations research areas. Nevertheless, numerous studies and examples have shown that the PGA methods may fail to achieve the tight approximation ratio for continuous DR-submodular maximization problems. To address this challenge, we present a boosting technique in this paper, which can efficiently improve the approximation guarantee of the standard PGA to \emph{optimal} with only small modifications on the objective function. The fundamental idea of our boosting technique is to exploit non-oblivious search to derive a novel auxiliary function $F$, whose stationary points are excellent approximations to the global maximum of the original DR-submodular objective $f$. Specifically, when $f$ is monotone and $γ$-weakly DR-submodular, we propose an auxiliary function $F$ whose stationary points can provide a better $(1-e^{-γ})$-approximation than the $(γ^2/(1+γ^2))$-approximation guaranteed by the stationary points of $f$ itself. Similarly, for the non-monotone case, we devise another auxiliary function $F$ whose stationary points can achieve an optimal $\frac{1-\min_{\boldsymbol{x}\in\mathcal{C}}\|\boldsymbol{x}\|_{\infty}}{4}$-approximation guarantee where $\mathcal{C}$ is a convex constraint set. In contrast, the stationary points of the original non-monotone DR-submodular function can be arbitrarily bad~\citep{chen2023continuous}. Furthermore, we demonstrate the scalability of our boosting technique on four problems. In all of these four problems, our resulting variants of boosting PGA algorithm beat the previous standard PGA in several aspects such as approximation ratio and efficiency. Finally, we corroborate our theoretical findings with numerical experiments, which demonstrate the effectiveness of our boosting PGA methods.
Submitted 16 January, 2024;
originally announced January 2024.
-
An Augmented Lagrangian Primal-Dual Semismooth Newton Method for Multi-Block Composite Optimization
Authors:
Zhanwang Deng,
Kangkang Deng,
Jiang Hu,
Zaiwen Wen
Abstract:
In this paper, we develop a novel primal-dual semismooth Newton method for solving linearly constrained multi-block convex composite optimization problems. First, a differentiable augmented Lagrangian (AL) function is constructed by utilizing the Moreau envelopes of the nonsmooth functions. It enables us to derive an equivalent saddle point problem and establish strong AL duality under Slater's condition. Consequently, a semismooth system of nonlinear equations is formulated to characterize the optimality of the original problem instead of the inclusion-form KKT conditions. We then develop a semismooth Newton method, called ALPDSN, which uses purely second-order steps and a nonmonotone line-search-based globalization strategy. Through a connection to the inexact first-order steps when the regularization parameter is sufficiently large, the global convergence of ALPDSN is established. Under regularity conditions, namely partial smoothness, a local error bound, and strict complementarity, we show that both the primal and the dual iteration sequences possess a superlinear convergence rate, and we provide concrete examples where these regularity conditions are met. Numerical results on image restoration with two regularization terms and on the corrected tensor nuclear norm problem are presented to demonstrate the high efficiency and robustness of ALPDSN.
Submitted 15 May, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
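As a small illustration of the Moreau-envelope smoothing mentioned in the ALPDSN abstract above, the sketch below works through the one-dimensional case f(y) = |y|. It is not the authors' code: the function names and the parameter `lam` are hypothetical, chosen only for this example. It shows that the envelope of the nonsmooth |y| is the differentiable Huber function and that its gradient is available through the proximal operator.

```python
import math


def prox_abs(x: float, lam: float) -> float:
    """Proximal operator of f(y) = |y| with parameter lam (soft-thresholding)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def moreau_env_abs(x: float, lam: float) -> float:
    """Moreau envelope of |y|: min over y of |y| + (x - y)^2 / (2 * lam).

    The minimizer is prox_abs(x, lam), so we evaluate the objective there.
    """
    y = prox_abs(x, lam)
    return abs(y) + (x - y) ** 2 / (2.0 * lam)


def grad_moreau_env_abs(x: float, lam: float) -> float:
    """Gradient of the envelope, given by (x - prox(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam


def huber(x: float, lam: float) -> float:
    """Closed form of the same envelope (the Huber function)."""
    return x * x / (2.0 * lam) if abs(x) <= lam else abs(x) - lam / 2.0
```

The envelope agrees with the closed form everywhere, and its gradient is continuous even at the kink of |y|, which is the property that makes a differentiable augmented Lagrangian possible in the first place.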
-
New semidefinite relaxations for a class of complex quadratic programming problems
Authors:
Yingzhe Xu,
Cheng Lu,
Zhibin Deng,
Ya-Feng Liu
Abstract:
In this paper, we propose some new semidefinite relaxations for a class of nonconvex complex quadratic programming problems, which arise widely in signal processing and power systems. By deriving new valid constraints on the matrix variables in the lifted space, we obtain enhanced semidefinite relaxations of the complex quadratic programming problems. Then, we compare the proposed semidefinite relaxations with existing ones and show that the newly proposed semidefinite relaxations can be strictly tighter than the previous ones. Moreover, the proposed semidefinite relaxations can be applied to more general cases of complex quadratic programming problems, whereas the previous ones are only designed for special cases. Numerical results indicate that the proposed semidefinite relaxations not only provide tighter relaxation bounds but also improve some existing approximation algorithms by finding better sub-optimal solutions.
Submitted 16 May, 2023;
originally announced May 2023.
-
Residual-Based Multi-peak Sampling Algorithm in Inverse Problems of Dynamical Systems
Authors:
Xiao-Kai An,
Lin Du,
Zi-Chen Deng,
Yu-jia Zhang
Abstract:
Stochastic differential equations can describe a wide range of dynamical systems, and obtaining the governing equations of these systems is a prerequisite for studying their nonlinear dynamic behavior. Neural networks are currently the most popular approach to the inverse problem of dynamical systems. To obtain accurate dynamical equations, neural networks need a large amount of trajectory data as a training set. To address this shortcoming, we propose a residual-based multi-peak sampling algorithm. The algorithm evaluates the training results of each epoch of the neural network, calculates the probability density function $P(x)$ of the residual, samples where $P(x)$ is large, and adds the samples to the training set to retrain the neural network. To prevent the neural network from overfitting, we discretize the sampling points. We conduct case studies using two classical nonlinear dynamical systems and perform bifurcation and first escape probability analyses of the fitted equations. Results show that our proposed sampling strategy requires only 20$\sim$30\% of the sample points of the original method to reconstruct the stochastic dynamical behavior of the system. Finally, the algorithm is tested by adding interference noise to the data, and the results show that the sampling strategy has better numerical robustness and stability.
Submitted 24 April, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Decentralized Weakly Convex Optimization Over the Stiefel Manifold
Authors:
**xin Wang,
Jiang Hu,
Shixiang Chen,
Zengde Deng,
Anthony Man-Cho So
Abstract:
We focus on a class of non-smooth optimization problems over the Stiefel manifold in the decentralized setting, where a connected network of $n$ agents cooperatively minimize a finite-sum objective function with each component being weakly convex in the ambient Euclidean space. Such optimization problems, albeit frequently encountered in applications, are quite challenging due to their non-smoothness and non-convexity. To tackle them, we propose an iterative method called the decentralized Riemannian subgradient method (DRSM). The global convergence and an iteration complexity of $\mathcal{O}(\varepsilon^{-2} \log^2(\varepsilon^{-1}))$ for forcing a natural stationarity measure below $\varepsilon$ are established via the powerful tool of proximal smoothness from variational analysis, which could be of independent interest. Besides, we show the local linear convergence of the DRSM using geometrically diminishing stepsizes when the problem at hand further possesses a sharpness property. Numerical experiments are conducted to corroborate our theoretical findings.
Submitted 30 March, 2023;
originally announced March 2023.
-
An Online Algorithm for Chance Constrained Resource Allocation
Authors:
Yuwei Chen,
Zengde Deng,
Yinzhi Zhou,
Zaiyi Chen,
Yujie Chen,
Haoyuan Hu
Abstract:
This paper studies the online stochastic resource allocation problem (RAP) with chance constraints. The online RAP is a 0-1 integer linear programming problem where the resource consumption coefficients are revealed column by column along with the corresponding revenue coefficients. When a column is revealed, the corresponding decision variables are determined instantaneously without future information. Moreover, in online applications, the resource consumption coefficients are often obtained by prediction. To model their uncertainty, we introduce chance constraints. To the best of our knowledge, this is the first time chance constraints have been introduced in the online RAP. Assuming that the uncertain variables have known Gaussian distributions, the stochastic RAP can be transformed into a deterministic but nonlinear problem with integer second-order cone constraints. Next, we linearize this nonlinear problem and analyze the performance of the vanilla online primal-dual algorithm for solving the linearized stochastic RAP. Under mild technical assumptions, the optimality gap and constraint violation are both on the order of $\sqrt{n}$. Then, to further improve the performance of the algorithm, several modified online primal-dual algorithms with heuristic corrections are proposed. Finally, extensive numerical experiments on both synthetic and real data demonstrate the applicability and effectiveness of our methods.
Submitted 6 March, 2023;
originally announced March 2023.
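The step from a Gaussian chance constraint to a second-order cone constraint, mentioned in the abstract above, can be sketched for a single resource as follows. The notation is assumed for illustration and is not taken from the paper: $a \sim \mathcal{N}(\mu, \Sigma)$ is the uncertain consumption vector, $b$ the capacity, $x$ the decision vector, $\eta \in (0, 1/2]$ the allowed violation probability, and $\Phi$ the standard normal distribution function. Since $a^\top x \sim \mathcal{N}(\mu^\top x,\, x^\top \Sigma x)$,

$$\mathbb{P}\left(a^\top x \le b\right) \ge 1-\eta \quad\Longleftrightarrow\quad \mu^\top x + \Phi^{-1}(1-\eta)\,\left\|\Sigma^{1/2} x\right\|_2 \le b.$$

Because $\Phi^{-1}(1-\eta) \ge 0$ when $\eta \le 1/2$, the right-hand condition is a second-order cone constraint in $x$; it is deterministic but nonlinear, and it remains an integer constraint when $x$ is restricted to 0-1 values.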
-
Communication-Efficient Decentralized Online Continuous DR-Submodular Maximization
Authors:
Qixin Zhang,
Zengde Deng,
Xiangru Jian,
Zaiyi Chen,
Haoyuan Hu,
Yu Yang
Abstract:
Maximizing a monotone submodular function is a fundamental task in machine learning, economics, and statistics. In this paper, we present two communication-efficient decentralized online algorithms for the monotone continuous DR-submodular maximization problem, both of which reduce the number of per-function gradient evaluations and per-round communication complexity from $T^{3/2}$ to $1$. The first one, One-shot Decentralized Meta-Frank-Wolfe (Mono-DMFW), achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. As far as we know, this is the first one-shot and projection-free decentralized online algorithm for monotone continuous DR-submodular maximization. Next, inspired by the non-oblivious boosting function \citep{zhang2022boosting}, we propose the Decentralized Online Boosting Gradient Ascent (DOBGA) algorithm, which attains a $(1-1/e)$-regret of $O(\sqrt{T})$. To the best of our knowledge, this is the first result to obtain the optimal $O(\sqrt{T})$ against a $(1-1/e)$-approximation with only one gradient inquiry for each local objective function per step. Finally, various experimental results confirm the effectiveness of the proposed methods.
Submitted 18 August, 2022;
originally announced August 2022.
-
Online Learning for Non-monotone Submodular Maximization: From Full Information to Bandit Feedback
Authors:
Qixin Zhang,
Zengde Deng,
Zaiyi Chen,
Kuangqi Zhou,
Haoyuan Hu,
Yu Yang
Abstract:
In this paper, we revisit the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, which finds wide real-world applications in machine learning, economics, and operations research. First, we present the Meta-MFW algorithm, which achieves a $1/e$-regret of $O(\sqrt{T})$ at the cost of $T^{3/2}$ stochastic gradient evaluations per round. As far as we know, Meta-MFW is the first algorithm to obtain a $1/e$-regret of $O(\sqrt{T})$ for the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set. Furthermore, in sharp contrast with the ODC algorithm \citep{thang2021online}, Meta-MFW relies on a simple online linear oracle without discretization, lifting, or rounding operations. Considering practical restrictions, we then propose the Mono-MFW algorithm, which reduces the per-function stochastic gradient evaluations from $T^{3/2}$ to 1 and achieves a $1/e$-regret bound of $O(T^{4/5})$. Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm, which attains a $1/e$-regret bound of $O(T^{8/9})$. To the best of our knowledge, Mono-MFW and Bandit-MFW are the first sublinear-regret algorithms to explore the one-shot and bandit settings, respectively, for the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set. Finally, we conduct numerical experiments on both synthetic and real-world datasets to verify the effectiveness of our methods.
Submitted 16 August, 2022;
originally announced August 2022.
-
Robustness Implies Generalization via Data-Dependent Generalization Bounds
Authors:
Kenji Kawaguchi,
Zhun Deng,
Kyle Luh,
Jiaoyang Huang
Abstract:
This paper proves that robustness implies generalization via data-dependent generalization bounds. As a result, robustness and generalization are shown to be connected closely in a data-dependent manner. Our bounds improve previous bounds in two directions, to solve an open problem that has seen little development since 2010. The first is to reduce the dependence on the covering number. The second is to remove the dependence on the hypothesis space. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable. The experiments on real-world data and theoretical models demonstrate near-exponential improvements in various situations. To achieve these improvements, we do not require additional assumptions on the unknown distribution; instead, we only incorporate an observable and computable property of the training samples. A key technical innovation is an improved concentration bound for multinomial random variables that is of independent interest beyond robustness and generalization.
Submitted 3 August, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Bayesian approach for limited-aperture inverse acoustic scattering with total variation prior
Authors:
Xiao-Mei Yang,
Zhi-Liang Deng,
Ailin Qian
Abstract:
In this work, we apply the Bayesian approach to the acoustic scattering problem to reconstruct the shape of a sound-soft obstacle from limited-aperture far-field measurement data. A novel total variation prior is assigned to the shape parameterization. This prior is imposed on the Fourier coefficients of the parameterized form of the obstacle. Extensive numerical tests are provided to illustrate the numerical performance.
Submitted 19 March, 2022;
originally announced April 2022.
-
Online Primal-Dual Algorithms For Stochastic Resource Allocation Problems
Authors:
Yuwei Chen,
Zengde Deng,
Zaiyi Chen,
Yinzhi Zhou,
Yujie Chen,
Haoyuan Hu
Abstract:
This paper studies the online stochastic resource allocation problem (RAP) with chance constraints and conditional expectation constraints. The online RAP is an integer linear programming problem where resource consumption coefficients are revealed column by column along with the corresponding revenue coefficients. When a column is revealed, the corresponding decision variables are determined instantaneously without future information. In online applications, the resource consumption coefficients are often obtained by prediction. One application of such a scenario arises from the online order fulfilment task. When timeliness constraints are considered, the coefficients are generated by predicting the transportation time from origin to destination. To model their uncertainty, we introduce chance constraints and conditional expectation constraints. Assuming that the uncertain variables have known Gaussian distributions, the stochastic RAP can be transformed into a deterministic but nonlinear problem with integer second-order cone constraints. Next, we linearize this nonlinear problem and theoretically analyze the performance of the vanilla online primal-dual algorithm for solving the linearized stochastic RAP. Under mild technical assumptions, the optimality gap and constraint violation are both on the order of $\sqrt{n}$. Then, to further improve the performance of the algorithm, several modified online primal-dual algorithms with heuristic corrections are proposed. Finally, extensive numerical experiments demonstrate the applicability and effectiveness of our methods.
Submitted 31 March, 2022;
originally announced March 2022.
-
Bayesian inverse problems using homotopy
Authors:
Xiao-Mei Yang,
Zhi-Liang Deng
Abstract:
In solving Bayesian inverse problems, it is often desirable to use a common density parameterization to denote the prior and posterior. Typically we seek a density from the same family as the prior which closely approximates the true posterior. As one of the most important classes of distributions in statistics, the exponential family is considered as the parameterization. The optimal parameter values for representing the approximated posterior are obtained by minimizing the deviation between the parameterized density and a homotopy that deforms the prior density into the posterior density. Rather than solving the original problem directly, we convert it exactly into a corresponding system of explicit first-order ordinary differential equations. Solving this system over a finite 'time' interval yields the desired optimal density parameters. Numerical examples demonstrate the effectiveness of the method.
Submitted 28 March, 2022;
originally announced March 2022.
-
Stochastic Continuous Submodular Maximization: Boosting via Non-oblivious Function
Authors:
Qixin Zhang,
Zengde Deng,
Zaiyi Chen,
Haoyuan Hu,
Yu Yang
Abstract:
In this paper, we revisit Stochastic Continuous Submodular Maximization in both offline and online settings, which has wide applications in machine learning and operations research. We present a boosting framework covering gradient ascent and online gradient ascent. The fundamental ingredient of our methods is a novel non-oblivious function $F$ derived from a factor-revealing optimization problem, any stationary point of which provides a $(1-e^{-γ})$-approximation to the global maximum of the $γ$-weakly DR-submodular objective function $f\in C^{1,1}_L(\mathcal{X})$. Under the offline scenario, we propose a boosting gradient ascent method achieving a $(1-e^{-γ}-ε^{2})$-approximation after $O(1/ε^2)$ iterations, which improves the $(\frac{γ^2}{1+γ^2})$ approximation ratio of the classical gradient ascent algorithm. In the online setting, for the first time we consider adversarial delays for stochastic gradient feedback, under which we propose a boosting online gradient algorithm with the same non-oblivious function $F$. We verify that this boosting online algorithm achieves a regret of $O(\sqrt{D})$ against a $(1-e^{-γ})$-approximation to the best feasible solution in hindsight, where $D$ is the sum of delays of gradient feedback. To the best of our knowledge, this is the first result to obtain $O(\sqrt{T})$ regret against a $(1-e^{-γ})$-approximation with $O(1)$ gradient inquiry at each time step, when no delay exists, i.e., $D=T$. Finally, numerical experiments demonstrate the effectiveness of our boosting methods.
Submitted 10 June, 2022; v1 submitted 3 January, 2022;
originally announced January 2022.
-
Understanding Dynamics of Nonlinear Representation Learning and Its Application
Authors:
Kenji Kawaguchi,
Linjun Zhang,
Zhun Deng
Abstract:
Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification (or regression) at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss in common practical regimes of deep learning, unlike the neural tangent kernel (NTK) regime. In this paper, we study the dynamics of such implicit nonlinear representation learning, which is beyond the NTK regime. We identify a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition, respectively. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for global convergence and necessary for global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve the training behavior in the practical regime. Our results provide practical guidance for designing a model structure: e.g., the common model structure assumption can be used as a justification for using a particular model structure instead of others. We also derive a new training framework based on the theory. The proposed framework is empirically shown to maintain competitive (practical) test performance while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization on standard benchmark datasets, including CIFAR-10, CIFAR-100, and SVHN.
Submitted 9 April, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
-
A Bayesian level set method for an inverse medium scattering problem in acoustics
Authors:
J. Huang,
Z. Deng,
L. Xu
Abstract:
In this work, we are interested in determining the shape of the scatterer for two-dimensional time-harmonic inverse medium scattering problems in acoustics. The scatterer is assumed to be a piecewise constant function with a known value inside inhomogeneities, and its shape is represented by level set functions, about which we infer information using the Bayesian method. In the Bayesian framework, the solution of the geometric inverse problem is defined as a posterior probability distribution. The well-posedness of the posterior distribution is discussed, and Markov chain Monte Carlo (MCMC) methods are applied to generate samples from the resulting posterior distribution. Numerical experiments are presented to demonstrate the effectiveness of the proposed method.
Submitted 11 January, 2021;
originally announced January 2021.
-
Sparse High-Order Portfolios via Proximal DCA and SCA
Authors:
Jinxin Wang,
Zengde Deng,
Taoli Zheng,
Anthony Man-Cho So
Abstract:
In this paper, we aim to solve the cardinality-constrained high-order portfolio optimization problem, i.e., the mean-variance-skewness-kurtosis model with cardinality constraint (MVSKC). Optimizing the MVSKC model is difficult for two reasons. First, the objective function is non-convex. Second, the cardinality constraint is combinatorial in nature, leading to non-convexity as well as discontinuity. Based on the observation that the cardinality constraint has the difference-of-convex (DC) property, we transform the cardinality constraint into a penalty term and then propose three algorithms, namely the proximal difference of convex algorithm (pDCA), pDCA with extrapolation (pDCAe), and successive convex approximation (SCA), to handle the resulting penalized MVSK (PMVSK) formulation. Moreover, theoretical convergence results are established for each of these algorithms. Numerical experiments on real datasets demonstrate the superiority of our proposed methods in obtaining high-utility and sparse solutions as well as their efficiency in terms of running time.
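One standard way to see the DC property mentioned above is sketched here. The notation ($w$, $k$, $\rho$, $f$, $\mathcal{W}$, and the largest-$k$ norm $|||w|||_{k}$) is assumed for illustration and may differ from the formulation used in the paper.

```latex
% Largest-k norm: sum of the k largest absolute entries of w (a convex norm).
|||w|||_{k} \;=\; \max_{|S| = k} \sum_{i \in S} |w_i| .

% Cardinality constraint as a difference of two convex functions:
\|w\|_0 \le k
\quad\Longleftrightarrow\quad
\|w\|_1 - |||w|||_{k} = 0 ,
\qquad\text{and}\qquad
\|w\|_1 - |||w|||_{k} \ge 0 \ \text{ for all } w .

% Penalized problem, with f the (non-convex) MVSK objective, \mathcal{W} the
% remaining portfolio constraints, and \rho > 0 a penalty parameter:
\min_{w \in \mathcal{W}} \; f(w) + \rho \left( \|w\|_1 - |||w|||_{k} \right).

% Worked check with w = (0.5,\, 0,\, -0.3,\, 0.2) and k = 2:
\|w\|_1 = 1.0, \qquad |||w|||_{2} = 0.5 + 0.3 = 0.8, \qquad
\|w\|_1 - |||w|||_{2} = 0.2 > 0 \;\Rightarrow\; \|w\|_0 = 3 > 2 .
```

The gap $\|w\|_1 - |||w|||_{k}$ is exactly the total magnitude of the entries outside the $k$ largest, which is why it vanishes only when at most $k$ entries are nonzero.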
Submitted 10 June, 2021; v1 submitted 29 August, 2020;
originally announced August 2020.
-
Manifold Proximal Point Algorithms for Dual Principal Component Pursuit and Orthogonal Dictionary Learning
Authors:
Shixiang Chen,
Zengde Deng,
Shiqian Ma,
Anthony Man-Cho So
Abstract:
We consider the problem of maximizing the $\ell_1$ norm of a linear map over the sphere, which arises in various machine learning applications such as orthogonal dictionary learning (ODL) and robust subspace recovery (RSR). The problem is numerically challenging due to its nonsmooth objective and nonconvex constraint, and its algorithmic aspects have not been well explored. In this paper, we show how the manifold structure of the sphere can be exploited to design fast algorithms for tackling this problem. Specifically, our contribution is threefold. First, we present a manifold proximal point algorithm (ManPPA) for the problem and show that it converges at a sublinear rate. Furthermore, we show that ManPPA can achieve a quadratic convergence rate when applied to the ODL and RSR problems. Second, we propose a stochastic variant of ManPPA called StManPPA, which is well suited for large-scale computation, and establish its sublinear convergence rate. Both ManPPA and StManPPA have provably faster convergence rates than existing subgradient-type methods. Third, using ManPPA as a building block, we propose a new approach to solving a matrix analog of the problem, in which the sphere is replaced by the Stiefel manifold. The results from our extensive numerical experiments on the ODL and RSR problems demonstrate the efficiency and efficacy of our proposed methods.
Submitted 21 July, 2021; v1 submitted 5 May, 2020;
originally announced May 2020.
-
Weakly Convex Optimization over Stiefel Manifold Using Riemannian Subgradient-Type Methods
Authors:
Xiao Li,
Shixiang Chen,
Zengde Deng,
Qing Qu,
Zhihui Zhu,
Anthony Man Cho So
Abstract:
We consider a class of nonsmooth optimization problems over the Stiefel manifold, in which the objective function is weakly convex in the ambient Euclidean space. Such problems are ubiquitous in engineering applications but still largely unexplored. We present a family of Riemannian subgradient-type methods -- namely Riemannian subgradient, incremental subgradient, and stochastic subgradient methods -- to solve these problems and show that they all have an iteration complexity of ${\cal O}(\varepsilon^{-4})$ for driving a natural stationarity measure below $\varepsilon$. In addition, we establish the local linear convergence of the Riemannian subgradient and incremental subgradient methods when the problem at hand further satisfies a sharpness property and the algorithms are properly initialized and use geometrically diminishing stepsizes. To the best of our knowledge, these are the first convergence guarantees for using Riemannian subgradient-type methods to optimize a class of nonconvex nonsmooth functions over the Stiefel manifold. The fundamental ingredient in the proof of the aforementioned convergence results is a new Riemannian subgradient inequality for restrictions of weakly convex functions on the Stiefel manifold, which could be of independent interest. We also show that our convergence results can be extended to handle a class of compact embedded submanifolds of the Euclidean space. Finally, we discuss the sharpness properties of various formulations of the robust subspace recovery and orthogonal dictionary learning problems and demonstrate the convergence performance of the algorithms on both problems via numerical simulations.
Submitted 24 March, 2021; v1 submitted 12 November, 2019;
originally announced November 2019.
-
An ensemble Kalman filter approach based on level set parameterization for acoustic source identification using multiple frequency information
Authors:
Zhiliang Deng,
Xiaomei Yang
Abstract:
The spatially dependent unknown acoustic source is reconstructed from noisy multiple-frequency data on a remote closed surface. We assume that the unknown function is supported on a bounded domain. To determine the support, we present a statistical inversion algorithm that combines the ensemble Kalman filter approach with a level set technique. Several numerical examples show that the proposed method gives good numerical reconstructions.
Submitted 28 July, 2019;
originally announced July 2019.
-
A parametric Bayesian level set approach for acoustic source identification using multiple frequency information
Authors:
Zhiliang Deng,
Xiaomei Yang,
Jiangfeng Huang
Abstract:
The reconstruction of an unknown acoustic source is studied using noisy multiple-frequency data on a remote closed surface. We assume that the unknown source is encoded in a spatially dependent piecewise constant function, whose support set is the target to be determined. In this setting, the unknown source can be represented by a level set function. The function is explored with a Bayesian level set approach. To reduce the infinite-dimensional problem to a finite-dimensional one, we parameterize the level set function by a radial basis expansion. The well-posedness of the posterior distribution is proven. The posterior samples are generated by the Metropolis-Hastings algorithm, and the sample mean is used to approximate the unknown. Several shapes are tested to verify the effectiveness of the proposed algorithm. These numerical results show that the proposed algorithm is feasible and competitive with the Matérn random field for the acoustic source problem.
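As a rough illustration of the sampling step described above, here is a minimal random-walk Metropolis-Hastings sketch. It is not the authors' implementation: the scalar state, the Gaussian proposal, and the names `metropolis_hastings`, `log_post`, and `step` are assumptions made for this example, whereas the paper samples the coefficients of a radial basis expansion.

```python
import math
import random


def metropolis_hastings(log_post, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings for a scalar target (illustrative only).

    log_post: hypothetical callable returning the log posterior density up to
              an additive constant.
    step:     standard deviation of the symmetric Gaussian proposal.
    """
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        cand = x + rng.gauss(0.0, step)
        lp_cand = log_post(cand)
        # Symmetric proposal: accept with probability min(1, pi(cand) / pi(x)).
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            x, lp = cand, lp_cand
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def sample_mean(samples):
    """Posterior-mean estimate, used as the point estimate of the unknown."""
    return sum(samples) / len(samples)
```

For example, `sample_mean(metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, step=1.0))` estimates the mean of a standard normal target, which should be close to zero.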
Submitted 19 July, 2019;
originally announced July 2019.
-
Bayesian approach for inverse obstacle scattering with Poisson data
Authors:
Xiaomei Yang,
Zhiliang Deng
Abstract:
We consider an acoustic obstacle reconstruction problem with Poisson data. Due to the stochastic nature of the data, we tackle this problem in the framework of Bayesian inversion. The unknown obstacle is parameterized in its angular form. The prior for the parameterized unknown plays a key role in the Bayesian reconstruction algorithm. The most commonly used prior is the Gaussian. Under the Gaussian prior assumption, we further suppose that the unknown satisfies the total variation prior. With the hybrid prior, the well-posedness of the posterior distribution is discussed. The numerical examples verify the effectiveness of the proposed algorithm.
Submitted 8 July, 2019;
originally announced July 2019.
-
Limited Aperture Inverse Scattering Problems using Bayesian Approach and Extended Sampling Method
Authors:
Zhaoxiang Li,
Zhiliang Deng,
Jiguang Sun
Abstract:
Inverse scattering problems have many important applications. In this paper, given limited aperture data, we propose a Bayesian method for inverse acoustic scattering to reconstruct the shape of an obstacle. The inverse problem is formulated as a statistical model using Bayes' formula. The well-posedness is proved in the sense of the Hellinger metric. The extended sampling method is modified to provide the initial guess of the target location, which is critical to the fast convergence of the MCMC algorithm. An extensive numerical study is presented to illustrate the performance of the proposed method.
Submitted 29 May, 2019;
originally announced May 2019.
-
An Efficient Augmented Lagrangian Based Method for Constrained Lasso
Authors:
Zengde Deng,
Anthony Man-Cho So
Abstract:
Variable selection is one of the most important tasks in statistics and machine learning. To incorporate more prior information about the regression coefficients, the constrained Lasso model has been proposed in the literature. In this paper, we present an inexact augmented Lagrangian method to solve the Lasso problem with linear equality constraints. By fully exploiting second-order sparsity of the problem, we are able to greatly reduce the computational cost and obtain highly efficient implementations. Furthermore, numerical results on both synthetic data and real data show that our algorithm is superior to existing first-order methods in terms of both running time and solution accuracy.
Submitted 12 March, 2019;
originally announced March 2019.
-
Q-Hermite polynomials chaos approximation of likelihood function based on q-Gaussian prior in Bayesian inversion
Authors:
Zhiliang Deng,
Xiaomei Yang
Abstract:
In real applications, the construction of the prior and the acceleration of posterior sampling are usually two key points of a Bayesian inversion algorithm for engineers. In this paper, the q-analogue of the Gaussian distribution, the q-Gaussian distribution, is introduced as the prior for inverse problems, and an acceleration algorithm based on spectral likelihood approximation is discussed. We mainly focus on the convergence of the posterior distribution in the sense of Kullback-Leibler divergence when an approximated likelihood function and a truncated prior distribution are used. Moreover, convergence in the sense of the total variation and Hellinger metrics is obtained. Finally, two numerical examples are presented.
Submitted 2 August, 2018;
originally announced August 2018.
-
Optimal Output Consensus of High-Order Multi-Agent Systems with Embedded Technique
Authors:
Yutao Tang,
Zhenhua Deng,
Yiguang Hong
Abstract:
In this paper, we study an optimal output consensus problem for a multi-agent network with agents in the form of multi-input multi-output minimum-phase dynamics. Optimal output consensus can be taken as an extended version of the existing output consensus problem for higher-order agents with an optimization requirement, where the output variables of agents are driven to achieve a consensus on the optimal solution of a global cost function. To solve this problem, we first construct an optimal signal generator, and then propose an embedded control scheme by embedding the generator in the feedback loop. We give two kinds of algorithms based on different available information along with both state feedback and output feedback, and prove that these algorithms with the embedded technique can guarantee the solvability of the problem for high-order multi-agent systems under standard assumptions.
Submitted 21 August, 2018; v1 submitted 15 April, 2017;
originally announced April 2017.
-
Stability analysis of the numerical Method of characteristics applied to a class of energy-preserving systems. Part II: Nonreflecting boundary conditions
Authors:
Taras I. Lakoba,
Zihao Deng
Abstract:
We show that imposition of non-periodic, in place of periodic, boundary conditions (BC) can alter stability of modes in the Method of characteristics (MoC) employing certain ordinary-differential equation (ODE) numerical solvers. Thus, using non-periodic BC may render some of the MoC schemes stable for most practical computations, even though they are unstable for periodic BC. This fact contradicts a statement, found in some literature, that an instability detected by the von Neumann analysis for a given numerical scheme implies an instability of that scheme with arbitrary (i.e., non-periodic) BC. We explain the mechanism behind this contradiction. We also show that, and explain why, for the MoC employing some other ODE solvers, stability of the modes may be unaffected by the BC.
Submitted 27 July, 2017; v1 submitted 28 October, 2016;
originally announced October 2016.
-
Stability analysis of the numerical Method of characteristics applied to a class of energy-preserving systems. Part I: Periodic boundary conditions
Authors:
Taras I. Lakoba,
Zihao Deng
Abstract:
We study numerical (in)stability of the Method of characteristics (MoC) applied to a system of non-dissipative hyperbolic partial differential equations (PDEs) with periodic boundary conditions. We consider three different solvers along the characteristics: simple Euler (SE), modified Euler (ME), and Leap-frog (LF). The two former solvers are well known to exhibit a mild, but unconditional, numerical instability for non-dissipative ordinary differential equations (ODEs). They are found to have a similar (or stronger, for the MoC-ME) instability when applied to non-dissipative PDEs. On the other hand, the LF solver is known to be stable when applied to non-dissipative ODEs. However, when applied to non-dissipative PDEs within the MoC framework, it was found to have by far the strongest instability among all three solvers. We also comment on the use of the fourth-order Runge--Kutta solver within the MoC framework.
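The ODE-level claims above can be checked on the scalar test equation $y' = i\omega y$, a standard non-dissipative model problem. The sketch below is an illustration, not taken from the paper: it assumes that "modified Euler" means a two-stage second-order Runge-Kutta scheme, and the function name `amplification_factors` and the variable `z = omega * h` are hypothetical.

```python
import cmath


def amplification_factors(z: float) -> dict:
    """Per-step growth magnitudes for y' = i*omega*y, with z = omega * h.

    SE: simple Euler,   y_{n+1} = (1 + lam) * y_n
    ME: modified Euler, y_{n+1} = (1 + lam + lam**2 / 2) * y_n  (assumed RK2)
    LF: leap-frog,      y_{n+1} = y_{n-1} + 2 * lam * y_n,
        whose growth is governed by the roots of r**2 - 2*lam*r - 1 = 0.
    """
    lam = 1j * z
    se = abs(1 + lam)                  # sqrt(1 + z^2), exceeds 1 for any z != 0
    me = abs(1 + lam + lam ** 2 / 2)   # sqrt(1 + z^4 / 4), exceeds 1 more mildly
    root = cmath.sqrt(1 + lam ** 2)    # equals sqrt(1 - z^2)
    lf = max(abs(lam + root), abs(lam - root))  # equals 1 whenever |z| <= 1
    return {"SE": se, "ME": me, "LF": lf}
```

Because the SE and ME magnitudes exceed 1 for every nonzero step size, the growth is mild but unconditional, while leap-frog is neutrally stable for $|\omega h| \le 1$. This covers only the ODE setting; the PDE behavior within the MoC framework reported in the abstract is not reproduced by this check.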
Submitted 27 July, 2017; v1 submitted 28 October, 2016;
originally announced October 2016.
-
An inverse problem of identifying the radiative coefficient in a degenerate parabolic equation
Authors:
Zui-Cha Deng,
Liu Yang
Abstract:
This work investigates an inverse problem of determining the radiative coefficient in a degenerate parabolic equation from final overspecified data. Unlike other inverse coefficient problems, in which the principal coefficients are assumed to be strictly positive definite, the mathematical model discussed in the paper belongs to the class of second-order parabolic equations with non-negative characteristic form, namely, there is degeneracy on the lateral boundaries of the domain. The uniqueness of the solution is obtained by the contraction mapping principle. Based on the optimal control framework, the problem is transformed into an optimization problem and the existence of the minimizer is established. After the necessary conditions that must be satisfied by the minimizer are deduced, the uniqueness and stability of the minimizer are proved. By a minor modification of the cost functional and some a priori regularity conditions imposed on the forward operator, the convergence of the minimizer for noisy input data is obtained. The results can be extended to more general degenerate parabolic equations.
Submitted 27 September, 2013;
originally announced September 2013.
-
Factorization from an order-theoretic view 1&2
Authors:
Zike Deng
Abstract:
Drawing inspiration from Emmy Noether's set-theoretic foundations for algebra and Charles Ehresmann's topology without points, we adopt a new order-theoretic approach to ideal theory. For this we emphasize the order of divisibility in factorization and use it as a medium for relating algebra to topology. 1. Replacing principal ideals and their intersections by equivalence classes and their collections respectively, we transform integral divisorial ideals into B-ideals in order to provide an order-theoretic frame for treating decomposition while dispensing with addition. The idea of a B-ideal is closely connected with generalized algebraicity, which originated in semantics for programming languages. 2. Since B-ideals constitute a complete lattice, we can utilize the fact that decomposition, which means that each element can be decomposed into the join of all elements way-below it, is equivalent to complete distributivity. B-ideals with decomposition theorems do not in themselves depend on algebraic structures and can be applied to any poset. 3. The closed-set lattice is a cotopology based on multiplication and independent of a particular prime, in the sense of pointless topology by Ehresmann. It differs from the Zariski topology in using prime powers rather than primes, so that multiplicity in algebra acquires geometric meaning. 4. A factorial group is also a free module with multiplication instead of addition. Hence poset-theoretic constructions have corresponding algebraic analogues. They are introduced based on Noether's set-theoretic approach but quotient is within like a subset rather than outside.
Submitted 3 October, 2012;
originally announced October 2012.