-
Random pairing MLE for estimation of item parameters in Rasch model
Authors:
Yuepeng Yang,
Cong Ma
Abstract:
The Rasch model, a classical model in item response theory, is widely used in psychometrics to model the relationship between individuals' latent traits and their binary responses on assessments or questionnaires. In this paper, we introduce a new likelihood-based estimator -- random pairing maximum likelihood estimator ($\mathsf{RP\text{-}MLE}$) and its bootstrapped variant multiple random pairing MLE ($\mathsf{MRP\text{-}MLE}$) that faithfully estimate the item parameters in the Rasch model. The new estimators have several appealing features compared to existing ones. First, both work for sparse observations, an increasingly important scenario in the big data era. Second, both estimators are provably minimax optimal in terms of finite sample $\ell_{\infty}$ estimation error. Lastly, $\mathsf{RP\text{-}MLE}$ admits precise distributional characterization that allows uncertainty quantification on the item parameters, e.g., construction of confidence intervals for the item parameters. The main idea underlying $\mathsf{RP\text{-}MLE}$ and $\mathsf{MRP\text{-}MLE}$ is to randomly pair user-item responses to form item-item comparisons. This is carefully designed to reduce the problem size while retaining statistical independence. We also provide empirical evidence of the efficacy of the two new estimators using both simulated and real data.
Submitted 20 June, 2024;
originally announced June 2024.
-
CPAFT: A Consistent Parallel Advancing Front Technique for Unstructured Triangular/Tetrahedral Mesh Generation
Authors:
Chengdi Ma,
Jizu Huang,
Hao Luo,
Chao Yang
Abstract:
Compared with the remarkable progress made in parallel numerical solvers of partial differential equations, the development of algorithms for generating unstructured triangular/tetrahedral meshes has been relatively sluggish. In this paper, we propose a novel, consistent parallel advancing front technique (CPAFT) by combining the advancing front technique, the domain decomposition method based on space-filling curves, the distributed forest-of-overlapping-trees approach, and the consistent parallel maximal independent set algorithm. The newly proposed CPAFT algorithm can mathematically ensure that the generated unstructured triangular/tetrahedral meshes are independent of the number of processors and the implementation of domain decomposition. Several numerical tests are conducted to validate the parallel consistency and outstanding parallel efficiency of the proposed algorithm, which scales effectively up to two thousand processors. This is, as far as we know, the first parallel unstructured triangular/tetrahedral mesh generator with scalability to O(1,000) CPU processors.
Submitted 31 May, 2024;
originally announced May 2024.
-
A new framework of high-order unfitted finite element methods using ALE maps for moving-domain problems
Authors:
Wenhao Lu,
Chuwen Ma,
Weiying Zheng
Abstract:
As a sequel to our previous work [C. Ma, Q. Zhang and W. Zheng, SIAM J. Numer. Anal., 60 (2022)], [C. Ma and W. Zheng, J. Comput. Phys. 469 (2022)], this paper presents a generic framework of arbitrary Lagrangian-Eulerian unfitted finite element (ALE-UFE) methods for partial differential equations (PDEs) on time-varying domains. The ALE-UFE method has great potential for developing high-order unfitted finite element methods. The usefulness of the method is demonstrated by a variety of moving-domain problems, including a linear problem with explicit velocity of the boundary (or interface), a PDE-domain coupled problem, and a problem whose domain has a topological change. Numerical experiments show that optimal convergence is achieved by both third- and fourth-order methods on domains with smooth boundaries, but deteriorates to second order when the domain has topological changes.
Submitted 23 April, 2024;
originally announced April 2024.
-
Schrödingerisation based computationally stable algorithms for ill-posed problems in partial differential equations
Authors:
Shi Jin,
Nana Liu,
Chuwen Ma
Abstract:
We introduce a simple and stable computational method for ill-posed partial differential equation (PDE) problems. The method is based on Schrödingerization, introduced in [S. Jin, N. Liu and Y. Yu, Phys. Rev. A, 108 (2023), 032603], which maps all linear PDEs into Schrödinger-type equations in one higher dimension, for quantum simulations of these PDEs. Although the original problem is ill-posed, the Schrödingerized equations are Hamiltonian systems and time-reversible, allowing stable computation both forward and backward in time. The original variable can be recovered from data in a suitably chosen domain in the extended dimension. We will use the backward heat equation and the linear convection equation with imaginary wave speed as examples. Error analysis of these algorithms is conducted and verified numerically. The methods are applicable to both classical and quantum computers, and we also lay out quantum algorithms for these methods. Moreover, we introduce a smooth initialization for the Schrödingerized equation which will lead to essentially spectral accuracy for the approximation in the extended space, if a spectral method is used. Consequently, the number of extra qubits needed due to the extra dimension, if a qubit based quantum algorithm is used, for both well-posed and ill-posed problems, becomes almost $\log\log {1/\varepsilon}$ where $\varepsilon$ is the desired precision. This optimizes the complexity of the Schrödingerization based quantum algorithms for any non-unitary dynamical system introduced in [S. Jin, N. Liu and Y. Yu, Phys. Rev. A, 108 (2023), 032603].
Submitted 8 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
A Mixed Multiscale Spectral Generalized Finite Element Method
Authors:
Christian Alber,
Chupeng Ma,
Robert Scheichl
Abstract:
We present a multiscale mixed finite element method for solving second order elliptic equations with general $L^{\infty}$-coefficients arising from flow in highly heterogeneous porous media. Our approach is based on a multiscale spectral generalized finite element method (MS-GFEM) and exploits the superior local mass conservation properties of mixed finite elements. Following the MS-GFEM framework, optimal local approximation spaces are built for the velocity field by solving local eigenvalue problems over generalized harmonic spaces. The resulting global velocity space is then enriched suitably to ensure inf-sup stability. We develop the mixed MS-GFEM for both continuous and discrete formulations, with Raviart-Thomas based mixed finite elements underlying the discrete method. Exponential convergence with respect to local degrees of freedom is proven at both the continuous and discrete levels. Numerical results are presented to support the theory and to validate the proposed method.
Submitted 4 April, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Batched Nonparametric Contextual Bandits
Authors:
Rong Jiang,
Cong Ma
Abstract:
We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations. We establish a minimax regret lower bound for this setting and propose a novel batch learning algorithm that achieves the optimal regret (up to logarithmic factors). In essence, our procedure dynamically splits the covariate space into smaller bins, carefully aligning their widths with the batch size. Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret in the fully online setting.
Submitted 10 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
On Schrödingerization based quantum algorithms for linear dynamical systems with inhomogeneous terms
Authors:
Shi Jin,
Nana Liu,
Chuwen Ma
Abstract:
We analyze the Schrödingerisation method for quantum simulation of a general class of non-unitary dynamics with inhomogeneous source terms. The Schrödingerisation technique, introduced in \cite{JLY22a,JLY23}, transforms any linear ordinary and partial differential equations with non-unitary dynamics into a system under unitary dynamics via a warped phase transition that maps the equations into a higher dimension, making them suitable for quantum simulation. This technique can also be applied to these equations with inhomogeneous terms modeling source or forcing terms or boundary and interface conditions, and discrete dynamical systems such as iterative methods in numerical linear algebra, through extra equations in the system. Difficulty arises in the presence of inhomogeneous terms since they can change the stability of the original system.
In this paper, we systematically study--both theoretically and numerically--the important issue of recovering the original variables from the Schrödingerized equations, even when the evolution operator contains unstable modes. We show that even with unstable modes, one can still construct a stable scheme, yet to recover the original variable one needs to use suitable data in the extended space. We analyze and compare both the discrete and continuous Fourier transforms used in the extended dimension, and derive corresponding error estimates, which allow one to use the more appropriate transform for specific equations. We also provide a smoother initialization for the Schrödingerized system to gain higher order accuracy in the extended space. We homogenize the inhomogeneous terms with a stretch transformation, making it easier to recover the original variable. Our recovering technique also provides a simple and generic framework to solve general ill-posed problems in a computationally stable way.
Submitted 27 February, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Top-$K$ ranking with a monotone adversary
Authors:
Yuepeng Yang,
Antares Chen,
Lorenzo Orecchia,
Cong Ma
Abstract:
In this paper, we address the top-$K$ ranking problem with a monotone adversary. We consider the scenario where a comparison graph is randomly generated and the adversary is allowed to add arbitrary edges. The statistician's goal is then to accurately identify the top-$K$ preferred items based on pairwise comparisons derived from this semi-random comparison graph. The main contribution of this paper is to develop a weighted maximum likelihood estimator (MLE) that achieves near-optimal sample complexity, up to a $\log^2(n)$ factor, where $n$ denotes the number of items under comparison. This is made possible through a combination of analytical and algorithmic innovations. On the analytical front, we provide a refined $\ell_\infty$ error analysis of the weighted MLE that is more explicit and tighter than existing analyses. It relates the $\ell_\infty$ error with the spectral properties of the weighted comparison graph. Motivated by this, our algorithmic innovation involves the development of an SDP-based approach to reweight the semi-random graph and meet specified spectral properties. Additionally, we propose a first-order method based on the Matrix Multiplicative Weight Update (MMWU) framework. This method efficiently solves the resulting SDP in nearly-linear time relative to the size of the semi-random comparison graph.
Submitted 20 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
On the design-dependent suboptimality of the Lasso
Authors:
Reese Pathak,
Cong Ma
Abstract:
This paper investigates the effect of the design matrix on the ability (or inability) to estimate a sparse parameter in linear regression. More specifically, we characterize the optimal rate of estimation when the smallest singular value of the design matrix is bounded away from zero. In addition to this information-theoretic result, we provide and analyze a procedure which is simultaneously statistically optimal and computationally efficient, based on soft thresholding the ordinary least squares estimator. Most surprisingly, we show that the Lasso estimator -- despite its widespread adoption for sparse linear regression -- is provably minimax rate-suboptimal when the minimum singular value is small. We present a family of design matrices and sparse parameters for which we can guarantee that the Lasso with any choice of regularization parameter -- including those which are data-dependent and randomized -- would fail in the sense that its estimation rate is suboptimal by polynomial factors in the sample size. Our lower bound is strong enough to preclude the statistical optimality of all forms of the Lasso, including its highly popular penalized, norm-constrained, and cross-validated variants.
Submitted 1 February, 2024;
originally announced February 2024.
-
Information-Theoretic Thresholds for Planted Dense Cycles
Authors:
Cheng Mao,
Alexander S. Wein,
Shenduo Zhang
Abstract:
We study a random graph model for small-world networks which are ubiquitous in social and biological sciences. In this model, a dense cycle of expected bandwidth $n τ$, representing the hidden one-dimensional geometry of vertices, is planted in an ambient random graph on $n$ vertices. For both detection and recovery of the planted dense cycle, we characterize the information-theoretic thresholds in terms of $n$, $τ$, and an edge-wise signal-to-noise ratio $λ$. In particular, the information-theoretic thresholds differ from the computational thresholds established in a recent work for low-degree polynomial algorithms, thereby justifying the existence of statistical-to-computational gaps for this problem.
Submitted 31 January, 2024;
originally announced February 2024.
-
Invariants of Quantizations of Unimodular Quadratic Polynomial Poisson Algebras of Dimension 3
Authors:
Chengyuan Ma
Abstract:
Let $P = \Bbbk[x_1, x_2, x_3]$ be a unimodular quadratic Poisson algebra, with its Poisson bracket written as $\{x_i, x_j\} = \displaystyle{\sum_{k,l}c_{i,j}^{k,l}x_kx_l}$, $1 \leq i < j \leq 3$. Let $P_{\hbar}$ be the deformation quantization of $P$ constructed as follows: $P_{\hbar} = \Bbbk\langle y_1, y_2, y_3\rangle/([y_i,y_j]=\frac{\hbar}{2}\displaystyle{\sum_{k,l}}c_{i,j}^{k,l}(y_ky_l+y_ly_k))_{1 \leq i < j \leq 3}$. In this paper, we establish that $P$ and $P_{\hbar}$ possess identical graded automorphisms and reflections, and that taking invariant subalgebras and taking deformation quantizations are two commutative processes.
Submitted 24 January, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift
Authors:
Jiawei Ge,
Shange Tang,
Jianqing Fan,
Cong Ma,
Chi Jin
Abstract:
A key challenge of modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization -- generalizing to target data whose distribution differs from that of source data. Despite its significant importance, the fundamental question of ``what are the most effective algorithms for OOD generalization'' remains open even under the standard setting of covariate shift. This paper addresses this fundamental question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) purely using source data (without any modification) achieves the minimax optimality for covariate shift under the well-specified setting. That is, no algorithm performs better than MLE in this setting (up to a constant factor), justifying that MLE is all you need. Our result holds for a very rich class of parametric models, and does not require any boundedness condition on the density ratio. We illustrate the wide applicability of our framework by instantiating it to three concrete examples -- linear regression, logistic regression, and phase retrieval. This paper further complements the study by proving that, under the misspecified setting, MLE is no longer the optimal choice, whereas Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax optimal in certain scenarios.
Submitted 27 November, 2023;
originally announced November 2023.
-
A unified framework for multiscale spectral generalized FEMs and low-rank approximations to multiscale PDEs
Authors:
Chupeng Ma
Abstract:
This work presents an abstract framework for the design, implementation, and analysis of the multiscale spectral generalized finite element method (MS-GFEM), a particular numerical multiscale method originally proposed in [I. Babuska and R. Lipton, Multiscale Model.\;\,Simul., 9 (2011), pp.~373--406]. MS-GFEM is a partition of unity method employing optimal local approximation spaces constructed from local spectral problems. We establish a general local approximation theory demonstrating exponential convergence with respect to local degrees of freedom under certain assumptions, with explicit dependence on key problem parameters. Our framework applies to a broad class of multiscale PDEs with $L^{\infty}$-coefficients in both continuous and discrete, finite element settings, including highly indefinite problems (convection-dominated diffusion, as well as the high-frequency Helmholtz, Maxwell and elastic wave equations with impedance boundary conditions), and higher-order problems. Notably, we prove a local convergence rate of $O(e^{-cn^{1/d}})$ for MS-GFEM for all these problems, improving upon the $O(e^{-cn^{1/(d+1)}})$ rate shown by Babuska and Lipton.
Moreover, based on the abstract local approximation theory for MS-GFEM, we establish a unified framework for showing low-rank approximations to multiscale PDEs. This framework applies to the aforementioned problems, proving that the associated Green's functions admit an $O(|\logε|^{d})$-term separable approximation on well-separated domains with error $ε>0$. Our analysis improves and generalizes the result in [M. Bebendorf and W. Hackbusch, Numerische Mathematik, 95 (2003), pp.~1-28] where an $O(|\logε|^{d+1})$-term separable approximation was proved for Poisson-type problems.
Submitted 15 March, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization
Authors:
Cong Ma,
Xingyu Xu,
Tian Tong,
Yuejie Chi
Abstract:
Many problems encountered in science and engineering can be formulated as estimating a low-rank object (e.g., matrices and tensors) from incomplete, and possibly corrupted, linear measurements. Through the lens of matrix and tensor factorization, one of the most popular approaches is to employ simple iterative algorithms such as gradient descent (GD) to recover the low-rank factors directly, which allow for small memory and computation footprints. However, the convergence rate of GD depends linearly, and sometimes even quadratically, on the condition number of the low-rank object, and therefore, GD slows down painstakingly when the problem is ill-conditioned. This chapter introduces a new algorithmic approach, dubbed scaled gradient descent (ScaledGD), that provably converges linearly at a constant rate independent of the condition number of the low-rank object, while maintaining the low per-iteration cost of gradient descent for a variety of tasks including sensing, robust principal component analysis and completion. In addition, ScaledGD continues to admit fast global convergence to the minimax-optimal solution, again almost independent of the condition number, from a small random initialization when the rank is over-specified in the presence of Gaussian noise. In total, ScaledGD highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the symmetry in low-rank factorization without hurting generalization.
Submitted 9 October, 2023;
originally announced October 2023.
-
On conformally flat cubic metrics with weakly isotropic scalar curvature
Authors:
Cuiling Ma,
Xiaoling Zhang
Abstract:
The conformal properties of metrics are meaningful in Riemannian and Finsler geometry, and cubic metrics are useful in physics and biology. In this paper, we study the conformally flat cubic metrics with weakly isotropic scalar curvature. We also prove that such metrics must be Minkowski metrics.
Submitted 1 September, 2023;
originally announced September 2023.
-
Quantum simulation of Maxwell's equations via Schrödingersation
Authors:
Shi Jin,
Nana Liu,
Chuwen Ma
Abstract:
We present quantum algorithms for electromagnetic fields governed by Maxwell's equations. The algorithms are based on the Schrödingersation approach, which transforms any linear PDEs and ODEs with non-unitary dynamics into a system evolving under unitary dynamics, via a warped phase transformation that maps the equation into one higher dimension. In this paper, our quantum algorithms are based on either a direct approximation of Maxwell's equations combined with Yee's algorithm, or a matrix representation in terms of Riemann-Silberstein vectors combined with a spectral approach and an upwind scheme. We implement these algorithms with physical boundary conditions, including perfect conductor and impedance boundaries. We also solve Maxwell's equations for a linear inhomogeneous medium, specifically the interface problem. Several numerical experiments are performed to demonstrate the validity of this approach. In addition, instead of qubits, the quantum algorithms can also be formulated in the continuous variable quantum framework, which allows the quantum simulation of Maxwell's equations in analog quantum simulation.
Submitted 16 August, 2023;
originally announced August 2023.
-
On the Defocusing Cubic Nonlinear Wave Equation on $\mathbb{H}^3$ with Radial Initial Data in $H^{\frac{1}{2}+δ} \times H^{-\frac{1}{2}+δ}$
Authors:
Chutian Ma
Abstract:
In this paper we prove global well-posedness and scattering for the defocusing cubic nonlinear wave equation in the hyperbolic space $\mathbb{H}^3$, under the assumption that the initial data is radial and lies in $H^{\frac{1}{2}+δ}(\mathbb{H}^3)\times H^{-\frac{1}{2}+δ}(\mathbb{H}^3)$
Submitted 7 June, 2023;
originally announced June 2023.
-
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage
Authors:
Yu Gui,
Cong Ma,
Yiqiao Zhong
Abstract:
We investigate the role of projection heads, also known as projectors, within the encoder-projector framework (e.g., SimCLR) used in contrastive learning. We aim to demystify the observed phenomenon where representations learned before projectors outperform those learned after -- measured using the downstream linear classification accuracy, even when the projectors themselves are linear.
In this paper, we make two significant contributions towards this aim. Firstly, through empirical and theoretical analysis, we identify two crucial effects -- expansion and shrinkage -- induced by the contrastive loss on the projectors. In essence, contrastive loss either expands or shrinks the signal direction in the representations learned by an encoder, depending on factors such as the augmentation strength, the temperature used in contrastive loss, etc. Secondly, drawing inspiration from the expansion and shrinkage phenomenon, we propose a family of linear transformations to accurately model the projector's behavior. This enables us to precisely characterize the downstream linear classification accuracy in the high-dimensional asymptotic limit. Our findings reveal that linear projectors operating in the shrinkage (or expansion) regime hinder (or improve) the downstream classification accuracy. This provides the first theoretical explanation as to why (linear) projectors impact the downstream performance of learned representations. Our theoretical findings are further corroborated by extensive experiments on both synthetic data and real image data.
Submitted 5 June, 2023;
originally announced June 2023.
-
High-probability sample complexities for policy evaluation with linear function approximation
Authors:
Gen Li,
Weichen Wu,
Yuejie Chi,
Cong Ma,
Alessandro Rinaldo,
Yuting Wei
Abstract:
This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms: the temporal difference (TD) learning algorithm and the two-timescale linear TD with gradient correction (TDC) algorithm. In both the on-policy setting, where observations are generated from the target policy, and the off-policy setting, where samples are drawn from a behavior policy potentially different from the target policy, we establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level. We also exhibit an explicit dependence on problem-related quantities, and show in the on-policy setting that our upper bound matches the minimax lower bound on crucial problem parameters, including the choice of the feature maps and the problem dimension.
Submitted 2 May, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
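The abstract above does not spell out the TD update itself. As a minimal sketch, assuming the standard on-policy TD(0) rule with linear features, the iteration can be written as follows. The function name `td0_linear`, the deterministic two-state chain, the one-hot features, and the constant step size are all illustrative assumptions, not taken from the paper.

```python
def td0_linear(features, transitions, rewards, gamma, step_size, num_steps, start_state=0):
    """Standard TD(0) with linear function approximation (illustrative sketch).

    features[s]    : feature vector phi(s) as a list of floats
    transitions[s] : next state reached from s (deterministic toy chain, an assumption)
    rewards[s]     : reward collected when leaving state s
    """
    theta = [0.0] * len(features[0])
    state = start_state
    for _ in range(num_steps):
        next_state = transitions[state]
        phi = features[state]
        phi_next = features[next_state]
        # Current and next value estimates under the linear model V(s) = phi(s)^T theta.
        value = sum(p * t for p, t in zip(phi, theta))
        value_next = sum(p * t for p, t in zip(phi_next, theta))
        # Temporal-difference error, then a stochastic-approximation step along phi(s).
        td_error = rewards[state] + gamma * value_next - value
        theta = [t + step_size * td_error * p for t, p in zip(theta, phi)]
        state = next_state
    return theta


# Toy chain 0 -> 1 -> 0 with rewards (1, 0) and discount 0.5.
# Solving V0 = 1 + 0.5 * V1 and V1 = 0.5 * V0 gives V0 = 4/3 and V1 = 2/3.
theta = td0_linear(
    features=[[1.0, 0.0], [0.0, 1.0]],
    transitions=[1, 0],
    rewards=[1.0, 0.0],
    gamma=0.5,
    step_size=0.1,
    num_steps=5000,
)
print(theta)
```

With one-hot features the linear model is exact, so the iterates approach the true values 4/3 and 2/3. The sample-complexity question studied in the paper concerns how many noisy samples such an iteration needs in general, which this noiseless toy chain does not capture.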
-
Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks
Authors:
Mingze Wang,
Chao Ma
Abstract:
The training process of ReLU neural networks often exhibits complicated nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose significant challenges for theoretical analysis. Therefore, most previous theoretical works on the optimization dynamics of neural networks focus either on local analysis (like the end of training) or approximate linear models (like Neural Tangent Kernel). In this work, we conduct a complete theoretical characterization of the training process of a two-layer ReLU network trained by Gradient Flow on linearly separable data. In this specific setting, our analysis captures the whole optimization process starting from random initialization to final convergence. Despite the relatively simple model and data that we studied, we reveal four different phases in the whole training process, showing a general simplifying-to-complicating learning trend. Specific nonlinear behaviors can also be precisely identified and captured theoretically, such as initial condensation, saddle-to-plateau dynamics, plateau escape, changes of activation patterns, learning with increasing complexity, etc.
Submitted 27 December, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Detection of Dense Subhypergraphs by Low-Degree Polynomials
Authors:
Abhishek Dhawan,
Cheng Mao,
Alexander S. Wein
Abstract:
Detection of a planted dense subgraph in a random graph is a fundamental statistical and computational problem that has been extensively studied in recent years. We study a hypergraph version of the problem. Let $G^r(n,p)$ denote the $r$-uniform Erdős-Rényi hypergraph model with $n$ vertices and edge density $p$. We consider detecting the presence of a planted $G^r(n^γ, n^{-α})$ subhypergraph in a $G^r(n, n^{-β})$ hypergraph, where $0< α< β< r-1$ and $0 < γ< 1$. Focusing on tests that are degree-$n^{o(1)}$ polynomials of the entries of the adjacency tensor, we determine the threshold between the easy and hard regimes for the detection problem. More precisely, for $0 < γ< 1/2$, the threshold is given by $α= βγ$, and for $1/2 \le γ< 1$, the threshold is given by $α= β/2 + r(γ- 1/2)$.
Our results are already new in the graph case $r=2$, as we consider the subtle log-density regime where hardness based on average-case reductions is not known. Our proof of low-degree hardness is based on a conditional variant of the standard low-degree likelihood calculation.
Submitted 17 April, 2023;
originally announced April 2023.
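The piecewise threshold stated in the abstract above can be evaluated directly. The sketch below only encodes the two formulas as written ($α = βγ$ for $0 < γ < 1/2$ and $α = β/2 + r(γ - 1/2)$ for $1/2 \le γ < 1$). The function name `low_degree_threshold` and the argument checks are illustrative assumptions, and the code does not say which side of the threshold is the easy regime.

```python
def low_degree_threshold(beta, gamma, r):
    """Threshold value of alpha from the abstract, as a function of beta, gamma and r.

    Assumes 0 < gamma < 1 and 0 < beta < r - 1, matching the stated parameter ranges.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1)")
    if not 0 < beta < r - 1:
        raise ValueError("beta must lie in (0, r - 1)")
    if gamma < 0.5:
        # Regime 0 < gamma < 1/2: threshold alpha = beta * gamma.
        return beta * gamma
    # Regime 1/2 <= gamma < 1: threshold alpha = beta / 2 + r * (gamma - 1/2).
    return beta / 2 + r * (gamma - 0.5)


# Graph case r = 2 with beta = 0.8, evaluated in each regime.
print(low_degree_threshold(0.8, 0.25, 2))
print(low_degree_threshold(0.8, 0.75, 2))
```

Both branches give $β/2$ at $γ = 1/2$, so the threshold is continuous across the two regimes.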
-
On the Fourier Truncation Method for the Rough Data Cubic Defocusing NLW on $\mathbb{H}^3$
Authors:
Chutian Ma
Abstract:
In this paper, we study the cubic defocusing nonlinear wave equation on the three dimensional hyperbolic space. We use the Fourier truncation method to show that the equation is globally well-posed and scatters if the initial data lies in $H^s(\mathbb{H}^3)$, $s>\frac{182}{201}\approx 0.905$.
Submitted 1 March, 2023;
originally announced March 2023.
-
Invariants of Unimodular Quadratic Polynomial Poisson Algebras of Dimension 3
Authors:
Chengyuan Ma
Abstract:
Let $P = \Bbbk[x_1,x_2,x_3]$ be a unimodular quadratic Poisson algebra and let $G$ be a finite subgroup of the graded Poisson automorphism group of $P$. In this paper, we prove a variant of the Shephard-Todd-Chevalley theorem for $P$ and variants of the Shephard-Todd-Chevalley theorem and the Watanabe theorem for its Poisson enveloping algebra $U(P)$ under the induced group $\widetilde{G}$.
Submitted 3 April, 2024; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Sharp analysis of EM for learning mixtures of pairwise differences
Authors:
Abhishek Dhawan,
Cheng Mao,
Ashwin Pananjady
Abstract:
We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
Submitted 22 June, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Computing persistent homology by spanning trees and critical simplices
Authors:
Dinghua Shi,
Zhifeng Chen,
Chuang Ma,
Guanrong Chen
Abstract:
Topological data analysis can extract effective information from higher-dimensional data. Its mathematical basis is persistent homology. The persistent homology can calculate topological features at different spatiotemporal scales of the dataset; that is, establishing the integrated taxonomic relation among points, lines and simplices. Here, the simplicial network composed of all-order simplices in a simplicial complex is essential. Because the sequence of nested simplicial subnetworks can be regarded as a discrete Morse function from the simplicial network to real values, a method based on the concept of critical simplices can be developed by searching all-order spanning trees. Employing this new method, not only the Morse function values with the theoretical minimum number of critical simplices can be obtained, but also the Betti numbers and composition of all-order cavities in the simplicial network can be calculated quickly. Finally, this method is used to analyze some examples and compared with other methods, showing its effectiveness and feasibility.
Submitted 27 September, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Detection-Recovery Gap for Planted Dense Cycles
Authors:
Cheng Mao,
Alexander S. Wein,
Shenduo Zhang
Abstract:
Planted dense cycles are a type of latent structure that appears in many applications, such as small-world networks in social sciences and sequence assembly in computational biology. We consider a model where a dense cycle with expected bandwidth $n τ$ and edge density $p$ is planted in an Erdős-Rényi graph $G(n,q)$. We characterize the computational thresholds for the associated detection and recovery problems for the class of low-degree polynomial algorithms. In particular, a gap exists between the two thresholds in a certain regime of parameters. For example, if $n^{-3/4} \ll τ\ll n^{-1/2}$ and $p = C q = Θ(1)$ for a constant $C>1$, the detection problem is computationally easy while the recovery problem is hard for low-degree algorithms.
Submitted 20 June, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
Authors:
Xingyu Xu,
Yandi Shen,
Yuejie Chi,
Cong Ma
Abstract:
We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparameterized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overparameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
Submitted 6 November, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Quasi Non-Negative Quaternion Matrix Factorization with Application to Color Face Recognition
Authors:
Yifen Ke,
Changfeng Ma,
Zhigang Jia,
Yajun Xie,
Riwei Liao
Abstract:
To address the non-negativity dropout problem of quaternion models, a novel quasi non-negative quaternion matrix factorization (QNQMF) model is presented for color image processing. To implement QNQMF, the quaternion projected gradient algorithm and the quaternion alternating direction method of multipliers are proposed by formulating QNQMF as non-convex constrained quaternion optimization problems. Some properties of the proposed algorithms are studied. The numerical experiments on color image reconstruction show that these algorithms encoded on the quaternion perform better than the same algorithms encoded on the red, green and blue channels. Furthermore, we apply the proposed algorithms to color face recognition. Numerical results indicate that the accuracy rate of face recognition on the quaternion model is better than on the red, green and blue channels of color images as well as on the single channel of gray level images for the same data, when large facial expressions and shooting angle variations are present.
Submitted 29 November, 2022;
originally announced November 2022.
-
Scalable multiscale-spectral GFEM with an application to composite aero-structures
Authors:
Jean Bénézech,
Linus Seelinger,
Peter Bastian,
Richard Butler,
Timothy Dodwell,
Chupeng Ma,
Robert Scheichl
Abstract:
In this paper, the first large-scale application of multiscale-spectral generalized finite element methods (MS-GFEM) to composite aero-structures is presented. The crucial novelty lies in the introduction of A-harmonicity in the local approximation spaces, which in contrast to [Babuska, Lipton, Multiscale Model. Simul. 9, 2011] is enforced more efficiently via a constraint in the local eigenproblems. This significant modification leads to excellent approximation properties, which turn out to be essential to capture accurately material strains and stresses with a low dimensional approximation space, hence maximising model order reduction. The implementation of the framework in the DUNE software package, as well as a detailed description of all components of the method are presented and exemplified on a composite laminated beam under compressive loading. The excellent parallel scalability of the method, as well as its superior performance compared to the related, previously introduced GenEO method are demonstrated on two realistic application cases, including a C-shaped wing spar with complex geometry. Further, by allowing low-cost approximate solves for closely related models or geometries this efficient, novel technology provides the basis for future applications in optimisation or uncertainty quantification on challenging problems in composite aero-structures.
Submitted 1 March, 2023; v1 submitted 24 November, 2022;
originally announced November 2022.
-
Multilevel-in-Layer Training for Deep Neural Network Regression
Authors:
Colin Ponce,
Ruipeng Li,
Christina Mao,
Panayot Vassilevski
Abstract:
A common challenge in regression is that for many problems, the degrees of freedom required for a high-quality solution also allow for overfitting. Regularization is a class of strategies that seek to restrict the range of possible solutions so as to discourage overfitting while still enabling good solutions, and different regularization strategies impose different types of restrictions. In this paper, we present a multilevel regularization strategy that constructs and trains a hierarchy of neural networks, each of which has layers that are wider versions of the previous network's layers. We draw intuition and techniques from the field of Algebraic Multigrid (AMG), traditionally used for solving linear and nonlinear systems of equations, and specifically adapt the Full Approximation Scheme (FAS) for nonlinear systems of equations to the problem of deep learning. Training through V-cycles then encourages the neural networks to build a hierarchical understanding of the problem. We refer to this approach as \emph{multilevel-in-width} to distinguish from prior multilevel works which hierarchically alter the depth of neural networks. The resulting approach is a highly flexible framework that can be applied to a variety of layer types, which we demonstrate with both fully-connected and convolutional layers. We experimentally show with PDE regression problems that our multilevel training approach is an effective regularizer, improving the generalization performance of the neural networks studied.
Submitted 11 November, 2022;
originally announced November 2022.
-
A Scattering Result of the Radial Cubic Defocusing Schrödinger Equation on the 3d Hyperbolic Space
Authors:
Chutian Ma
Abstract:
In this paper, we study the defocusing cubic Schrödinger equation on three dimensional hyperbolic space $\mathbb{H}^3$ with radial initial data in the Sobolev Space $H^s(0<s<1)$. Our main result is that the initial value problem is globally wellposed and scatters for $\frac{15}{16}<s<1$. This is an extension of the work of Staffilani and Yu to the three dimensional hyperbolic space.
Submitted 26 October, 2022; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Lyapunov Function Consistent Adaptive Network Signal Control with Back Pressure and Reinforcement Learning
Authors:
Chaolun Ma,
Bruce Wang,
Zihao Li,
Ahmadreza Mahmoudzadeh,
Yunlong Zhang
Abstract:
In traffic signal control, flow-based (optimizing the overall flow) and pressure-based methods (equalizing and alleviating congestion) are commonly used but often considered separately. This study introduces a unified framework using Lyapunov control theory, defining specific Lyapunov functions respectively for these methods. We have found interesting results. For example, the well-recognized back-pressure method is equal to differential queue lengths weighted by intersection lane saturation flows. We further improve it by adding basic traffic flow theory. Rather than merely ensuring that the control system is stable, the system should also be capable of adapting to various performance metrics. Building on insights from Lyapunov theory, this study designs a reward function for the Reinforcement Learning (RL)-based network signal control, whose agent is trained with Double Deep Q-Network (DDQN) for effective control over complex traffic networks. The proposed algorithm is compared with several traditional and RL-based methods under pure passenger car flow and heterogeneous traffic flow including freight, respectively. The numerical tests demonstrate that the proposed method outperforms the alternative control methods across different traffic scenarios, covering corridor and general network situations each with varying traffic demands, in terms of the average network waiting time per vehicle.
Submitted 16 January, 2024; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Random graph matching at Otter's threshold via counting chandeliers
Authors:
Cheng Mao,
Yihong Wu,
Jiaming Xu,
Sophie H. Yu
Abstract:
We propose an efficient algorithm for graph matching based on similarity scores constructed from counting a certain family of weighted trees rooted at each vertex. For two Erdős-Rényi graphs $\mathcal{G}(n,q)$ whose edges are correlated through a latent vertex correspondence, we show that this algorithm correctly matches all but a vanishing fraction of the vertices with high probability, provided that $nq\to\infty$ and the edge correlation coefficient $ρ$ satisfies $ρ^2>α\approx 0.338$, where $α$ is Otter's tree-counting constant. Moreover, this almost exact matching can be made exact under an extra condition that is information-theoretically necessary. This is the first polynomial-time graph matching algorithm that succeeds at an explicit constant correlation and applies to both sparse and dense graphs. In comparison, previous methods either require $ρ=1-o(1)$ or are restricted to sparse graphs.
The crux of the algorithm is a carefully curated family of rooted trees called chandeliers, which allows effective extraction of the graph correlation from the counts of the same tree while suppressing the undesirable correlation between those of different trees.
Submitted 13 February, 2023; v1 submitted 25 September, 2022;
originally announced September 2022.
-
Exponential convergence of a generalized FEM for heterogeneous reaction-diffusion equations
Authors:
Chupeng Ma,
Jens Markus Melenk
Abstract:
A generalized finite element method is proposed for solving a heterogeneous reaction-diffusion equation with a singular perturbation parameter $\varepsilon$, based on locally approximating the solution on each subdomain by solution of a local reaction-diffusion equation and eigenfunctions of a local eigenproblem. These local problems are posed on some domains slightly larger than the subdomains with oversampling size $δ^{\ast}$. The method is formulated at the continuous level as a direct discretization of the continuous problem and at the discrete level as a coarse-space approximation for its standard FE discretizations. Exponential decay rates for local approximation errors with respect to $δ^{\ast}/\varepsilon$ and $δ^{\ast}/h$ (at the discrete level with $h$ denoting the fine FE mesh size) and with the local degrees of freedom are established. In particular, it is shown that the method at the continuous level converges uniformly with respect to $\varepsilon$ in the standard $H^{1}$ norm, and that if the oversampling size is relatively large with respect to $\varepsilon$ and $h$ (at the discrete level), the solutions of the local reaction-diffusion equations provide good local approximations for the solution and thus the local eigenfunctions are not needed. Numerical results are provided to verify the theoretical results.
Submitted 8 September, 2022; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Dimensions of projected sets and measures on typical self-affine sets
Authors:
De-Jun Feng,
Chiu-Hong Lo,
Cai-Yun Ma
Abstract:
Let $T_1,\ldots, T_m$ be a family of $d\times d$ invertible real matrices with $\|T_i\|<1/2$ for $1\leq i\leq m$. For ${\bf a}=(a_1,\ldots, a_m)\in \Bbb R^{md}$, let $π^{\bf a}:\; Σ=\{1,\ldots, m\}^{\Bbb N}\to \Bbb R^d$ denote the coding map associated with the affine IFS $\{T_ix+a_i\}_{i=1}^m$. We show that for every Borel probability measure $μ$ on $Σ$, each of the following dimensions (lower and upper Hausdorff dimensions, lower and upper packing dimensions) of $π^{\bf a}_*μ$ is constant for $\mathcal L^{md}$-a.e.~${\bf a}\in \Bbb R^{md}$, where $π^{\bf a}_*μ$ stands for the push-forward of $μ$ by $π^{\bf a}$. In particular, we give a necessary and sufficient condition on $μ$ so that $π^{\bf a}_*μ$ is exact dimensional for $\mathcal L^{md}$-a.e.~${\bf a}\in \Bbb R^{md}$. Moreover, for every analytic set $E\subset Σ$, each of the Hausdorff, packing, lower and upper box-counting dimensions of $π^{\bf a}(E)$ is constant for $\mathcal L^{md}$-a.e.~${\bf a}\in \Bbb R^{md}$. Formal dimension formulas of these projected measures and sets are given. The Hausdorff dimensions of exceptional sets are estimated.
Submitted 20 July, 2023; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Correcting Convexity Bias in Function and Functional Estimate
Authors:
Chao Ma,
Lexing Ying
Abstract:
A general framework with a series of different methods is proposed to improve the estimate of convex function (or functional) values when only noisy observations of the true input are available. Technically, our methods catch the bias introduced by the convexity and remove this bias from a baseline estimate. Theoretical analyses are conducted to show that the proposed methods can strictly reduce the expected estimate error under mild conditions. When applied, the methods require no specific knowledge about the problem except the convexity and the evaluation of the function. Therefore, they can serve as off-the-shelf tools to obtain good estimates for a wide range of problems, including optimization problems with random objective functions or constraints, and functionals of probability distributions such as the entropy and the Wasserstein distance. Numerical experiments on a wide variety of problems show that our methods can significantly improve the quality of the estimate compared with the baseline method.
Submitted 14 September, 2022; v1 submitted 16 August, 2022;
originally announced August 2022.
-
A dynamical system based on projection operator for solving absolute value equations associated with second-order cone
Authors:
Cairong Chen,
Dongmei Yu,
Deren Han,
Changfeng Ma
Abstract:
A new equivalent reformulation of the absolute value equations associated with second-order cone (SOCAVEs) is emphasised, from which a dynamical system based on projection operator for solving SOCAVEs is constructed. Under proper assumptions, the equilibrium points of the dynamical system exist and could be (globally) asymptotically stable. Some numerical simulations are given to show the effectiveness of the proposed method.
Submitted 10 August, 2022;
originally announced August 2022.
-
Gradient-based Bi-level Optimization for Deep Learning: A Survey
Authors:
Can Chen,
Xi Chen,
Chen Ma,
Zixuan Liu,
Xue Liu
Abstract:
Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community including hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another and the gradient-based category solves the outer-level task by computing the hypergradient, which is much more efficient than classical methods such as the evolutionary algorithm. In this survey, we first give a formal definition of the gradient-based bi-level optimization. Next, we delineate criteria to determine if a research problem is apt for bi-level optimization and provide a practical guide on structuring such problems into a bi-level optimization framework, a feature particularly beneficial for those new to this domain. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and the distilled data, and the multi-task formulation to extract meta-knowledge such as the model initialization. With a bi-level formulation, we then discuss four bi-level optimization solvers to update the outer variable including explicit gradient update, proxy update, implicit function update, and closed-form update. Finally, we wrap up the survey by highlighting two prospective future directions: (1) Effective Data Optimization for Science examined through the lens of task formulation. (2) Accurate Explicit Proxy Update analyzed from an optimization standpoint.
Submitted 9 July, 2023; v1 submitted 24 July, 2022;
originally announced July 2022.
-
Scalable Model-based Policy Optimization for Decentralized Networked Systems
Authors:
Yali Du,
Chengdong Ma,
Yuchen Liu,
Runji Lin,
Hao Dong,
Jun Wang,
Yaodong Yang
Abstract:
Reinforcement learning algorithms require a large number of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communications or shifting of resources. This work aims to improve data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamic model to predict future states and broadcasts its predictions by communication, and then the policies are trained under the model rollouts. To alleviate the bias of model-generated data, we restrain the model usage for generating myopic rollouts, thus reducing the compounding error of model generation. To retain the independence of policy update, we introduce extended value function and theoretically prove that the resulting policy gradient is a close approximation to true policy gradients. We evaluate our algorithm on several benchmarks for intelligent transportation systems, which are connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.
Submitted 1 September, 2022; v1 submitted 13 July, 2022;
originally announced July 2022.
-
Optimal tuning-free convex relaxation for noisy matrix completion
Authors:
Yuepeng Yang,
Cong Ma
Abstract:
This paper is concerned with noisy matrix completion--the problem of recovering a low-rank matrix from partial and noisy entries. Under uniform sampling and incoherence assumptions, we prove that a tuning-free square-root matrix completion estimator (square-root MC) achieves optimal statistical performance for solving the noisy matrix completion problem. Similar to the square-root Lasso estimator in high-dimensional linear regression, square-root MC does not rely on the knowledge of the size of the noise. While solving square-root MC is a convex program, our statistical analysis of square-root MC hinges on its intimate connections to a nonconvex rank-constrained estimator.
Submitted 6 June, 2023; v1 submitted 12 July, 2022;
originally announced July 2022.
-
Fast and Provable Tensor Robust Principal Component Analysis via Scaled Gradient Descent
Authors:
Harry Dong,
Tian Tong,
Cong Ma,
Yuejie Chi
Abstract:
An increasing number of data science and machine learning problems rely on computation with tensors, which better capture the multi-way relationships and interactions of data than matrices. When tapping into this critical advantage, a key challenge is to develop computationally efficient and provably correct algorithms for extracting useful information from tensor data that are simultaneously robust to corruptions and ill-conditioning. This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from its observations contaminated by sparse corruptions, under the Tucker decomposition. To minimize the computation and memory footprints, we propose to directly recover the low-dimensional tensor factors -- starting from a tailored spectral initialization -- via scaled gradient descent (ScaledGD), coupled with an iteration-varying thresholding operation to adaptively remove the impact of corruptions. Theoretically, we establish that the proposed algorithm converges linearly to the true low-rank tensor at a constant rate that is independent of its condition number, as long as the level of corruptions is not too large. Empirically, we demonstrate that the proposed algorithm achieves better and more scalable performance than state-of-the-art matrix and tensor RPCA algorithms through synthetic experiments and real-world applications.
Submitted 22 February, 2023; v1 submitted 18 June, 2022;
originally announced June 2022.
-
Distributed Coordination of Charging Stations Considering Aggregate EV Power Flexibility
Authors:
Dongxiang Yan,
Chengbin Ma,
Yue Chen
Abstract:
In recent years, electric vehicle (EV) charging stations have witnessed rapid growth. However, effective management of charging stations is challenging due to individual EV owners' privacy concerns, competing interests of different stations, and the coupling distribution network constraints. To cope with this challenge, this paper proposes a two-stage scheme. In the first stage, the aggregate EV power flexibility region is derived by solving an optimization problem. We prove that any trajectory within the obtained region corresponds to at least one feasible EV dispatch strategy. By submitting this flexibility region instead of the detailed EV data to the charging station operator, EV owners' privacy can be preserved and the computational burden can be reduced. In the second stage, a distributed coordination mechanism with a clear physical interpretation is developed with consideration of AC power flow constraints. We prove that the proposed mechanism is guaranteed to converge to the centralized optimum. Case studies validate the theoretical results. Comprehensive performance comparisons are carried out to demonstrate the advantages of the proposed scheme.
Submitted 14 June, 2022;
originally announced June 2022.
-
Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks
Authors:
Mingze Wang,
Chao Ma
Abstract:
The convergence of GD and SGD when training mildly parameterized neural networks starting from random initialization is studied. For a broad range of models and loss functions, including the most commonly used square loss and cross entropy loss, we prove an ``early stage convergence'' result. We show that the loss is decreased by a significant amount in the early stage of the training, and this decrease is fast. Furthermore, for exponential type loss functions, and under some assumptions on the training data, we show global convergence of GD. Instead of relying on extreme over-parameterization, our study is based on a microscopic analysis of the activation patterns for the neurons, which helps us derive more powerful lower bounds for the gradient. The results on activation patterns, which we call ``neuron partition'', help build intuitions for understanding the behavior of neural networks' training dynamics, and may be of independent interest.
Submitted 29 May, 2023; v1 submitted 5 June, 2022;
originally announced June 2022.
-
Optimally tackling covariate shift in RKHS-based nonparametric regression
Authors:
Cong Ma,
Reese Pathak,
Martin J. Wainwright
Abstract:
We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems defined using the likelihood ratios between the source and target distributions. When the likelihood ratios are uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chosen regularization parameter is minimax rate-optimal (up to a log factor) for a large family of RKHSs with regular kernel eigenvalues. Interestingly, KRR does not require full knowledge of likelihood ratios apart from an upper bound on them. In striking contrast to the standard statistical setting without covariate shift, we also demonstrate that a naive estimator, which minimizes the empirical risk over the function class, is strictly sub-optimal under covariate shift as compared to KRR. We then address the larger class of covariate shift problems where the likelihood ratio is possibly unbounded yet has a finite second moment. Here, we propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios. Again, we are able to show that this estimator is minimax rate-optimal, up to logarithmic factors.
Submitted 6 June, 2023; v1 submitted 5 May, 2022;
originally announced May 2022.
-
A Survey on Machine Learning Solutions for Graph Pattern Extraction
Authors:
Kai Siong Yow,
Ningyi Liao,
Siqiang Luo,
Reynold Cheng,
Chenhao Ma,
Xiaolin Han
Abstract:
A subgraph is constructed by using a subset of vertices and edges of a given graph. There exist many graph properties that are hereditary for subgraphs. Hence, researchers from different communities have paid a great deal of attention to studying numerous subgraph problems, on top of the ordinary graph problems. Many algorithms have been proposed for subgraph problems, where one common approach is to extract the patterns and structures of a given graph. Due to the complex structures of certain types of graphs and to improve the overall performance of the existing frameworks, machine learning techniques have recently been employed in dealing with various subgraph problems. In this article, we present a comprehensive review of five well-known subgraph problems that have been tackled by using machine learning methods. They are subgraph isomorphism (both counting and matching), maximum common subgraph, community detection and community search problems. We provide an outline of each proposed method, and examine its design and performance. We also explore non-learning-based algorithms for each problem and give a brief discussion. We then suggest some promising research directions in this area, hoping that relevant subgraph problems can be tackled by using a similar strategy. Since there has been a huge growth in employing machine learning techniques in recent years, we believe that this survey will serve as a good reference point for relevant research communities.
Submitted 2 June, 2023; v1 submitted 3 April, 2022;
originally announced April 2022.
-
A new similarity measure for covariate shift with applications to nonparametric regression
Authors:
Reese Pathak,
Cong Ma,
Martin J. Wainwright
Abstract:
We study covariate shift in the context of nonparametric regression. We introduce a new measure of distribution mismatch between the source and target distributions that is based on the integrated ratio of probabilities of balls at a given radius. We use the scaling of this measure with respect to the radius to characterize the minimax rate of estimation over a family of Hölder continuous functions under covariate shift. In comparison to the recently proposed notion of transfer exponent, this measure leads to a sharper rate of convergence and is more fine-grained. We accompany our theory with concrete instances of covariate shift that illustrate this sharp difference.
Submitted 6 February, 2022;
originally announced February 2022.
-
John-Nirenberg inequalities for noncommutative column BMO and Lipschitz martingales
Authors:
Guixiang Hong,
Congbian Ma,
Yu Wang
Abstract:
In this paper, we continue the study of John-Nirenberg theorems for BMO/Lipschitz spaces in the noncommutative martingale setting. As conjectured from the classical case, a desired noncommutative ``stopping time'' argument was discovered to obtain the distribution function inequality form of the John-Nirenberg theorem. This not only provides another approach without using duality and interpolation to the results for spaces $\mathsf{bmo}^c(\mathcal M)$ and ${Λ^{c}_β}(\mathcal{M})$, but also allows us to find the desired version of John-Nirenberg inequalities for spaces $\mathcal{BMO}^c(\mathcal M)$ and ${\mathcal L^{c}_β}(\mathcal{M})$. And thus we solve two open questions after \cite{ref5, ref3}. As an application, we show that Lipschitz space is also the dual space of noncommutative Hardy space defined via symmetric atoms. Finally, our results for ${\mathcal L^{c}_β}(\mathcal{M})$ as well as the approach seem new even going back to the classical setting.
Submitted 20 May, 2023; v1 submitted 25 January, 2022;
originally announced January 2022.
-
A note on Hausdorff measures of self-similar sets in $\mathbb{R}^d$
Authors:
Cai-Yun Ma,
Yu-Feng Wu
Abstract:
We prove that for all $s\in(0,d)$ and $c\in (0,1)$ there exists a self-similar set $E\subset \mathbb{R}^d$ with Hausdorff dimension $s$ such that $\mathcal{H}^s(E)=c|E|^s$. This answers a question raised by Zhiying Wen[16].
Submitted 5 January, 2022;
originally announced January 2022.
-
Strong Local Nondeterminism and Exact Modulus of Continuity for Isotropic Gaussian Random Fields on Compact Two-Point Homogeneous Spaces
Authors:
Tianshi Lu,
Chunsheng Ma,
Yimin Xiao
Abstract:
This paper is concerned with sample path properties of isotropic Gaussian fields on compact two-point homogeneous spaces. In particular, we establish the property of strong local nondeterminism of an isotropic Gaussian field based on the high-frequency behavior of its angular power spectrum, and then exploit this result to establish an exact uniform modulus of continuity for its sample paths.
Submitted 29 December, 2021;
originally announced December 2021.
-
A high-order unfitted finite element method for moving interface problems
Authors:
Chuwen Ma,
Weiying Zheng
Abstract:
We propose a $k^{\rm th}$-order unfitted finite element method ($2\le k\le 4$) to solve the moving interface problem of the Oseen equations. Thorough error estimates for the discrete solutions are presented by considering errors from interface-tracking, time integration, and spatial discretization. In the literature on time-dependent Stokes interface problems, error estimates for the discrete pressure are usually sub-optimal, namely, $(k-1)^{\rm th}$-order, under the $L^2$-norm. We have obtained a $(k-1)^{\rm th}$-order error estimate for the discrete pressure under the $H^1$-norm. Numerical experiments for a severely deforming interface show that optimal convergence orders are obtained for $k = 3$ and $4$.
Submitted 29 December, 2021;
originally announced December 2021.