Search | arXiv e-print repository

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods

Authors: Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar

Abstract: Modern deep neural networks often require distributed training with many workers due to their large size. As worker numbers increase, communication overheads become the main bottleneck in data-parallel minibatch stochastic gradient methods with per-iteration gradient synchronization. Local gradient methods like Local SGD reduce communication by only syncing after several local steps. Although their convergence in i.i.d. and heterogeneous settings is understood and batch sizes are known to matter for efficiency and generalization, optimal local batch sizes are difficult to determine. We introduce adaptive batch size strategies for local gradient methods that increase batch sizes adaptively to reduce minibatch gradient variance. We provide convergence guarantees under homogeneous data conditions and support our claims with image classification experiments, demonstrating the effectiveness of our strategies in training and generalization.

Submitted 19 June, 2024; originally announced June 2024.
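As a rough illustration of the idea in this abstract, the sketch below runs Local SGD on a least-squares problem: each worker takes several local steps, the models are averaged at each synchronization, and the local batch size is enlarged when the minibatch gradient looks too noisy. The growth rule is a norm-test-style check borrowed from the adaptive sampling literature; it, the function name `local_sgd`, and all parameter values are assumptions made for illustration, not the authors' algorithm.

```python
import numpy as np

def local_sgd(A, y, num_workers=4, rounds=30, local_steps=5, lr=0.05,
              batch0=4, eta=0.5, seed=0):
    """Local SGD on f(x) = 0.5 * mean((A x - y)^2) with a growing batch size.

    Hypothetical sketch: the batch size is only changed at synchronization,
    and the variance check below is an assumed norm-test-style rule.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    batch, syncs = batch0, 0
    for _ in range(rounds):
        local_models, next_batch = [], batch
        for _ in range(num_workers):
            xw = x.copy()
            for _ in range(local_steps):
                idx = rng.choice(n, size=min(batch, n), replace=False)
                # Per-sample gradients, shape (batch, d).
                per_sample = (A[idx] @ xw - y[idx])[:, None] * A[idx]
                g = per_sample.mean(axis=0)
                g_sq = g @ g
                var = per_sample.var(axis=0, ddof=1).sum()
                # Request a larger batch if the variance of the minibatch
                # gradient exceeds eta^2 * ||g||^2.
                if var / len(idx) > eta**2 * g_sq:
                    needed = int(np.ceil(var / (eta**2 * g_sq + 1e-12)))
                    next_batch = max(next_batch, min(needed, n))
                xw -= lr * g
            local_models.append(xw)
        x = np.mean(local_models, axis=0)  # the only communication step
        syncs += 1
        batch = next_batch
    return x, batch, syncs
```

With `local_steps=5`, the workers communicate once per five gradient steps rather than after every step, which is the communication saving the abstract refers to.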

arXiv:2405.01038 [pdf, other]

Evolution of multiple closed knotted curves in space

Authors: Miroslav Kolar, Daniel Sevcovic

Abstract: We investigate a system of geometric evolution equations describing a curvature and torsion driven motion of a family of 3D curves in the normal and binormal directions. We explore the direct Lagrangian approach for treating the geometric flow of such interacting curves. Using the abstract theory of nonlinear analytic semi-flows, we are able to prove local existence, uniqueness, and continuation of classical Hölder smooth solutions to the governing system of nonlinear parabolic equations modelling $n$ evolving curves with mutual nonlocal interactions. We present several computational studies of the flow that combine the normal or binormal velocity and consider nonlocal interactions.

Submitted 2 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2201.02895

MSC Class: Primary: 35K57; 35K65; 65N40; 65M08; Secondary: 53C80

arXiv:2404.06525 [pdf, ps, other]

Models of $2$-nondegenerate CR hypersurfaces in $\mathbb{C}^N$

Authors: Jan Gregorovič, Martin Kolář, David Sykes

Abstract: We show that every point in a uniformly $2$-nondegenerate CR hypersurface is canonically associated with a model $2$-nondegenerate structure. The $2$-nondegenerate models are basic CR invariants playing the same fundamental role as quadrics do in the Levi nondegenerate case. We characterize all $2$-nondegenerate models and show that the moduli space of such hypersurfaces in $\mathbb{C}^N$ is infinite dimensional for each $N>3$. We derive a normal form for these models' defining equations that is unique up to an action of a finite dimensional Lie group. We generalize recently introduced CR invariants termed modified symbols, and show how to compute these intrinsically defined invariants from a model's defining equation. We show that these models automatically possess infinitesimal symmetries spanning a complement to their Levi kernel and derive explicit formulas for them.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 36 pages; contains and extends the general theory of the first version of 2310.18588v1. arXiv admin note: substantial text overlap with arXiv:2310.18588

MSC Class: 32V05; 32V40; 53C30

arXiv:2404.02260 [pdf, other]

On diffusion and transport acting on parameterized moving closed curves in space

Authors: Michal Benes, Miroslav Kolar, Daniel Sevcovic

Abstract: We investigate the motion of closed, smooth non-self-intersecting curves that evolve in space $\mathbb{R}^3$. The geometric evolutionary equation for the evolution of the curve is accompanied by a parabolic equation for the scalar quantity evaluated over the evolving curve. We apply the direct Lagrangian approach to describe the geometric flow of 3D curves resulting in a system of degenerate parabolic equations. We prove the local existence and uniqueness of classical Hölder smooth solutions to the governing system of nonlinear parabolic equations. A numerical discretization scheme has been constructed using the method of flowing finite volumes. We present several numerical examples of the evolution of curves in 3D with a scalar quantity. In this paper, we analyze the flow of curves with no torsion evolving in rotating and parallel planes. Next, we present examples of the evolution of initially knotted and unknotted curves.

Submitted 2 April, 2024; originally announced April 2024.

MSC Class: Primary: 35K57; 35K65; 65N40; 65M08; Secondary: 53C80

arXiv:2402.11215 [pdf, other]

AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods

Authors: Tim Tsz-Kit Lau, Han Liu, Mladen Kolar

Abstract: The choice of batch sizes in minibatch stochastic gradient optimizers is critical in large-scale model training for both optimization and generalization performance. Although large-batch training is arguably the dominant training paradigm for large-scale deep learning due to hardware advances, the generalization performance of the model deteriorates compared to small-batch training, leading to the so-called "generalization gap" phenomenon. To mitigate this, we investigate adaptive batch size strategies derived from adaptive sampling methods, originally developed only for stochastic gradient descent. Given the significant interplay between learning rates and batch sizes, and considering the prevalence of adaptive gradient methods in deep learning, we emphasize the need for adaptive batch size strategies in these contexts. We introduce AdAdaGrad and its scalar variant AdAdaGradNorm, which progressively increase batch sizes during training, while model updates are performed using AdaGrad and AdaGradNorm. We prove that AdAdaGradNorm converges with high probability at a rate of $\mathscr{O}(1/K)$ to find a first-order stationary point of smooth nonconvex functions within $K$ iterations. AdAdaGrad also demonstrates similar convergence properties when integrated with a novel coordinate-wise variant of our adaptive batch size strategies. We corroborate our theoretical claims by performing image classification experiments, highlighting the merits of the proposed schemes in terms of both training efficiency and model generalization. Our work unveils the potential of adaptive batch size strategies for adaptive gradient optimizers in large-scale model training.

Submitted 28 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.
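To make the combination described in this abstract concrete, here is a minimal sketch that pairs the scalar AdaGradNorm step size with a batch size that only grows. The growth rule is the classical norm test from adaptive sampling (enlarge the batch when the estimated variance of the minibatch gradient exceeds $\eta^2$ times its squared norm). The function name, the least-squares objective, and the constants are assumptions chosen for illustration and should not be read as the paper's exact scheme.

```python
import numpy as np

def adagrad_norm_adaptive_batch(A, y, steps=200, alpha=1.0, b0=1.0,
                                batch0=4, eta=0.5, seed=0):
    """AdaGradNorm on 0.5 * mean((A x - y)^2) with a norm-test batch rule.

    Hypothetical sketch; returns the final iterate and the batch size
    used after each step.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    acc = b0**2          # running sum of squared gradient norms
    batch = batch0
    history = []
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        per_sample = (A[idx] @ x - y[idx])[:, None] * A[idx]
        g = per_sample.mean(axis=0)
        g_sq = g @ g
        var = per_sample.var(axis=0, ddof=1).sum()
        # Norm test: if violated, pick the smallest batch that would pass.
        if var / batch > eta**2 * g_sq:
            needed = int(np.ceil(var / (eta**2 * g_sq + 1e-12)))
            batch = min(n, max(batch, needed))
        # AdaGradNorm: one scalar step size shared by all coordinates.
        acc += g_sq
        x -= alpha / np.sqrt(acc) * g
        history.append(batch)
    return x, history
```

The coordinate-wise AdaGrad variant mentioned in the abstract would keep one accumulator per coordinate instead of the single scalar `acc`.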

arXiv:2312.17047 [pdf, other]

Inconsistency of cross-validation for structure learning in Gaussian graphical models

Authors: Zhao Lyu, Wai Ming Tai, Mladen Kolar, Bryon Aragam

Abstract: Despite numerous years of research into the merits and trade-offs of various model selection criteria, obtaining robust results that elucidate the behavior of cross-validation remains a challenging endeavor. In this paper, we highlight the inherent limitations of cross-validation when employed to discern the structure of a Gaussian graphical model. We provide finite-sample bounds on the probability that the Lasso estimator for the neighborhood of a node within a Gaussian graphical model, optimized using a prediction oracle, misidentifies the neighborhood. Our results pertain to both undirected and directed acyclic graphs, encompassing general, sparse covariance structures. To support our theoretical findings, we conduct an empirical investigation of this inconsistency by contrasting our outcomes with other commonly used information criteria through an extensive simulation study. Given that many algorithms designed to learn the structure of graphical models require hyperparameter selection, the precise calibration of this hyperparameter is paramount for accurately estimating the inherent structure. Consequently, our observations shed light on this widely recognized practical challenge.

Submitted 28 December, 2023; originally announced December 2023.

Comments: Preliminary version; 47 pages, 15 figures

arXiv:2305.18379 [pdf, other]

Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching

Authors: Ilgee Hong, Sen Na, Michael W. Mahoney, Mladen Kolar

Abstract: We consider solving equality-constrained nonlinear, nonconvex optimization problems. This class of problems appears widely in a variety of applications in machine learning and engineering, ranging from constrained deep neural networks, to optimal control, to PDE-constrained optimization. We develop an adaptive inexact Newton method for this problem class. In each iteration, we solve the Lagrangian Newton system inexactly via a randomized iterative sketching solver, and select a suitable stepsize by performing line search on an exact augmented Lagrangian merit function. The randomized solvers have advantages over deterministic linear system solvers by significantly reducing per-iteration flops complexity and storage cost, when equipped with suitable sketching matrices. Our method adaptively controls the accuracy of the randomized solver and the penalty parameters of the exact augmented Lagrangian, to ensure that the inexact Newton direction is a descent direction of the exact augmented Lagrangian. This allows us to establish global almost sure convergence. We also show that a unit stepsize is admissible locally, so that our method exhibits local linear convergence. Furthermore, we prove that the linear convergence can be strengthened to superlinear convergence if we gradually sharpen the adaptive accuracy condition on the randomized solver. We demonstrate the superior performance of our method on benchmark nonlinear problems in the CUTEst test set, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.

Submitted 28 May, 2023; originally announced May 2023.

Comments: 25 pages, 4 figures

Journal ref: ICML 2023
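For readers unfamiliar with the "Lagrangian Newton system" mentioned in this abstract, the sketch below applies plain Newton steps to the KKT conditions of a toy equality-constrained problem, $\min x_1 + x_2$ subject to $x_1^2 + x_2^2 = 2$. It is a deliberately simplified stand-in: the linear system is solved exactly with `numpy.linalg.solve` rather than by a randomized sketching solver, and the full step is always taken with no line search on a merit function. The toy problem and all function names are assumptions made for illustration.

```python
import numpy as np

def f_grad(x):
    return np.array([1.0, 1.0])           # gradient of f(x) = x1 + x2

def c_val(x):
    return np.array([x @ x - 2.0])        # constraint c(x) = x1^2 + x2^2 - 2

def c_jac(x):
    return 2.0 * x[None, :]               # Jacobian of c, shape (1, 2)

def lagrangian_hess(x, lam):
    return 2.0 * lam[0] * np.eye(2)       # Hessian of f(x) + lam * c(x)

def newton_kkt(x, lam, iters=20, tol=1e-10):
    """Full-step Newton on the KKT system (exact solves, no line search)."""
    for _ in range(iters):
        J = c_jac(x)
        grad_L = f_grad(x) + J.T @ lam
        residual = np.concatenate([grad_L, c_val(x)])
        if np.linalg.norm(residual) < tol:
            break
        H = lagrangian_hess(x, lam)
        # Lagrangian Newton (KKT) system: [[H, J^T], [J, 0]] [dx; dlam] = -residual
        K = np.block([[H, J.T], [J, np.zeros((1, 1))]])
        step = np.linalg.solve(K, -residual)
        x = x + step[:2]
        lam = lam + step[2:]
    return x, lam

x_star, lam_star = newton_kkt(np.array([-1.5, -0.5]), np.array([1.0]))
```

Started close to the solution, the iterates approach $x = (-1, -1)$ with multiplier $\lambda = 1/2$. The method in the abstract replaces the exact solve with an inexact randomized one and adds adaptive accuracy control and a line search, none of which is shown here.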

arXiv:2304.00619 [pdf, ps, other]

New examples of $2$-nondegenerate real hypersurfaces in $\mathbb{C}^N$ with arbitrary nilpotent symbols

Authors: Martin Kolář, Ilya Kossovskiy, David Sykes

Abstract: We introduce a class of uniformly $2$-nondegenerate CR hypersurfaces in $\mathbb{C}^N$, for $N>3$, having a rank $1$ Levi kernel. The class is first of all remarkable in that for every $N>3$ it forms an explicit infinite-dimensional family of everywhere $2$-nondegenerate hypersurfaces. To the best of our knowledge, this is the first such construction. Besides, the class contains an infinite-dimensional family of non-equivalent structures having a given constant nilpotent CR symbol, for every such symbol. Using methods that are able to handle all cases with $N>5$ simultaneously, we solve the equivalence problem for the considered structures whose symbol is represented by a single Jordan block, classify their algebras of infinitesimal symmetries, and classify the locally homogeneous structures among them. We show that the remaining considered structures, which have symbols represented by a direct sum of Jordan blocks, can be constructed from the single block structures through simple linking and extension processes.

Submitted 25 April, 2024; v1 submitted 2 April, 2023; originally announced April 2023.

arXiv:2303.04018 [pdf, other]

Degenerate area preserving surface Allen-Cahn equation and its sharp interface limit

Authors: Michal Benes, Miroslav Kolar, Jan M. Sischka, Axel Voigt

Abstract: We consider formal matched asymptotics to show the convergence of a degenerate area preserving surface Allen-Cahn equation to its sharp interface limit of area preserving geodesic curvature flow. The degeneracy results from a surface de Gennes-Cahn-Hilliard energy and turns out to be essential to numerically resolve the dependency of the solution on geometric properties of the surface. We experimentally demonstrate convergence of the numerical algorithm, which considers a graph formulation, adaptive finite elements and a semi-implicit discretization in time, and uses numerical solutions of the sharp interface limit, also considered in a graph formulation, as benchmark solutions.

Submitted 7 March, 2023; originally announced March 2023.

Comments: 8 pages, 2 figures

arXiv:2211.15943 [pdf, other]

Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems

Authors: Yuchen Fang, Sen Na, Michael W. Mahoney, Mladen Kolar

Abstract: We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints. We consider a fully stochastic setting, where at each step a single sample is generated to estimate the objective gradient. The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices (i.e., Hessians without modification) in SQP subproblems. As a trust-region method for constrained optimization, our algorithm must address an infeasibility issue -- the linearized equality constraints and trust-region constraints may lead to infeasible SQP subproblems. In this regard, we propose an adaptive relaxation technique to compute the trial step, consisting of a normal step and a tangential step. To control the lengths of these two steps while ensuring a scale-invariant property, we adaptively decompose the trust-region radius into two segments, based on the proportions of the rescaled feasibility and optimality residuals to the rescaled full KKT residual. The normal step has a closed form, while the tangential step is obtained by solving a trust-region subproblem, to which a solution ensuring the Cauchy reduction is sufficient for our study. We establish a global almost sure convergence guarantee for TR-StoSQP, and illustrate its empirical performance on both a subset of problems in the CUTEst test set and constrained logistic regression problems using data from the LIBSVM collection.

Submitted 28 January, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: 10 figures, 33 pages

arXiv:2208.13572 [pdf, other]

On the Lasso for Graphical Continuous Lyapunov Models

Authors: Philipp Dettling, Mathias Drton, Mladen Kolar

Abstract: Graphical continuous Lyapunov models offer a new perspective on modeling causally interpretable dependence structure in multivariate data by treating each independent observation as a one-time cross-sectional snapshot of a temporal process. Specifically, the models assume that the observations are cross-sections of independent multivariate Ornstein-Uhlenbeck processes in equilibrium. The Gaussian equilibrium exists under a stability assumption on the drift matrix, and the equilibrium covariance matrix is determined by the continuous Lyapunov equation. Each graphical continuous Lyapunov model assumes the drift matrix to be sparse, with a support determined by a directed graph. A natural approach to model selection in this setting is to use an $\ell_1$-regularization technique that, based on a given sample covariance matrix, seeks to find a sparse approximate solution to the Lyapunov equation. We study the model selection properties of the resulting lasso technique to arrive at a consistency result. Our detailed analysis reveals that the involved irrepresentability condition is surprisingly difficult to satisfy. While this may prevent asymptotic consistency in model selection, our numerical experiments indicate that even if the theoretical requirements for consistency are not met, the lasso approach is able to recover relevant structure of the drift matrix and is robust to aspects of model misspecification.

Submitted 15 November, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

MSC Class: 62H22; 62H12
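The role of the continuous Lyapunov equation in this abstract can be illustrated numerically. Assuming the common convention $M\Sigma + \Sigma M^\top + C = 0$, with a stable drift matrix $M$ and a positive definite volatility matrix $C$ (the symbols and the example matrices are assumptions, not taken from the listing), the sketch below recovers the equilibrium covariance $\Sigma$ by vectorizing the equation into an ordinary linear system. The helper name `lyapunov_covariance` is hypothetical.

```python
import numpy as np

def lyapunov_covariance(M, C):
    """Solve M @ S + S @ M.T + C = 0 for S via Kronecker vectorization.

    With row-major vec, vec(M S) = (M kron I) vec(S) and
    vec(S M^T) = (I kron M) vec(S).
    """
    d = M.shape[0]
    I = np.eye(d)
    K = np.kron(M, I) + np.kron(I, M)
    s = np.linalg.solve(K, -C.reshape(-1))
    return s.reshape(d, d)

# Sparse, stable drift matrix (eigenvalues -1 and -2) with one off-diagonal entry.
M = np.array([[-1.0, 0.0],
              [0.5, -2.0]])
C = 2.0 * np.eye(2)
Sigma = lyapunov_covariance(M, C)
```

The lasso approach in the abstract runs this relationship in the other direction: given a sample estimate of $\Sigma$, it looks for a sparse $M$ that approximately satisfies the same equation.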

arXiv:2201.13387 [pdf, other]

L-SVRG and L-Katyusha with Adaptive Sampling

Authors: Boxin Zhao, Boxiang Lyu, Mladen Kolar

Abstract: Stochastic gradient-based optimization methods, such as L-SVRG and its accelerated variant L-Katyusha (Kovalev et al., 2020), are widely used to train machine learning models. The theoretical and empirical performance of L-SVRG and L-Katyusha can be improved by sampling observations from a non-uniform distribution (Qian et al., 2021). However, designing a desired sampling distribution requires prior knowledge of smoothness constants, which can be computationally intractable to obtain in practice when the dimension of the model parameter is high. To address this issue, we propose an adaptive sampling strategy for L-SVRG and L-Katyusha that can learn the sampling distribution with little computational overhead, while allowing it to change with iterates, and at the same time does not require any prior knowledge of the problem parameters. We prove convergence guarantees for L-SVRG and L-Katyusha for convex objectives when the sampling distribution changes with iterates. Our results show that even without prior information, the proposed adaptive sampling strategy matches, and in some cases even surpasses, the performance of the sampling scheme in Qian et al. (2021). Extensive simulations support our theory and the practical utility of the proposed sampling scheme on real data.

Submitted 5 June, 2023; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: Published in Transactions on Machine Learning Research (03/2023)
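As background for this abstract, the following sketch implements loopless SVRG (L-SVRG) on a least-squares problem with a fixed, possibly non-uniform sampling distribution. The importance weight $1/(n p_i)$ keeps the gradient estimator unbiased. The distribution here is fixed in advance (uniform, or proportional to the per-sample smoothness constants $\|a_i\|^2$); the adaptive scheme proposed in the paper, which learns the distribution on the fly, is not reproduced. Function and parameter names are assumptions.

```python
import numpy as np

def l_svrg(A, y, steps=3000, lr=0.02, p_update=None, probs=None, seed=0):
    """Loopless SVRG on f(x) = (1/n) * sum_i 0.5 * (a_i @ x - y_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    p_update = 1.0 / n if p_update is None else p_update
    probs = np.full(n, 1.0 / n) if probs is None else probs

    def grad_i(x, i):
        return (A[i] @ x - y[i]) * A[i]

    def full_grad(x):
        return A.T @ (A @ x - y) / n

    x = np.zeros(d)
    w = x.copy()                 # reference point
    mu = full_grad(w)            # full gradient at the reference point
    for _ in range(steps):
        i = rng.choice(n, p=probs)
        weight = 1.0 / (n * probs[i])   # importance weight, keeps g unbiased
        g = weight * (grad_i(x, i) - grad_i(w, i)) + mu
        x = x - lr * g
        # "Loopless": refresh the reference point with small probability
        # instead of at the end of a fixed-length inner loop.
        if rng.random() < p_update:
            w = x.copy()
            mu = full_grad(w)
    return x
```

A call such as `l_svrg(A, y, probs=L / L.sum())` with `L = (A**2).sum(axis=1)` uses smoothness-proportional sampling, which is the kind of distribution that requires the prior knowledge the abstract aims to avoid.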

arXiv:2201.02895 [pdf, other]

Qualitative and numerical aspects of a motion of a family of interacting curves in space

Authors: Michal Benes, Miroslav Kolar, Daniel Sevcovic

Abstract: In this article we investigate a system of geometric evolution equations describing a curvature driven motion of a family of 3D curves in the normal and binormal directions. Evolving curves may be subject to mutual interactions of a local or nonlocal character, where the entire curve may influence the evolution of other curves. Such evolution and interaction can be found in applications. We explore the direct Lagrangian approach for treating the geometric flow of such interacting curves. Using the abstract theory of nonlinear analytic semi-flows, we are able to prove local existence, uniqueness and continuation of classical Hölder smooth solutions to the governing system of nonlinear parabolic equations. Using the finite volume method, we construct an efficient numerical scheme solving the governing system of nonlinear parabolic equations. Additionally, a nontrivial tangential velocity is considered allowing for redistribution of discretization nodes. We also present several computational studies of the flow combining the normal and binormal velocity and considering nonlocal interactions.

Submitted 8 January, 2022; originally announced January 2022.

MSC Class: 35K57; 35K65; 65N40; 65M08; 53C80

arXiv:2111.03772 [pdf, other]

doi 10.1145/3508029

Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

Authors: Yuwei Luo, Varun Gupta, Mladen Kolar

Abstract: We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon $T$ with fixed and known cost matrices $Q,R$, but unknown and non-stationary dynamics $\{A_t, B_t\}$. The sequence of dynamics matrices can be arbitrary, but with a total variation, $V_T$, assumed to be $o(T)$ and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal controllers is available for all $t$, we present an algorithm that achieves the optimal dynamic regret of $\tilde{\mathcal{O}}\left(V_T^{2/5}T^{3/5}\right)$. With piece-wise constant dynamics, our algorithm achieves the optimal regret of $\tilde{\mathcal{O}}(\sqrt{ST})$ where $S$ is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of $V_T$. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.

Submitted 18 March, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

Journal ref: Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 1, March 2022, Article No 9, pp 1--72

arXiv:2109.11502 [pdf, other]

Inequality Constrained Stochastic Nonlinear Optimization via Active-Set Sequential Quadratic Programming

Authors: Sen Na, Mihai Anitescu, Mladen Kolar

Abstract: We study nonlinear optimization problems with a stochastic objective and deterministic equality and inequality constraints, which emerge in numerous applications including finance, manufacturing, power systems and, recently, deep neural networks. We propose an active-set stochastic sequential quadratic programming (StoSQP) algorithm that utilizes a differentiable exact augmented Lagrangian as the merit function. The algorithm adaptively selects the penalty parameters of the augmented Lagrangian and performs a stochastic line search to decide the stepsize. The global convergence is established: for any initialization, the KKT residuals converge to zero almost surely. Our algorithm and analysis further develop the prior work of Na et al. (2022). Specifically, we allow nonlinear inequality constraints without requiring the strict complementary condition; refine some of the designs in Na et al. (2022) such as the feasibility error condition and the monotonically increasing sample size; strengthen the global convergence guarantee; and improve the sample complexity on the objective Hessian. We demonstrate the performance of the designed algorithm on a subset of nonlinear problems collected in the CUTEst test set and on constrained logistic regression problems.

Submitted 30 January, 2023; v1 submitted 23 September, 2021; originally announced September 2021.

Comments: 65 pages, 9 figures

arXiv:2107.11560 [pdf, other]

A Fast Temporal Decomposition Procedure for Long-horizon Nonlinear Dynamic Programming

Authors: Sen Na, Mihai Anitescu, Mladen Kolar

Abstract: We propose a fast temporal decomposition procedure for solving long-horizon nonlinear dynamic programs. The core of the procedure is sequential quadratic programming (SQP) that utilizes a differentiable exact augmented Lagrangian as the merit function. Within each SQP iteration, we approximately solve the Newton system using an overlapping temporal decomposition strategy. We show that the approximate search direction is still a descent direction of the augmented Lagrangian, provided the overlap size and penalty parameters are suitably chosen, which allows us to establish the global convergence. Moreover, we show that a unit stepsize is accepted locally for the approximate search direction, and further establish a uniform, local linear convergence over stages. This local convergence rate matches the rate of the recent Schwarz scheme by Na et al. (2022). However, the Schwarz scheme has to solve nonlinear subproblems to optimality in each iteration, while we only perform a single Newton step instead. Numerical experiments validate our theories and demonstrate the superiority of our method.

Submitted 17 April, 2023; v1 submitted 24 July, 2021; originally announced July 2021.

Comments: 41 pages, 1 figure

arXiv:2106.10022 [pdf, other]

Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Optimization

Authors: Luofeng Liao, Li Shen, Jia Duan, Mladen Kolar, Dacheng Tao

Abstract: Large scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and training of generative adversarial networks. Despite their wide applicability, solving such problems efficiently and effectively is challenging in the presence of large amounts of data using existing stochastic minimax methods. We study a class of stochastic minimax methods and develop a communication-efficient distributed stochastic extragradient algorithm, LocalAdaSEG, with an adaptive learning rate suitable for solving convex-concave minimax problems in the Parameter-Server model. LocalAdaSEG has three main features: (i) a periodic communication strategy that reduces the communication cost between workers and the server; (ii) an adaptive learning rate that is computed locally and allows for tuning-free implementation; and (iii) theoretically, a nearly linear speed-up with respect to the dominant variance term, arising from the estimation of the stochastic gradient, is proven in both the smooth and nonsmooth convex-concave settings. LocalAdaSEG is used to solve a stochastic bilinear game, and train a generative adversarial network. We compare LocalAdaSEG against several existing optimizers for minimax problems and demonstrate its efficacy through several experiments in both homogeneous and heterogeneous settings.

Submitted 23 September, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: 42 pages; Accepted to Machine Learning, 2022

arXiv:2102.05320 [pdf, other]

An Adaptive Stochastic Sequential Quadratic Programming with Differentiable Exact Augmented Lagrangians

Authors: Sen Na, Mihai Anitescu, Mladen Kolar

Abstract: We consider solving nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We assume for the objective that its evaluation, gradient, and Hessian are inaccessible, while one can compute their stochastic estimates by, for example, subsampling. We propose a stochastic algorithm based on sequential quadratic programming (SQP) that uses a differentiable exact augmented Lagrangian as the merit function. To motivate our algorithm design, we first revisit and simplify an old SQP method \citep{Lucidi1990Recursive} developed for solving deterministic problems, which serves as the skeleton of our stochastic algorithm. Based on the simplified deterministic algorithm, we then propose a non-adaptive SQP for dealing with stochastic objective, where the gradient and Hessian are replaced by stochastic estimates but the stepsizes are deterministic and prespecified. Finally, we incorporate a recent stochastic line search procedure \citep{Paquette2020Stochastic} into the non-adaptive stochastic SQP to adaptively select the random stepsizes, which leads to an adaptive stochastic SQP. The global "almost sure" convergence for both non-adaptive and adaptive SQP methods is established. Numerical experiments on nonlinear problems in CUTEst test set demonstrate the superiority of the adaptive algorithm.

Submitted 6 June, 2022; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: 60 pages, 24 figures

arXiv:2012.15274 [pdf, other]

Provably Training Overparameterized Neural Network Classifiers with Non-convex Constraints

Authors: You-Lin Chen, Zhaoran Wang, Mladen Kolar

Abstract: Training a classifier under non-convex constraints has gotten increasing attention in the machine learning community thanks to its wide range of applications such as algorithmic fairness and class-imbalanced classification. However, several recent works addressing non-convex constraints have only focused on simple models such as logistic regression or support vector machines. Neural networks, one of the most popular models for classification nowadays, are precluded and lack theoretical guarantees. In this work, we show that overparameterized neural networks could achieve a near-optimal and near-feasible solution of non-convex constrained optimization problems via the project stochastic gradient descent. Our key ingredient is the no-regret analysis of online learning for neural networks in the overparameterization regime, which may be of independent interest in online learning applications.

Submitted 27 October, 2022; v1 submitted 30 December, 2020; originally announced December 2020.

arXiv:2006.12455 [pdf, ps, other]

Gradient-Variation Bound for Online Convex Optimization with Constraints

Authors: Shuang Qiu, Xiaohan Wei, Mladen Kolar

Abstract: We study online convex optimization with constraints consisting of multiple functional constraints and a relatively simple constraint set, such as a Euclidean ball. As enforcing the constraints at each time step through projections is computationally challenging in general, we allow decisions to violate the functional constraints but aim to achieve a low regret and cumulative violation of the constraints over a horizon of $T$ time steps. First-order methods achieve an $\mathcal{O}(\sqrt{T})$ regret and an $\mathcal{O}(1)$ constraint violation, which is the best-known bound under the Slater's condition, but do not take into account the structural information of the problem. Furthermore, the existing algorithms and analysis are limited to Euclidean space. In this paper, we provide an \emph{instance-dependent} bound for online convex optimization with complex constraints obtained by a novel online primal-dual mirror-prox algorithm. Our instance-dependent regret is quantified by the total gradient variation $V_*(T)$ in the sequence of loss functions. The proposed algorithm works in \emph{general} normed spaces and simultaneously achieves an $\mathcal{O}(\sqrt{V_*(T)})$ regret and an $\mathcal{O}(1)$ constraint violation, which is never worse than the best-known $( \mathcal{O}(\sqrt{T}), \mathcal{O}(1) )$ result and improves over previous works that applied mirror-prox-type algorithms for this problem achieving $\mathcal{O}(T^{2/3})$ regret and constraint violation. Finally, our algorithm is computationally efficient, as it only performs mirror descent steps in each iteration instead of solving a general Lagrangian minimization problem.

Submitted 5 December, 2022; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Accepted in AAAI 2023

arXiv:2006.06782 [pdf, other]

Convergence Analysis of Accelerated Stochastic Gradient Descent under the Growth Condition

Authors: You-Lin Chen, Sen Na, Mladen Kolar

Abstract: We study the convergence of accelerated stochastic gradient descent for strongly convex objectives under the growth condition, which states that the variance of stochastic gradient is bounded by a multiplicative part that grows with the full gradient, and a constant additive part. Through the lens of the growth condition, we investigate four widely used accelerated methods: Nesterov's accelerated method (NAM), robust momentum method (RMM), accelerated dual averaging method (DAM+), and implicit DAM+ (iDAM+). While these methods are known to improve the convergence rate of SGD under the condition that the stochastic gradient has bounded variance, it is not well understood how their convergence rates are affected by the multiplicative noise. In this paper, we show that these methods all converge to a neighborhood of the optimum with accelerated convergence rates (compared to SGD) even under the growth condition. In particular, NAM, RMM, iDAM+ enjoy acceleration only with a mild multiplicative noise, while DAM+ enjoys acceleration even with a large multiplicative noise. Furthermore, we propose a generic tail-averaged scheme that allows the accelerated rates of DAM+ and iDAM+ to nearly attain the theoretical lower bound (up to a logarithmic factor in the variance term). We conduct numerical experiments to support our theoretical conclusions.

Submitted 30 October, 2023; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 37 pages

arXiv:2004.13516 [pdf, ps, other]

Nonlinearizable CR automorphisms for polynomial models in $\mathbb C^N$

Authors: Martin Kolář, Francine Meylan

Abstract: We classify polynomial models for real hypersurfaces in $\mathbb C^N$, which admit nonlinearizable infinitesimal CR automorphisms. As a consequence, this provides an optimal 1-jet determination result in the general case. Further we prove that such automorphisms arise from one common source, by pulling back via a holomorphic mapping a suitable symmetry of a hyperquadric in some complex space.

Submitted 26 April, 2020; originally announced April 2020.

Comments: 23 pages. arXiv admin note: text overlap with arXiv:1703.07123

arXiv:1912.06875 [pdf, other]

Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator

Authors: Yuwei Luo, Zhuoran Yang, Zhaoran Wang, Mladen Kolar

Abstract: Multi-agent reinforcement learning has been successfully applied to a number of challenging problems. Despite these empirical successes, theoretical understanding of different algorithms is lacking, primarily due to the curse of dimensionality caused by the exponential growth of the state-action space with the number of agents. We study a fundamental problem of multi-agent linear quadratic regulator (LQR) in a setting where the agents are partially exchangeable. In this setting, we develop a hierarchical actor-critic algorithm, whose computational complexity is independent of the total number of agents, and prove its global linear convergence to the optimal policy. As LQRs are often used to approximate general dynamic systems, this paper provides an important step towards a better understanding of general hierarchical mean-field multi-agent reinforcement learning.

Submitted 24 December, 2021; v1 submitted 14 December, 2019; originally announced December 2019.

arXiv:1909.05892 [pdf, other]

doi 10.1093/biomet/asaa066

Estimating Differential Latent Variable Graphical Models with Applications to Brain Connectivity

Authors: Sen Na, Mladen Kolar, Oluwasanmi Koyejo

Abstract: Differential graphical models are designed to represent the difference between the conditional dependence structures of two groups, thus are of particular interest for scientific investigation. Motivated by modern applications, this manuscript considers an extended setting where each group is generated by a latent variable Gaussian graphical model. Due to the existence of latent factors, the differential network is decomposed into sparse and low-rank components, both of which are symmetric indefinite matrices. We estimate these two components simultaneously using a two-stage procedure: (i) an initialization stage, which computes a simple, consistent estimator, and (ii) a convergence stage, implemented using a projected alternating gradient descent algorithm applied to a nonconvex objective, initialized using the output of the first stage. We prove that given the initialization, the estimator converges linearly with a nontrivial, minimax optimal statistical error. Experiments on synthetic and real data illustrate that the proposed nonconvex procedure outperforms existing methods.

Submitted 13 May, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

Comments: 60 pages

Journal ref: Biometrika 2020

arXiv:1905.06456 [pdf, ps, other]

doi 10.1007/s00209-021-02873-w

Infinitesimal symmetries of weakly pseudoconvex manifolds

Authors: Shin-Young Kim, Martin Kolar

Abstract: We classify the Lie algebras of infinitesimal CR automorphisms of weakly pseudoconvex hypersurfaces of finite multitype in $\mathbb C^N$. In particular, we prove that such manifolds admit neither nonlinear rigid automorphisms, nor real or nilpotent rotations. As a consequence, this leads to a proof of a sharp 2-jet determination result for local automorphisms. Moreover, for hypersurfaces which are not balanced, CR automorphisms are uniquely determined by their 1-jets. The same classification is derived also for special models, given by sums of squares of polynomials. In particular, in the case of homogeneous polynomials the Lie algebra of infinitesimal CR automorphisms is always three graded. The results provide an important necessary step for solving the local equivalence problem on weakly pseudoconvex manifolds.

Submitted 15 May, 2019; originally announced May 2019.

Comments: 14 pages

Journal ref: Mathematische Zeitschrift 300, 2022

arXiv:1905.05629 [pdf, other]

A complete normal form for everywhere Levi degenerate hypersurfaces in $\mathbb C^{3}$

Authors: Martin Kolar, Ilya Kossovskiy

Abstract: $2$-nondegenerate real hypersurfaces in complex manifolds play an important role in CR-geometry and the theory of Hermitian Symmetric Domains. In this paper, we construct a complete convergent normal form for everywhere $2$-nondegenerate real-analytic hypersurfaces in complex $3$-space. We do so by developing the homological approach of Chern-Moser in the $2$-nondegenerate setting. This seems to be the first such construction for hypersurfaces of infinite Catlin multitype. Our approach is based on using a rational (nonpolynomial) model for everywhere $2$-nondegenerate hypersurfaces, which is the local realization due to Fels-Kaup of the well known tube over the light cone. As an application, we obtain, in the spirit of Chern-Moser theory, a criterion for the local sphericity (i.e. local equivalence to the model) for a $2$-nondegenerate hypersurface in terms of its normal form. As another application, we obtain an explicit description of the moduli space of everywhere $2$-nondegenerate hypersurfaces.

Submitted 11 July, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

arXiv:1811.10790 [pdf, other]

High-dimensional Index Volatility Models via Stein's Identity

Authors: Sen Na, Mladen Kolar

Abstract: We study the estimation of the parametric components of single and multiple index volatility models. Using the first- and second-order Stein's identities, we develop methods that are applicable for the estimation of the variance index in the high-dimensional setting requiring finite moment condition, which allows for heavy-tailed data. Our approach complements the existing literature in the low-dimensional setting, while relaxing the conditions on estimation, and provides a novel approach in the high-dimensional setting. We prove that the statistical rate of convergence of our variance index estimators consists of a parametric rate and a nonparametric rate, where the latter appears from the estimation of the mean link function. However, under standard assumptions, the parametric rate dominates the rate of convergence and our results match the minimax optimal rate for the mean index estimation. Simulation results illustrate finite sample properties of our methodology and back our theoretical conclusions.

Submitted 25 May, 2020; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: 44 pages

arXiv:1703.07123 [pdf, ps, other]

Nonlinear CR automorphisms of Levi degenerate hypersurfaces and a new gap phenomenon

Authors: Martin Kolar, Francine Meylan

Abstract: We give a complete classification of polynomial models for smooth real hypersurfaces of finite Catlin multitype in $\mathbb C^3$, which admit nonlinear infinitesimal CR automorphisms. As a consequence, we obtain a sharp 1-jet determination result for any smooth hypersurface with such model. The results also prove a conjecture of the first author about the origin of such nonlinear automorphisms (AIM list of problems, 2010). As another consequence, we describe all possible dimensions of the Lie algebra of infinitesimal CR automorphisms, which leads to a new "secondary" gap phenomenon.

Submitted 21 March, 2017; originally announced March 2017.

MSC Class: 32V35; 32V40

arXiv:1610.03045 [pdf, other]

Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data

Authors: Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro

Abstract: Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data. In this paper, we study sketching from an optimization point of view: we first show that the iterative Hessian sketch is an optimization process with preconditioning, and develop accelerated iterative Hessian sketch via the searching the conjugate direction; we then establish primal-dual connections between the Hessian sketch and dual random projection, and apply the preconditioned conjugate gradient approach on the dual problem, which leads to the accelerated iterative dual random projection methods. Finally to tackle the challenges from both large sample size and high-dimensionality, we propose the primal-dual sketch, which iteratively sketches the primal and dual formulations. We show that using a logarithmic number of calls to solvers of small scale problem, primal-dual sketch is able to recover the optimum of the original problem up to arbitrary precision. The proposed algorithms are validated via extensive experiments on synthetic and real data sets which complements our theoretical results.

Submitted 10 October, 2016; originally announced October 2016.

arXiv:1606.08091 [pdf, other]

Normal forms in Cauchy-Riemann Geometry: a survey

Authors: Martin Kolar, Ilya Kossovskiy, Dmitri Zaitsev

Abstract: One of effective ways to solve the equivalence problem and describe moduli spaces for real submanifolds in complex space is the normal form approach. In this survey, we outline some normal form constructions in CR-geometry and formulate a number of open problems.

Submitted 26 June, 2016; originally announced June 2016.

arXiv:1508.02260 [pdf, ps, other]

Higher order symmetries of real hypersurfaces in $\Bbb C^3.$

Authors: Martin Kolar, Francine Meylan

Abstract: We study nonlinear automorphisms of Levi degenerate hypersurfaces of finite multitype. By recent results of Kolar, Meylan and Zaitsev, the Lie algebra of infinitesimal CR automorphisms may contain a graded component consisting of nonlinear vector fields of arbitrarily high degree, which has no analog in the classical Levi nondegenerate case, or in the case of finite type hypersurfaces in $\mathbb C^2$. We analyze this phenomenon for hypersurfaces of finite Catlin multitype in complex dimension three. The results provide a complete classification of such manifolds. As a consequence, we show on which hypersurfaces 2-jets are not sufficient to determine an automorphism. The results also confirm a conjecture about the origin of nonlinear automorphisms of Levi degenerate hypersurfaces, formulated by the first author (AIM 2010).

Submitted 10 August, 2015; originally announced August 2015.

arXiv:1503.02978 [pdf, other]

Kernel Meets Sieve: Post-Regularization Confidence Bands for Sparse Additive Model

Authors: Junwei Lu, Mladen Kolar, Han Liu

Abstract: We develop a novel procedure for constructing confidence bands for components of a sparse additive model. Our procedure is based on a new kernel-sieve hybrid estimator that combines two most popular nonparametric estimation methods in the literature, the kernel regression and the spline method, and is of interest in its own right. Existing methods for fitting sparse additive model are primarily based on sieve estimators, while the literature on confidence bands for nonparametric models are primarily based upon kernel or local polynomial estimators. Our kernel-sieve hybrid estimator combines the best of both worlds and allows us to provide a simple procedure for constructing confidence bands in high-dimensional sparse additive models. We prove that the confidence bands are asymptotically honest by studying approximation with a Gaussian process. Thorough numerical results on both synthetic data and real-world neuroscience data are provided to demonstrate the efficacy of the theory.

Submitted 12 February, 2018; v1 submitted 10 March, 2015; originally announced March 2015.

arXiv:1502.07641 [pdf, other]

ROCKET: Robust Confidence Intervals via Kendall's Tau for Transelliptical Graphical Models

Authors: Rina Foygel Barber, Mladen Kolar

Abstract: Undirected graphical models are used extensively in the biological and social sciences to encode a pattern of conditional independences between variables, where the absence of an edge between two nodes $a$ and $b$ indicates that the corresponding two variables $X_a$ and $X_b$ are believed to be conditionally independent, after controlling for all other measured variables. In the Gaussian case, conditional independence corresponds to a zero entry in the precision matrix $Ω$ (the inverse of the covariance matrix $Σ$). Real data often exhibits heavy tail dependence between variables, which cannot be captured by the commonly-used Gaussian or nonparanormal (Gaussian copula) graphical models. In this paper, we study the transelliptical model, an elliptical copula model that generalizes Gaussian and nonparanormal models to a broader family of distributions. We propose the ROCKET method, which constructs an estimator of $Ω_{ab}$ that we prove to be asymptotically normal under mild assumptions. Empirically, ROCKET outperforms the nonparanormal and Gaussian models in terms of achieving accurate inference on simulated data. We also compare the three methods on real data (daily stock returns), and find that the ROCKET estimator is the only method whose behavior across subsamples agrees with the distribution predicted by the theory.

Submitted 1 September, 2017; v1 submitted 26 February, 2015; originally announced February 2015.

arXiv:1410.7874 [pdf, ps, other]

Mean and variance estimation in high-dimensional heteroscedastic models with non-convex penalties

Authors: James Sharpnack, Mladen Kolar

Abstract: Despite its prevalence in statistical datasets, heteroscedasticity (non-constant sample variances) has been largely ignored in the high-dimensional statistics literature. Recently, studies have shown that the Lasso can accommodate heteroscedastic errors, with minor algorithmic modifications (Belloni et al., 2012; Gautier and Tsybakov, 2013). In this work, we study heteroscedastic regression with linear mean model and log-linear variances model with sparse high-dimensional parameters. In this work, we propose estimating variances in a post-Lasso fashion, which is followed by weighted-least squares mean estimation. These steps employ non-convex penalties as in Fan and Li (2001), which allows us to prove oracle properties for both post-Lasso variance and mean parameter estimates. We reinforce our theoretical findings with experiments.

Submitted 29 October, 2014; v1 submitted 29 October, 2014; originally announced October 2014.

MSC Class: 62G08

arXiv:1402.6917 [pdf, ps, other]

Computational studies of conserved mean-curvature flow

Authors: Miroslav Kolar, Michal Benes, Daniel Sevcovic

Abstract: The paper presents the results of numerical solution of the evolution law for the constrained mean-curvature flow. This law originates in the theory of phase transitions for crystalline materials and describes the evolution of closed embedded curves with constant enclosed area. It is reformulated by means of the direct method into the system of degenerate parabolic partial differential equations for the curve parametrization. This system is solved numerically and several computational studies are presented as well.

Submitted 27 February, 2014; originally announced February 2014.

Comments: 5 figures, submitted to Mathematica Bohemica, Proceedings of Equadiff 2013 Conference

MSC Class: 35K57; 35K65; 65N40; 53C8

arXiv:1309.6933 [pdf, other]

Estimating Undirected Graphs Under Weak Assumptions

Authors: Larry Wasserman, Mladen Kolar, Alessandro Rinaldo

Abstract: We consider the problem of providing nonparametric confidence guarantees for undirected graphs under weak assumptions. In particular, we do not assume sparsity, incoherence or Normality. We allow the dimension $D$ to increase with the sample size $n$. First, we prove lower bounds that show that if we want accurate inferences with low assumptions then there are limitations on the dimension as a function of sample size. When the dimension increases slowly with sample size, we show that methods based on Normal approximations and on the bootstrap lead to valid inferences and we provide Berry-Esseen bounds on the accuracy of the Normal approximation. When the dimension is large relative to sample size, accurate inferences for graphs under low assumptions are not possible. Instead we propose to estimate something less demanding than the entire partial correlation graph. In particular, we consider: cluster graphs, restricted partial correlation graphs and correlation graphs.

Submitted 26 September, 2013; originally announced September 2013.

MSC Class: 62H12

arXiv:1306.6557 [pdf, ps, other]

Optimal Feature Selection in High-Dimensional Discriminant Analysis

Authors: Mladen Kolar, Han Liu

Abstract: We consider the high-dimensional discriminant analysis problem. For this problem, different methods have been proposed and justified by establishing exact convergence rates for the classification risk, as well as the l2 convergence results to the discriminative rule. However, sharp theoretical analysis for the variable selection performance of these procedures have not been established, even though model interpretation is of fundamental importance in scientific data analysis. This paper bridges the gap by providing sharp sufficient conditions for consistent variable selection using the sparse discriminant analysis (Mai et al., 2012). Through careful analysis, we establish rates of convergence that are significantly faster than the best known results and admit an optimal scaling of the sample size n, dimensionality p, and sparsity level s in the high-dimensional setting. Sufficient conditions are complemented by the necessary information theoretic limits on the variable selection problem in the context of high-dimensional discriminant analysis. Exploiting a numerical equivalence result, our method also establish the optimal results for the ROAD estimator (Fan et al., 2012) and the sparse optimal scaling estimator (Clemmensen et al., 2011). Furthermore, we analyze an exhaustive search procedure, whose performance serves as a benchmark, and show that it is variable selection consistent under weaker conditions. Extensive simulations demonstrating the sharpness of the bounds are also provided.

Submitted 27 June, 2013; originally announced June 2013.

arXiv:0905.2529 [pdf, ps, other]

The Catlin multitype and biholomorphic equivalence of models

Authors: Martin Kolar

Abstract: We consider an alternative approach to a fundamental CR invariant - the Catlin multitype. It is applied to a general smooth hypersurface in $\mathbb{C}^{n+1}$, not necessarily pseudoconvex. Using this approach, we prove biholomorphic equivalence of models, and give an explicit description of biholomorphisms between different models. A constructive finite algorithm for computing the multitype is described. The results can be viewed as a necessary step in understanding local biholomorphic equivalence of Levi degenerate hypersurfaces of finite Catlin multitype.

Submitted 15 May, 2009; originally announced May 2009.

MSC Class: 32V35; 32V40

arXiv:0804.2986 [pdf, ps, other]

Higher order invariants of Levi degenerate hypersurfaces

Authors: Martin Kolar

Abstract: The first part of this paper considers higher order CR invariants of three dimensional hypersurfaces of finite type. Using a full normal form we give a complete characterization of hypersurfaces with trivial local automorphism group, and analogous results for finite groups. The second part considers hypersurfaces of finite Catlin multitype and the Kohn-Nirenberg phenomenon in higher dimensions. We give a necessary condition for local convexifiability of a class of pseudoconvex hypersurfaces in $\mathbb C^{n+1}$.

Submitted 18 April, 2008; originally announced April 2008.

MSC Class: 32V35; 32V40; 32T25

arXiv:0709.3374 [pdf, ps, other]

Local equivalence of symmetric hypersurfaces in $\mathbb C^2$

Authors: Martin Kolar

Abstract: The Chern-Moser normal form and its analog on finite type hypersurfaces in general do not respect symmetries. Extending the work of N. K. Stanton, we consider the local equivalence problem for symmetric Levi degenerate hypersurfaces of finite type in $\mathbb C^2$. The results give for all such hypersurfaces a complete normalization which respects the symmetries. In particular, they apply to tubes and rigid hypersurfaces, providing an effective classification. The main tool is a complete normal form constructed for a general hypersurface with a tube model. As an application, we describe all biholomorphic maps between tubes, answering a question posed by N. Hanges.

Submitted 21 September, 2007; originally announced September 2007.

Report number: ESI preprint no.1948 MSC Class: 32V35; 32V40; 32T25

arXiv:math/0703759 [pdf, other]

The local equivalence problem in CR geometry

Authors: Martin Kolar

Abstract: This article is dedicated to the centenary of the local CR equivalence problem, formulated by Henri Poincaré in 1907. The first part gives an account of Poincaré's heuristic counting arguments, suggesting existence of infinitely many local CR invariants. Then we sketch the beautiful completion of Poincaré's approach to the problem in the work of Chern and Moser on Levi nondegenerate hypersurfaces. The last part is an overview of recent progress in solving the problem on Levi degenerate manifolds.

Submitted 26 March, 2007; originally announced March 2007.

MSC Class: 32V35; 32V40; 32T25

arXiv:math/0703058 [pdf, ps, other]

Generalized models and local invariants of Kohn-Nirenberg domains

Authors: Martin Kolar

Abstract: We give an explicit verifiable characterization of weakly pseudoconvex but locally nonconvexifiable hypersurfaces of finite type in dimension two. It is expressed in terms of a generalized model, which captures local geometry of the hypersurface both in the complex tangential and nontangential directions. As an application we obtain a new class of nonconvexifiable pseudoconvex hypersurfaces with convex models.

Submitted 2 March, 2007; originally announced March 2007.

Report number: ESI preprint no.1866 MSC Class: 32T25; 32T27

arXiv:math/0609746 [pdf, ps, other]

Normal forms for hypersurfaces of finite type in $\mathbb C^2$

Authors: Martin Kolar

Abstract: We construct normal forms for Levi degenerate hypersurfaces of finite type in $\mathbb C^2$. As one consequence, an explicit solution to the problem of local biholomorphic equivalence is obtained. Another consequence determines the dimension of the stability group of the hypersurface.

Submitted 27 September, 2006; originally announced September 2006.

MSC Class: 32V35; 32V40; 32T25

Journal ref: Math. Res. Lett. 12 (2005), 897-910

arXiv:math/0609348 [pdf, ps, other]

doi 10.1007/s11425-006-2049-6

Local symmetries of finite type hypersurfaces in C^2

Authors: Martin Kolar

Abstract: The paper gives a complete description of local automorphism groups for Levi degenerate hypersurfaces of finite type in $\mathbb{C}^2$. We also prove that, with the exception of hypersurfaces of the form $v = |z|^k$, local automorphisms are always determined by 1-jets. This proves a conjecture of D. Zaitsev in the finite type case.

Submitted 13 September, 2006; originally announced September 2006.

Comments: 11 pages

MSC Class: 32V35

Showing 1–44 of 44 results for author: Kolar, M