
Cardinality Constrained Mean-Variance Portfolios: A Penalty Decomposition Algorithm

Authors: Ahmad Mousavi, George Michailidis

Abstract: The cardinality-constrained mean-variance portfolio problem has garnered significant attention within contemporary finance due to its potential for achieving low risk while effectively managing transaction costs. Instead of solving this problem directly, many existing methodologies rely on regularization and approximation techniques, which hinder investors' ability to precisely specify the desired cardinality level of a portfolio. Moreover, these approaches typically include more hyper-parameters and increase problem dimensions. In response to these challenges, we demonstrate that a customized penalty decomposition algorithm is fully capable of tackling the original problem directly. This algorithm not only converges to a local minimizer but is also computationally cheap. It leverages a sequence of penalty subproblems, each of which is tackled via a block coordinate descent approach. In particular, the steps within the latter algorithm yield closed-form solutions and enable the identification of a saddle point of the penalty subproblem. Finally, through the application of our penalty decomposition algorithm to real-world datasets, we showcase its efficiency and its ability to outperform state-of-the-art methods in terms of CPU time.

Submitted 22 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.
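
As a rough illustration of why a cardinality constraint can admit a closed-form step of the kind the abstract above mentions, the sketch below projects a weight vector onto the set of vectors with at most k nonzero entries by keeping its k largest-magnitude entries. This is a generic hard-thresholding step and is not taken from the paper; the function name `project_cardinality` and the sample weights are hypothetical.

```python
def project_cardinality(weights, k):
    """Keep the k largest-magnitude entries of weights and zero out the rest.

    Generic Euclidean projection onto {x : at most k nonzeros}; illustrative
    only, not the authors' penalty decomposition algorithm.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Indices sorted by decreasing absolute value; sorted() is stable, so ties
    # are resolved in favor of the earlier position.
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


# Hypothetical five-asset weight vector reduced to two active positions.
sparse = project_cardinality([0.05, -0.40, 0.10, 0.30, -0.15], k=2)
```

A full portfolio method would also have to handle the budget and risk terms; the point here is only that the sparsity step itself needs no iterative solver.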

arXiv:2305.00203 [pdf, other]

Statistical Proxy based Mean-Reverting Portfolios with Sparsity and Volatility Constraints

Authors: Ahmad Mousavi, George Michailidis

Abstract: Mean-reverting portfolios with volatility and sparsity constraints are of prime interest to practitioners in finance since they are both profitable and well-diversified, while also managing risk and minimizing transaction costs. Three main measures that serve as statistical proxies to capture the mean-reversion property are predictability, portmanteau criterion, and crossing statistics. If, in addition, reasonable volatility and sparsity for the portfolio are desired, a convex quadratic or quartic objective function, subject to nonconvex quadratic and cardinality constraints, needs to be minimized. In this paper, we introduce and investigate a comprehensive modeling framework that incorporates all the previous proxies proposed in the literature and develop an effective unifying algorithm that can obtain a Karush-Kuhn-Tucker (KKT) point under mild regularity conditions. Specifically, we present a tailored penalty decomposition method that approximately solves a sequence of penalized subproblems by a block coordinate descent algorithm. To the best of our knowledge, our proposed algorithm is the first for finding volatile, sparse, and mean-reverting portfolios based on the portmanteau criterion and crossing statistics proxies. Further, we establish that the convergence analysis can be extended to a nonconvex objective function case if the starting penalty parameter is larger than a finite bound and the objective function has a bounded level set. Numerical experiments on the S&P 500 data set demonstrate the efficiency of the proposed algorithm in comparison to a semidefinite relaxation-based approach and suggest that the crossing statistics proxy yields more desirable portfolios.

Submitted 19 January, 2024; v1 submitted 29 April, 2023; originally announced May 2023.

arXiv:2211.07558 [pdf, other]

Robust Estimation of Sparse, High Dimensional Time Series with Polynomial Tails

Authors: Sagnik Halder, George Michailidis

Abstract: High dimensional Vector Autoregressions (VAR) have received a lot of interest recently due to novel applications in health, engineering, finance and the social sciences. Three issues arise when analyzing VARs: (a) the high dimensional nature of the model in the presence of many time series, which poses challenges for consistent estimation of its parameters; (b) the presence of temporal dependence, which introduces additional challenges for theoretical analysis of various estimation procedures; (c) the presence of heavy tails in a number of applications. Recent work, e.g. [Basu and Michailidis, 2015], [Kock and Callot, 2015], has addressed consistent estimation of sparse high dimensional, stable Gaussian VAR models based on an $\ell_1$ LASSO procedure. Further, the rates obtained are optimal, in the sense that they match those for iid data, plus a multiplicative factor (which is the "price" paid) for temporal dependence. However, the third issue remains unaddressed in extant literature. This paper extends existing results in the following important direction: it considers consistent estimation of the parameters of sparse high dimensional VAR models driven by heavy tailed homoscedastic or heteroskedastic noise processes (that do not possess all moments). A robust penalized approach (e.g., LASSO) is adopted for which optimal consistency rates and corresponding finite sample bounds for the underlying model parameters are obtained that match those for iid data, albeit paying a price for temporal dependence. The theoretical results are illustrated on VAR models and also on other popular time series models. Notably, the key technical tool used is a single concentration bound for heavy tailed dependent processes.

Submitted 14 November, 2022; originally announced November 2022.

arXiv:2211.04088 [pdf, other]

A Penalty-Based Method for Communication-Efficient Decentralized Bilevel Programming

Authors: Parvin Nazari, Ahmad Mousavi, Davoud Ataee Tarzanagh, George Michailidis

Abstract: Bilevel programming has recently received attention in the literature, due to its wide range of applications, including reinforcement learning and hyper-parameter optimization. However, it is widely assumed that the underlying bilevel optimization problem is solved either by a single machine or, in the case of multiple machines, over a star-shaped network, i.e., the federated learning setting. The latter approach suffers from a high communication cost on the central node (e.g., parameter server) and exhibits privacy vulnerabilities. Hence, it is of interest to develop methods that solve bilevel optimization problems in a communication-efficient decentralized manner. To that end, this paper introduces a penalty function based decentralized algorithm with theoretical guarantees for this class of optimization problems. Specifically, a distributed alternating gradient-type algorithm for solving consensus bilevel programming over a decentralized network is developed. A key feature of the proposed algorithm is to estimate the hyper-gradient of the penalty function via decentralized computation of matrix-vector products and a few vector communications, which is then integrated within an alternating algorithm to obtain finite-time convergence analysis under different convexity assumptions. Our theoretical result highlights improvements in the iteration complexity of decentralized bilevel optimization, all while making efficient use of vector communication. Empirical results on both synthetic and real datasets demonstrate that the proposed method performs well in real-world settings.

Submitted 1 September, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

arXiv:2209.08771 [pdf, other]

Optimal Sparse Estimation of High Dimensional Heavy-tailed Time Series

Authors: Sagnik Halder, George Michailidis

Abstract: Recently, high dimensional vector auto-regressive models (VAR) have attracted a lot of interest, due to novel applications in the health, engineering and social sciences. The presence of temporal dependence poses additional challenges to the theory of penalized estimation techniques widely used in the analysis of their iid counterparts. However, recent work (e.g., [Basu and Michailidis, 2015, Kock and Callot, 2015]) has established optimal consistency of $\ell_1$-LASSO regularized estimates applied to models involving high dimensional stable, Gaussian processes. The only price paid for temporal dependence is an extra multiplicative factor that equals 1 for independent and identically distributed (iid) data. Further, [Wong et al., 2020] extended these results to heavy tailed VARs that exhibit "$β$-mixing" dependence, but the rates are sub-optimal, while the extra factor is intractable. This paper improves these results in two important directions: (i) We establish optimal consistency rates and corresponding finite sample bounds for the underlying model parameters that match those for iid data, modulo a price for temporal dependence that is easy to interpret and equals 1 for iid data. (ii) We incorporate more general penalties in estimation (which are not decomposable, unlike the $\ell_1$ norm) to induce general sparsity patterns. The key technical tool employed is a novel, easy-to-use concentration bound for heavy tailed linear processes that does not rely on "mixing" notions and gives tighter bounds.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2112.10955 [pdf, other]

Joint Learning of Linear Time-Invariant Dynamical Systems

Authors: Aditya Modi, Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Abstract: Linear time-invariant systems are very popular models in system theory and applications. A fundamental problem in system identification that remains rather unaddressed in extant literature is to leverage commonalities amongst related linear systems to estimate their transition matrices more accurately. To address this problem, the current paper investigates methods for jointly estimating the transition matrices of multiple systems. It is assumed that the transition matrices are unknown linear functions of some unknown shared basis matrices. We establish finite-time estimation error rates that fully reflect the roles of trajectory lengths, dimension, and number of systems under consideration. The presented results are fairly general and show the significant gains that can be achieved by pooling data across systems in comparison to learning each system individually. Further, they are shown to be robust against model misspecifications. To obtain the results, we develop novel techniques that are of interest for addressing similar joint-learning problems. They include tightly bounding estimation errors in terms of the eigen-structures of transition matrices, establishing sharp high probability bounds for singular values of dependent random matrices, and capturing effects of misspecified transition matrices as the systems evolve over time.

Submitted 2 January, 2024; v1 submitted 20 December, 2021; originally announced December 2021.

arXiv:2110.09596 [pdf, other]

A General Modeling Framework for Network Autoregressive Processes

Authors: Hang Yin, Abolfazl Safikhani, George Michailidis

Abstract: The paper develops a general flexible framework for Network Autoregressive Processes (NAR), wherein the response of each node linearly depends on its past values, a prespecified linear combination of neighboring nodes and a set of node-specific covariates. The corresponding coefficients are node-specific, while the framework can accommodate heavier than Gaussian errors with both spatial-autoregressive and factor based covariance structures. We provide a sufficient condition that ensures the stability (stationarity) of the underlying NAR that is significantly weaker than its counterparts in previous work in the literature. Further, we develop ordinary and generalized least squares estimators for both a fixed, as well as a diverging number of network nodes, and also provide their ridge regularized counterparts that exhibit better performance in large network settings, together with their asymptotic distributions. We also address the issue of misspecifying the network connectivity and its impact on the aforementioned asymptotic distributions of the various NAR parameter estimators. The framework is illustrated on both synthetic and real air pollution data.

Submitted 18 October, 2021; originally announced October 2021.

MSC Class: 62M10; 62J07; 62H11

arXiv:2110.03200 [pdf, other]

High Dimensional Logistic Regression Under Network Dependence

Authors: Somabha Mukherjee, Ziang Niu, Sagnik Halder, Bhaswar B. Bhattacharya, George Michailidis

Abstract: Logistic regression is one of the most fundamental methods for modeling the probability of a binary outcome based on a collection of covariates. However, the classical formulation of logistic regression relies on the independent sampling assumption, which is often violated when the outcomes interact through an underlying network structure. This necessitates the development of models that can simultaneously handle both the network peer-effect (arising from neighborhood interactions) and the effect of high-dimensional covariates. In this paper, we develop a framework for incorporating such dependencies in a high-dimensional logistic regression model by introducing a quadratic interaction term, as in the Ising model, designed to capture pairwise interactions from the underlying network. The resulting model can also be viewed as an Ising model, where the node-dependent external fields linearly encode the high-dimensional covariates. We propose a penalized maximum pseudo-likelihood method for estimating the network peer-effect and the effect of the covariates, which, in addition to handling the high-dimensionality of the parameters, conveniently avoids the computational intractability of the maximum likelihood approach. Consequently, our method is computationally efficient and, under various standard regularity conditions, our estimate attains the classical high-dimensional rate of consistency. In particular, our results imply that even under network dependence it is possible to consistently estimate the model parameters at the same rate as in classical logistic regression, when the true parameter is sparse and the underlying network is not too dense. As a consequence of the general results, we derive the rates of consistency for various natural network models. We also develop an efficient algorithm for computing the estimates and validate our theoretical results in numerical experiments.

Submitted 9 September, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: Major updates. 46 pages, 3 figures

arXiv:2107.09150 [pdf, ps, other]

Inference for Change Points in High Dimensional Mean Shift Models

Authors: Abhishek Kaul, George Michailidis

Abstract: We consider the problem of constructing confidence intervals for the locations of change points in a high-dimensional mean shift model. To that end, we develop a locally refitted least squares estimator and obtain component-wise and simultaneous rates of estimation of the underlying change points. The simultaneous rate is the sharpest available in the literature by at least a factor of $\log p$, while the component-wise one is optimal. These results enable the existence of limiting distributions. Component-wise distributions are characterized under both vanishing and non-vanishing jump size regimes, while joint distributions for any finite subset of change point estimates are characterized under the latter regime, which also yields asymptotic independence of these estimates. The combined results are used to construct asymptotically valid component-wise and simultaneous confidence intervals for the change point parameters. The results are established under a high dimensional scaling, allowing for diminishing jump sizes, in the presence of a diverging number of change points and under subexponential errors. They are illustrated on synthetic data and on sensor measurements from smartphones for activity recognition.

Submitted 19 July, 2021; originally announced July 2021.

arXiv:2106.06075 [pdf, other]

doi 10.1016/j.sigpro.2021.108245

A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems

Authors: Babak Barazandeh, Tianjian Huang, George Michailidis

Abstract: Min-max saddle point games have recently been intensely studied, due to their wide range of applications, including training Generative Adversarial Networks (GANs). However, most of the recent efforts for solving them are limited to special regimes such as convex-concave games. Further, it is customarily assumed that the underlying optimization problem is solved either by a single machine or, in the case of multiple machines, by machines connected in a centralized fashion, wherein each one communicates with a central node. The latter approach becomes challenging when the underlying communications network has low bandwidth. In addition, privacy considerations may dictate that certain nodes can communicate only with a subset of other nodes. Hence, it is of interest to develop methods that solve min-max games in a decentralized manner. To that end, we develop a decentralized adaptive momentum (ADAM)-type algorithm for solving min-max optimization problems under the condition that the objective function satisfies a Minty Variational Inequality condition, which is a generalization of the convex-concave case. The proposed method overcomes shortcomings of recent non-adaptive gradient-based decentralized algorithms for min-max optimization problems that do not perform well in practice and require careful tuning. In this paper, we obtain non-asymptotic rates of convergence of the proposed algorithm (coined DADAM$^3$) for finding a (stochastic) first-order Nash equilibrium point and subsequently evaluate its performance on training GANs. The extensive empirical evaluation shows that DADAM$^3$ outperforms recently developed methods, including decentralized optimistic stochastic gradient, for solving such min-max problems.

Submitted 28 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Journal ref: Signal Processing Volume 189, December 2021, 108245

arXiv:2104.12676 [pdf, other]

Solving a class of non-convex min-max games using adaptive momentum methods

Authors: Babak Barazandeh, Davoud Ataee Tarzanagh, George Michailidis

Abstract: Adaptive momentum methods have recently attracted a lot of attention for training of deep neural networks. They use an exponential moving average of past gradients of the objective function to update both search directions and learning rates. However, these methods are not suited for solving min-max optimization problems that arise in training generative adversarial networks. In this paper, we propose an adaptive momentum min-max algorithm that generalizes adaptive momentum methods to the non-convex min-max regime. Further, we establish non-asymptotic rates of convergence for the proposed algorithm when used in a reasonably broad class of non-convex min-max optimization problems. Experimental results illustrate its superior performance vis-a-vis benchmark methods for solving such problems.

Submitted 26 April, 2021; originally announced April 2021.
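
The abstract above describes adaptive momentum methods as using exponential moving averages of past gradients to set both the search direction and the learning rate. A minimal sketch of that idea, using the standard Adam update for plain minimization, might look as follows. It is not the min-max algorithm proposed in the paper; the function name `adam_step`, the hyper-parameter defaults, and the toy objective are assumptions chosen for illustration.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update; returns (new_params, new_m, new_v).

    t is the 1-based step count used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g      # moving average of gradients (direction)
        v_i = beta2 * v_i + (1.0 - beta2) * g * g  # moving average of squared gradients (scale)
        m_hat = m_i / (1.0 - beta1 ** t)           # bias-corrected first moment
        v_hat = v_i / (1.0 - beta2 ** t)           # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy usage: minimize f(x) = x^2, whose gradient is 2x, starting from x = 1.0.
x, m, v = [1.0], [0.0], [0.0]
for t in range(1, 201):
    x, m, v = adam_step(x, [2.0 * x[0]], m, v, t, lr=0.05)
```

The first moment plays the role of the search direction, while dividing by the square root of the second moment rescales the step per coordinate, which is what "updating the learning rate" refers to.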

arXiv:2005.09711 [pdf, other]

Inference on the Change Point for High Dimensional Dynamic Graphical Models

Authors: Abhishek Kaul, Hongjin Zhang, Konstantinos Tsampourakis, George Michailidis

Abstract: We develop an estimator for the change point parameter for a dynamically evolving graphical model, and also obtain its asymptotic distribution under high dimensional scaling. To procure the latter result, we establish that the proposed estimator exhibits an $O_p(ψ^{-2})$ rate of convergence, wherein $ψ$ represents the jump size between the graphical model parameters before and after the change point. Further, it retains sufficient adaptivity against plug-in estimates of the graphical model parameters. We characterize the forms of the asymptotic distribution under both a vanishing and a non-vanishing regime of the magnitude of the jump size. Specifically, in the former case it corresponds to the argmax of a negative drift asymmetric two sided Brownian motion, while in the latter case to the argmax of a negative drift asymmetric two sided random walk, whose increments depend on the distribution of the graphical model. Easy to implement algorithms are provided for estimating the change point, and their performance is assessed on synthetic data. The proposed methodology is further illustrated on RNA-sequenced microbiome data and their changes between young and older individuals.

Submitted 21 February, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: Software available upon request (built in R)

arXiv:2005.09261 [pdf, other]

Adaptive First-and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems

Authors: Parvin Nazari, Davoud Ataee Tarzanagh, George Michailidis

Abstract: In this paper, we design and analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) stochastic optimization problems. Adaptive methods that use exponential moving averages of past gradients to update search directions and learning rates have recently attracted a lot of attention for solving optimization problems that arise in machine learning. Nevertheless, their convergence analysis almost exclusively requires smoothness and/or convexity of the objective function. In contrast, we establish non-asymptotic rates of convergence of first and zeroth-order adaptive methods and their proximal variants for a reasonably broad class of nonsmooth and nonconvex optimization problems. Experimental results indicate how the proposed algorithms empirically outperform stochastic gradient descent and its zeroth-order variant for solving such optimization problems.

Submitted 24 May, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

arXiv:2003.06961 [pdf, other]

Online detection of local abrupt changes in high-dimensional Gaussian graphical models

Authors: Hossein Keshavarz, George Michailidis

Abstract: The problem of identifying change points in high-dimensional Gaussian graphical models (GGMs) in an online fashion is of interest, due to new applications in biology, economics and social sciences. The offline version of the problem, where all the data are a priori available, has led to a number of methods and associated algorithms involving regularized loss functions. However, for the online version, there is currently only a single work in the literature that develops a sequential testing procedure and also studies its asymptotic false alarm probability and power. The latter test is best suited for the detection of change points driven by global changes in the structure of the precision matrix of the GGM, in the sense that many edges are involved. Nevertheless, in many practical settings the change point is driven by local changes, in the sense that only a small number of edges exhibit changes. To that end, we develop a novel test to address this problem that is based on the $\ell_\infty$ norm of the normalized covariance matrix of an appropriately selected portion of incoming data. The study of the asymptotic distribution of the proposed test statistic under the null (no presence of a change point) and the alternative (presence of a change point) hypotheses requires new technical tools that examine maxima of graph-dependent Gaussian random variables, and that are of independent interest. It is further shown that these tools lead to the imposition of mild regularity conditions for key model parameters, instead of more stringent ones required by leveraging previously used tools in related problems in the literature. Numerical work on synthetic data illustrates the good performance of the proposed detection procedure both in terms of computational and statistical efficiency across numerous experimental settings.

Submitted 15 March, 2020; originally announced March 2020.

Comments: 40 pages, 6 figures

arXiv:1904.11101 [pdf, other]

Change Point Estimation in Panel Data with Temporal and Cross-sectional Dependence

Authors: Monika Bhattacharjee, Moulinath Banerjee, George Michailidis

Abstract: We study the problem of detecting a common change point in large panel data based on a mean shift model, wherein the errors exhibit both temporal and cross-sectional dependence. A least squares based procedure is used to estimate the location of the change point. Further, we establish the convergence rate and obtain the asymptotic distribution of the least squares estimator. The form of the distribution is determined by the behavior of the norm difference of the means before and after the change point. Since the behavior of this norm difference is, a priori, unknown to the practitioner, we also develop a novel data driven adaptive procedure that provides valid confidence intervals for the common change point, without requiring any such knowledge. Numerical work based on synthetic data illustrates the performance of the estimator in finite samples under different settings of temporal and cross-sectional dependence, sample size and number of panels. Finally, we examine an application to financial stock data and discuss the identified change points.

Submitted 24 April, 2019; originally announced April 2019.

Comments: 57 pages, 1 figure, 11 tables
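
To make the idea of a least squares change point estimate in a mean shift model concrete, the sketch below handles the simplest possible case: a single univariate series with one change in mean. It picks the split that minimizes the total within-segment sum of squared deviations. This is a textbook simplification and not the panel-data procedure of the paper above (it ignores cross-sectional and temporal dependence); the function name `ls_change_point` and the sample data are hypothetical.

```python
def ls_change_point(series):
    """Return the split index tau (1 <= tau < n) minimizing the two-segment
    residual sum of squares; series[:tau] is treated as the pre-change segment."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    best_tau, best_rss = None, float("inf")
    for tau in range(1, n):
        left, right = series[:tau], series[tau:]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        # Squared deviations from each segment's own mean.
        rss = sum((x - mean_l) ** 2 for x in left) + sum((x - mean_r) ** 2 for x in right)
        if rss < best_rss:
            best_tau, best_rss = tau, rss
    return best_tau


# Hypothetical series whose mean shifts from about 0 to about 2 after 4 points.
data = [0.1, -0.2, 0.0, 0.2, 2.1, 1.9, 2.0, 2.2]
tau_hat = ls_change_point(data)
```

The loop recomputes segment means from scratch, which costs quadratic time; cumulative sums would make it linear, but the direct form keeps the least squares criterion easy to read.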

arXiv:1901.09109 [pdf, other]

DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization

Authors: Parvin Nazari, Davoud Ataee Tarzanagh, George Michailidis

Abstract: Adaptive gradient-based optimization methods such as Adagrad, RMSprop, and Adam are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in the literature aiming at parallelizing them, based on communications of peripheral nodes with a central node, but they incur high communication costs. To address this issue, we develop a novel consensus-based distributed adaptive moment estimation method (DADAM) for online optimization over a decentralized network that enables data parallelization, as well as decentralized computation. The method is particularly useful, since it can accommodate settings where access to local data is allowed. Further, as established theoretically in this work, it can outperform centralized adaptive algorithms, for certain classes of loss functions used in applications. We analyze the convergence properties of the proposed algorithm and provide a dynamic regret bound on the convergence rate of adaptive moment estimation methods in both stochastic and deterministic settings. Empirical results demonstrate that DADAM also works well in practice and compares favorably to competing online optimization methods.

Submitted 28 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

arXiv:1812.03090 [pdf, other]

Change Point Estimation in a Dynamic Stochastic Block Model

Authors: Monika Bhattacharjee, Moulinath Banerjee, George Michailidis

Abstract: We consider the problem of estimating the location of a single change point in a dynamic stochastic block model. We propose two methods of estimating the change point, together with the model parameters. The first employs a least squares criterion function and takes into consideration the full structure of the stochastic block model and is evaluated at each point in time. Hence, as an intermediate step, it requires estimating the community structure based on a clustering algorithm at every time point. The second method comprises the following two steps: in the first one, a least squares function is used and evaluated at each time point, but ignores the community structures and just considers a random graph generating mechanism exhibiting a change point. Once the change point is identified, in the second step, all network data before and after it are used together with a clustering algorithm to obtain the corresponding community structures and subsequently estimate the generating stochastic block model parameters. A comparison between these two methods is illustrated. Further, for both methods under their respective identifiability and certain additional regularity conditions, we establish rates of convergence and derive the asymptotic distributions of the change point estimators. The results are illustrated on synthetic data.

Submitted 20 May, 2020; v1 submitted 7 December, 2018; originally announced December 2018.

Comments: Please see the .pdf file for an extended abstract

arXiv:1811.04258 [pdf, other]

Input Perturbations for Adaptive Control and Learning

Authors: Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Abstract: This paper studies adaptive algorithms for simultaneous regulation (i.e., control) and estimation (i.e., learning) of Multiple Input Multiple Output (MIMO) linear dynamical systems. It proposes practical, easy to implement control policies based on perturbations of input signals. Such policies are shown to achieve a worst-case regret that scales as the square-root of the time horizon, and holds uniformly over time. Further, it discusses specific settings where such greedy policies attain the information theoretic lower bound of logarithmic regret. To establish the results, recent advances on self-normalized martingales together with a novel method of policy decomposition are leveraged.

Submitted 3 March, 2020; v1 submitted 10 November, 2018; originally announced November 2018.

arXiv:1807.09120 [pdf, ps, other]

Finite Time Adaptive Stabilization of LQ Systems

Authors: Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Abstract: Stabilization of linear systems with unknown dynamics is a canonical problem in adaptive control. Since the lack of knowledge of system parameters can cause it to become destabilized, an adaptive stabilization procedure is needed prior to regulation. Therefore, the adaptive stabilization needs to be completed in finite time. In order to achieve this goal, asymptotic approaches are not very helpful. There are only a few existing non-asymptotic results and a full treatment of the problem is not currently available. In this work, leveraging the novel method of random linear feedbacks, we establish high probability guarantees for finite time stabilization. Our results hold for remarkably general settings because we carefully choose a minimal set of assumptions. These include stabilizability of the underlying system and restricting the degree of heaviness of the noise distribution. To derive our results, we also introduce a number of new concepts and technical tools to address regularity and instability of the closed-loop matrix.

Submitted 22 July, 2018; originally announced July 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1711.07230

arXiv:1806.10749 [pdf, other]

On Adaptive Linear-Quadratic Regulators

Authors: Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Abstract: Performance of adaptive control policies is assessed through the regret with respect to the optimal regulator, which reflects the increase in the operating cost due to uncertainty about the dynamics parameters. However, available results in the literature do not provide a quantitative characterization of the effect of the unknown parameters on the regret. Further, there are problems regarding the efficient implementation of some of the existing adaptive policies. Finally, results regarding the accuracy with which the system's parameters are identified are scarce and rather incomplete. This study aims to comprehensively address these three issues. First, by introducing a novel decomposition of adaptive policies, we establish a sharp expression for the regret of an arbitrary policy in terms of the deviations from the optimal regulator. Second, we show that adaptive policies based on slight modifications of the Certainty Equivalence scheme are efficient. Specifically, we establish a regret of (nearly) square-root rate for two families of randomized adaptive policies. The presented regret bounds are obtained by using anti-concentration results on the random matrices employed for randomizing the estimates of the unknown parameters. Moreover, we study the minimal additional information on the dynamics matrices under which the regret becomes of logarithmic order. Finally, the rates at which the unknown parameters of the system are being identified are presented.

Submitted 20 March, 2020; v1 submitted 27 June, 2018; originally announced June 2018.

arXiv:1806.07870 [pdf, ps, other]

Sequential change-point detection in high-dimensional Gaussian graphical models

Authors: Hossein Keshavarz, George Michailidis, Yves Atchade

Abstract: High dimensional piecewise stationary graphical models represent a versatile class for modelling time varying networks arising in diverse application areas, including biology, economics, and social sciences. There has been recent work in offline detection and estimation of regime changes in the topology of sparse graphical models. However, the online setting remains largely unexplored, despite its high relevance to applications in sensor networks and other engineering monitoring systems, as well as financial markets. To that end, this work introduces a novel scalable online algorithm for detecting an unknown number of abrupt changes in the inverse covariance matrix of sparse Gaussian graphical models with small delay. The proposed algorithm is based upon monitoring the conditional log-likelihood of all nodes in the network and can be extended to a large class of continuous and discrete graphical models. We also investigate asymptotic properties of our procedure under certain mild regularity conditions on the graph size, sparsity level, number of samples, and pre- and post-changes in the topology of the network. Numerical work on both synthetic and real data illustrates the good performance of the proposed methodology both in terms of computational and statistical efficiency across numerous experimental settings.

Submitted 20 June, 2018; originally announced June 2018.

Comments: 47 pages, 9 figures

arXiv:1803.03348 [pdf, other]

Joint Estimation and Inference for Data Integration Problems based on Multiple Multi-layered Gaussian Graphical Models

Authors: Subhabrata Majumdar, George Michailidis

Abstract: The rapid development of high-throughput technologies has enabled the generation of data from biological or disease processes that span multiple layers, like genomic, proteomic or metabolomic data, and further pertain to multiple sources, like disease subtypes or experimental conditions. In this work, we propose a general statistical framework based on Gaussian graphical models for horizontal (i.e. across conditions or subtypes) and vertical (i.e. across different layers containing data on molecular compartments) integration of information in such datasets. We start with decomposing the multi-layer problem into a series of two-layer problems. For each two-layer problem, we model the outcomes at a node in the lower layer as dependent on those of other nodes in that layer, as well as all nodes in the upper layer. We use a combination of neighborhood selection and group-penalized regression to obtain sparse estimates of all model parameters. Following this, we develop a debiasing technique and asymptotic distributions of inter-layer directed edge weights that utilize already computed neighborhood selection coefficients for nodes in the upper layer. Subsequently, we establish global and simultaneous testing procedures for these edge weights. Performance of the proposed methodology is evaluated on synthetic and real data.

Submitted 21 January, 2022; v1 submitted 8 March, 2018; originally announced March 2018.

Comments: Journal of Machine Learning Research, 2022, https://jmlr.org/papers/v23/18-131.html

arXiv:1711.07230 [pdf, ps, other]

Optimism-Based Adaptive Regulation of Linear-Quadratic Systems

Authors: Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Abstract: The main challenge for adaptive regulation of linear-quadratic systems is the trade-off between identification and control. An adaptive policy needs to address both the estimation of unknown dynamics parameters (exploration), as well as the regulation of the underlying system (exploitation). To this end, optimism-based methods which bias the identification in favor of optimistic approximations of the true parameter are employed in the literature. A number of asymptotic results have been established, but their finite time counterparts are few, with important restrictions. This study establishes results for the worst-case regret of optimism-based adaptive policies. The presented high probability upper bounds are optimal up to logarithmic factors. The non-asymptotic analysis of this work requires very mild assumptions: (i) stabilizability of the system's dynamics, and (ii) limiting the degree of heaviness of the noise distribution. To establish such bounds, certain novel techniques are developed to comprehensively address the probabilistic behavior of dependent random matrices with heavy-tailed distributions.

Submitted 28 March, 2019; v1 submitted 20 November, 2017; originally announced November 2017.

Comments: 28 pages

arXiv:1710.01852 [pdf, other]

Finite Time Identification in Unstable Linear Systems

Authors: Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Abstract: Identification of the parameters of stable linear dynamical systems is a well-studied problem in the literature, both in the low and high-dimensional settings. However, there are hardly any results for the unstable case, especially regarding finite time bounds. For this setting, classical results on least-squares estimation of the dynamics parameters are not applicable and therefore new concepts and technical approaches need to be developed to address the issue. Unstable linear systems arise in key real applications in control theory, econometrics, and finance. This study establishes finite time bounds for the identification error of the least-squares estimates for a fairly large class of heavy-tailed noise distributions, and transition matrices of such systems. The results relate the time length (samples) required for estimation to a function of the problem dimension and key characteristics of the true underlying transition matrix and the noise distribution. To establish them, appropriate concentration inequalities for random matrices and for sequences of martingale differences are leveraged.

Submitted 5 June, 2018; v1 submitted 4 October, 2017; originally announced October 2017.

arXiv:1708.05836 [pdf, ps, other]

Common change point estimation in panel data from the least squares and maximum likelihood viewpoints

Authors: Monika Bhattacharjee, Moulinath Banerjee, George Michailidis

Abstract: We establish the convergence rates and asymptotic distributions of the common break change-point estimators, obtained by least squares and maximum likelihood in panel data models and compare their asymptotic variances. Our model assumptions accommodate a variety of commonly encountered probability distributions and, in particular, models of particular interest in econometrics beyond the commonly analyzed Gaussian model, including the zero-inflated Poisson model for count data, and the probit and tobit models. We also provide novel results for time dependent data in the signal-plus-noise model, with emphasis on a wide array of noise processes, including Gaussian process, MA$(\infty)$ and $m$-dependent processes. The obtained results show that maximum likelihood estimation requires a stronger signal-to-noise model identifiability condition compared to its least squares counterpart. Finally, since there are three different asymptotic regimes that depend on the behavior of the norm difference of the model parameters before and after the change point, which cannot be realistically assumed to be known, we develop a novel data driven adaptive procedure that provides valid confidence intervals for the common break, without requiring a priori knowledge of the asymptotic regime the problem falls in.

Submitted 19 August, 2017; originally announced August 2017.

arXiv:1609.09010 [pdf, other]

Estimation of Graphical Models through Structured Norm Minimization

Authors: Davoud Ataee Tarzanagh, George Michailidis

Abstract: Estimation of Markov Random Field and covariance models from high-dimensional data represents a canonical problem that has received a lot of attention in the literature. A key assumption, widely employed, is that of {\em sparsity} of the underlying model. In this paper, we study the problem of estimating such models exhibiting a more intricate structure that simultaneously comprises {\em sparse, structured sparse} and {\em dense} components. Such structures naturally arise in several scientific fields, including molecular biology, finance, and political science. We introduce a general framework based on a novel structured norm that enables us to estimate such complex structures from high-dimensional data. The resulting optimization problem is convex and we introduce a linearized multi-block alternating direction method of multipliers (ADMM) algorithm to solve it efficiently. We illustrate the superior performance of the proposed framework on a number of synthetic data sets generated from both random and structured networks. Further, we apply the method to a number of real data sets and discuss the results.

Submitted 13 May, 2018; v1 submitted 28 September, 2016; originally announced September 2016.

Comments: arXiv admin note: text overlap with arXiv:1402.7349 by other authors

arXiv:1509.00268 [pdf, other]

AMON: An Open Source Architecture for Online Monitoring, Statistical Analysis and Forensics of Multi-gigabit Streams

Authors: Michael Kallitsis, Stilian Stoev, Shrijita Bhattacharya, George Michailidis

Abstract: The Internet, as a global system of interconnected networks, carries an extensive array of information resources and services. Key requirements include good quality-of-service and protection of the infrastructure from nefarious activity (e.g. distributed denial of service (DDoS) attacks). Network monitoring is essential to network engineering, capacity planning and prevention / mitigation of threats. We develop an open source architecture, AMON (All-packet MONitor), for online monitoring and analysis of multi-gigabit network streams. It leverages the high-performance packet monitor PF_RING and is readily deployable on commodity hardware. AMON examines all packets, partitions traffic into sub-streams by using rapid hashing and computes certain real-time data products. The resulting data structures provide views of the intensity and connectivity structure of network traffic at the time-scale of routing. The proposed integrated framework includes modules for the identification of heavy-hitters as well as for visualization and statistical detection at the time-of-onset of high impact events such as DDoS. This allows operators to quickly visualize and diagnose attacks, and limit offline and time consuming post-mortem analysis. We demonstrate our system in the context of real-world attack incidents, and validate it against state-of-the-art alternatives. AMON has been deployed and is currently processing 10Gbps+ live Internet traffic at Merit Network. It is extensible and allows the addition of further statistical and filtering modules for real-time forensics.

Submitted 27 January, 2016; v1 submitted 1 September, 2015; originally announced September 2015.

arXiv:1409.6673 [pdf, other]

doi 10.1109/TSG.2014.2362994

Unsplittable Load Balancing in a Network of Charging Stations Under QoS Guarantees

Authors: Islam Safak Bayram, George Michailidis, Michael Devetsikiotis

Abstract: The operation of the power grid is becoming more stressed, due to the addition of new large loads represented by Electric Vehicles (EVs) and a more intermittent supply due to the incorporation of renewable sources. As a consequence, the coordination and control of projected EV demand in a network of fast charging stations becomes a critical and challenging problem. In this paper, we introduce a game theoretic based decentralized control mechanism to alleviate negative impacts from the EV demand. The proposed mechanism takes into consideration the non-uniform spatial distribution of EVs that induces uneven power demand at each charging facility, and aims to: (i) avoid straining grid resources by offering price incentives so that customers accept being routed to less busy stations, (ii) maximize total revenue by serving more customers with the same amount of grid resources, and (iii) provide charging service to customers with a certain level of Quality-of-Service (QoS), the latter defined as the long term customer blocking probability. We examine three scenarios of increased complexity that gradually approximate real world settings. The obtained results show that the proposed framework leads to substantial performance improvements in terms of the aforementioned goals, when compared to current state of affairs.

Submitted 20 September, 2014; originally announced September 2014.

Comments: Accepted for Publication in IEEE Transactions on Smart Grid

arXiv:1406.5024 [pdf, ps, other]

doi 10.1109/JSAC.2013.130707

Electric Power Allocation in a Network of Fast Charging Stations

Authors: I. Safak Bayram, George Michailidis, Michael Devetsikiotis, Fabrizio Granelli

Abstract: In order to increase the penetration of electric vehicles, a network of fast charging stations that can provide drivers with a certain level of quality of service (QoS) is needed. However, given the strain that such a network can exert on the power grid, and the mobility of loads represented by electric vehicles, operating it efficiently is a challenging problem. In this paper, we examine a network of charging stations equipped with an energy storage device and propose a scheme that allocates power to them from the grid, as well as routes customers. We examine three scenarios, gradually increasing their complexity. In the first one, all stations have identical charging capabilities and energy storage devices, draw constant power from the grid and no routing decisions of customers are considered. It represents the current state of affairs and serves as a baseline for evaluating the performance of the proposed scheme. In the second scenario, power to the stations is allocated in an optimal manner from the grid and in addition a certain percentage of customers can be routed to nearby stations. In the final scenario, optimal allocation of both power from the grid and customers to stations is considered. The three scenarios are evaluated using real traffic traces corresponding to weekday rush hour from a large metropolitan area in the US.
The results indicate that the proposed scheme offers substantial improvements of performance compared to the current mode of operation; namely, more customers can be served with the same amount of power, thus enabling the station operators to increase their profitability. Further, the scheme provides guarantees to customers in terms of the probability of being blocked by the closest charging station. Overall, the paper addresses key issues related to the efficient operation of a network of charging stations.

Submitted 19 June, 2014; originally announced June 2014.

Comments: Published in IEEE Journal on Selected Areas in Communications July 2013

arXiv:1401.1403 [pdf, ps, other]

M-estimation in multistage sampling procedures

Authors: Atul Mallik, Moulinath Banerjee, George Michailidis

Abstract: Multi-stage (designed) procedures, obtained by splitting the sampling budget suitably across stages, and designing the sampling at a particular stage based on information about the parameter obtained from previous stages, are often advantageous from the perspective of precise inference. We develop a generic framework for M-estimation in a multistage setting and apply empirical process techniques to develop limit theorems that describe the large sample behavior of the resulting M-estimates. Applications to change-point estimation, inverse isotonic regression, classification and mode estimation are provided: it is typically seen that the multistage procedure accentuates the efficiency of the M-estimates by accelerating the rate of convergence, relative to one-stage procedures. The step-by-step process induces dependence across stages and complicates the analysis in such problems, which we address through careful conditioning arguments.

Submitted 7 January, 2014; originally announced January 2014.

arXiv:1311.4175 [pdf, ps, other]

doi 10.1214/15-AOS1315

Regularized estimation in sparse high-dimensional time series models

Authors: Sumanta Basu, George Michailidis

Abstract: Many scientific and economic problems involve the analysis of high-dimensional time series datasets. However, theoretical studies in high-dimensional statistics to date rely primarily on the assumption of independent and identically distributed (i.i.d.) samples. In this work, we focus on stable Gaussian processes and investigate the theoretical properties of $\ell_1$-regularized estimates in two important statistical problems in the context of high-dimensional time series: (a) stochastic regression with serially correlated errors and (b) transition matrix estimation in vector autoregressive (VAR) models. We derive nonasymptotic upper bounds on the estimation errors of the regularized estimates and establish that consistent estimation under high-dimensional scaling is possible via $\ell_1$-regularization for a large class of stable processes under sparsity constraints. A key technical contribution of the work is to introduce a measure of stability for stationary processes using their spectral properties that provides insight into the effect of dependence on the accuracy of the regularized estimates. With this proposed stability measure, we establish some useful deviation bounds for dependent data, which can be used to study several important regularized estimates in a time series setting.

Submitted 30 July, 2015; v1 submitted 17 November, 2013; originally announced November 2013.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1315 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1315

Journal ref: Annals of Statistics 2015, Vol. 43, No. 4, 1535-1567

arXiv:1304.2329 [pdf, other]

Socially optimal charging strategies for electric vehicles

Authors: Elena Yudovina, George Michailidis

Abstract: Electric vehicles represent a promising technology for reducing emissions and dependence on fossil fuels and have started entering different automotive markets. In order to bolster their adoption by consumers and hence enhance their penetration rate, a charging station infrastructure needs to be deployed. This paper studies decentralized policies that assign electric vehicles to a network of charging stations with the goal to achieve little to no queueing. This objective is especially important for electric vehicles, whose charging times are fairly long. The social optimality of the proposed policies is established in the many-server regime, where each station is equipped with multiple charging slots. Further, convergence issues of the algorithm that achieves the optimal policy are examined. Finally, the results provide insight on how to address questions related to the optimal location deployment of the infrastructure.

Submitted 8 April, 2013; originally announced April 2013.

Comments: 24 pages, 4 figures. Submitted

arXiv:1203.0543

Efficient Approximation Algorithms for Optimal Large-scale Network Monitoring

Authors: Michalis Kallitsis, Stilian Stoev, George Michailidis

Abstract: The growing number of applications that generate vast amounts of data in short time scales renders the problem of partial monitoring, coupled with prediction, a rather fundamental one. We study the aforementioned canonical problem in the context of large-scale monitoring of communication networks. We consider the problem of selecting the "best" subset of links so as to optimally predict the quantity of interest at the remaining ones. This is a well-known NP-hard problem, and algorithms seeking the exact solution are prohibitively expensive. We present a number of approximation algorithms that: 1) significantly improve on the computational complexity of existing greedy algorithms; 2) exploit the geometry of principal component analysis, which also helps us establish theoretical bounds on the prediction error; 3) are amenable to randomized implementation and execution in parallel or distributed fashion, a process that often yields the exact solution. The new algorithms are demonstrated and evaluated using real-world network data.

Submitted 3 December, 2013; v1 submitted 2 March, 2012; originally announced March 2012.

Comments: Paper withdrawn since the official journal paper is now available. arXiv admin note: substantial text overlap with arXiv:1108.3048

arXiv:1106.1916 [pdf, ps, other]

Threshold estimation based on a p-value framework in dose-response and regression settings

Authors: Atul Mallik, Bodhisattva Sen, Moulinath Banerjee, George Michailidis

Abstract: We use p-values to identify the threshold level at which a regression function takes off from its baseline value, a problem motivated by applications in toxicological and pharmacological dose-response studies and environmental statistics. We study the problem in two sampling settings: one where multiple responses can be obtained at a number of different covariate levels, and the other the standard regression setting involving a limited number of response values at each covariate. Our procedure involves testing the hypothesis that the regression function is at its baseline at each covariate value and then computing the potentially approximate p-value of the test. An estimate of the threshold is obtained by fitting a piecewise constant function with a single jump discontinuity, otherwise known as a stump, to these observed p-values, as they behave in markedly different ways on the two sides of the threshold. The estimate is shown to be consistent and its finite sample properties are studied through simulations. Our approach is computationally simple and extends to the estimation of the baseline value of the regression function, heteroscedastic errors and to time-series. It is illustrated on some real data applications.

Submitted 9 June, 2011; originally announced June 2011.

arXiv:1105.3018 [pdf, ps, other]

doi 10.1214/10-AOS820

A two-stage hybrid procedure for estimating an inverse regression function

Authors: Runlong Tang, Moulinath Banerjee, George Michailidis

Abstract: We consider a two-stage procedure (TSP) for estimating an inverse regression function at a given point, where isotonic regression is used at stage one to obtain an initial estimate and a local linear approximation in the vicinity of this estimate is used at stage two. We establish that the convergence rate of the second-stage estimate can attain the parametric $n^{1/2}$ rate. Furthermore, a bootstrapped variant of TSP (BTSP) is introduced and its consistency properties studied. This variant manages to overcome the slow speed of the convergence in distribution and the estimation of the derivative of the regression function at the unknown target quantity. Finally, the finite sample performance of BTSP is studied through simulations and the method is illustrated on a data set.

Submitted 16 May, 2011; originally announced May 2011.

Comments: Published at http://dx.doi.org/10.1214/10-AOS820 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS820

Journal ref: Annals of Statistics 2011, Vol. 39, No. 2, 956-989

arXiv:1105.0417 [pdf, ps, other]

Cone Schedules for Processing Systems in Fluctuating Environments

Authors: Kevin Ross, Nicholas Bambos, George Michailidis

Abstract: We consider a generalized processing system having several queues, where the available service rate combinations fluctuate over time due to reliability and availability variations. The objective is to allocate the available resources, and corresponding service rates, in response to both workload and service capacity considerations, in order to maintain the long-term stability of the system. The service configurations are completely arbitrary, including negative service rates which represent forwarding and service-induced cross traffic. We employ a trace-based trajectory asymptotic technique, which requires minimal assumptions about the arrival dynamics of the system. We prove that cone schedules, which leverage the geometry of the queueing dynamics, maximize the system throughput for a broad class of processing systems, even under adversarial arrival processes. We study the impact of fluctuating service availability, where resources are available only some of the time, and the schedule must dynamically respond to the changing available service rates, establishing both the capacity of such systems and the class of schedules which will stabilize the system at full capacity. The rich geometry of the system dynamics leads to important insights for stability, performance and scalability, and substantially generalizes previous findings.
The processing system studied here models a broad variety of computer, communication and service networks, including varying channel conditions and cross-traffic in wireless networking, and call centers with fluctuating capacity. The findings have implications for bandwidth and processor allocation in communication networks and workforce scheduling in congested call centers.

Submitted 2 May, 2011; originally announced May 2011.

Comments: 25 pages, 5 figures

arXiv:1005.4641 [pdf, other]

Network-wide Statistical Modeling and Prediction of Computer Traffic

Authors: Joel Vaughan, Stilian A. Stoev, George Michailidis

Abstract: In order to maintain consistent quality of service, computer network engineers face the task of monitoring the traffic fluctuations on the individual links making up the network. However, due to resource constraints and limited access, it is not possible to directly measure all the links. Starting with a physically interpretable probabilistic model of network-wide traffic, we demonstrate how an expensively obtained set of measurements may be used to develop a network-specific model of the traffic across the network. This model may then be used in conjunction with easily obtainable measurements to provide more accurate prediction than is possible with only the inexpensive measurements. We show that the model, once learned, may be used for the same network for many different periods of traffic. Finally, we show an application of the prediction technique to create relevant control charts for detection and isolation of shifts in network traffic.

Submitted 25 May, 2010; originally announced May 2010.

Report number: Department of Statistics, the University of Michigan, Technical Report 501

arXiv:1005.4358 [pdf, ps, other]

On the estimation of the extremal index based on scaling and resampling

Authors: Kamal Hamidieh, Stilian A. Stoev, George Michailidis

Abstract: The extremal index parameter theta characterizes the degree of local dependence in the extremes of a stationary time series and has important applications in a number of areas, such as hydrology, telecommunications, finance and environmental studies. In this study, a novel estimator for theta based on the asymptotic scaling of block-maxima and resampling is introduced. It is shown to be consistent and asymptotically normal for a large class of m-dependent time series. Further, a procedure for the automatic selection of its tuning parameter is developed, and different types of confidence intervals that prove useful in practice are proposed. The performance of the estimator is examined through simulations, which show its highly competitive behavior. Finally, the estimator is applied to three real data sets of daily crude oil prices, daily returns of the S&P 500 stock index, and high-frequency, intra-day traded volumes of a stock. These applications demonstrate additional diagnostic features of statistical plots based on the new estimator.

Submitted 24 May, 2010; originally announced May 2010.

Report number: Department of Statistics, the University of Michigan, Technical Report 462

Journal ref: Journal of Computational and Graphical Statistics, 18(3) (2009) pp 731-755

arXiv:1005.4337 [pdf, other]

Global Modeling and Prediction of Computer Network Traffic

Authors: Stilian A. Stoev, George Michailidis, Joel Vaughan

Abstract: We develop a probabilistic framework for global modeling of the traffic over a computer network. This model integrates existing single-link (-flow) traffic models with the routing over the network to capture the global traffic behavior. It arises from a limit approximation of the traffic fluctuations as the time scale and the number of users sharing the network grow. The resulting probability model comprises Gaussian and/or stable, infinite-variance components. They can be succinctly described and handled by certain 'space-time' random fields. The model is validated against simulated and real data. It is then applied to predict traffic fluctuations over unobserved links from a limited set of observed links. Further, applications to anomaly detection and network management are briefly discussed.

Submitted 24 May, 2010; originally announced May 2010.

Report number: Department of Statistics, the University of Michigan, Technical Report 490

arXiv:1005.4329 [pdf, ps, other]

doi 10.1002/asmb.764

On the Estimation of the Heavy-Tail Exponent in Time Series using the Max-Spectrum

Authors: Stilian A Stoev, George Michailidis

Abstract: This paper addresses the problem of estimating the tail index of distributions with heavy, Pareto-type tails for dependent data, which is of interest in the areas of finance, insurance, environmental monitoring and teletraffic analysis. A novel approach based on the max self-similarity scaling behavior of block maxima is introduced. The method exploits the increasing lack of dependence of maxima over large size blocks, which proves useful for time series data. We establish the consistency and asymptotic normality of the proposed max-spectrum estimator for a large class of m-dependent time series, in the regime of intermediate block-maxima. In the regime of large block-maxima, we demonstrate the distributional consistency of the estimator for a broad range of time series models including linear processes. The max-spectrum estimator is a robust and computationally efficient tool, which provides a novel time-scale perspective to the estimation of tail exponents. Its performance is illustrated over synthetic and real data sets.

Submitted 24 May, 2010; originally announced May 2010.

Report number: Department of Statistics, the University of Michigan, Technical Report 447

Journal ref: Applied Stochastic Models in Business and Industry, 2009

arXiv:0911.5439 [pdf, ps, other]

Penalized Likelihood Methods for Estimation of Sparse High Dimensional Directed Acyclic Graphs

Authors: Ali Shojaie, George Michailidis

Abstract: Directed acyclic graphs (DAGs) are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical, as well as biological systems, where directed edges between nodes represent the influence of components of the system on each other. The general problem of estimating DAGs from observed data is NP-hard. Moreover, two directed graphs may be observationally equivalent. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose a penalized likelihood approach that directly estimates the adjacency matrix of DAGs. Both lasso and adaptive lasso penalties are considered and an efficient algorithm is proposed for estimation of high dimensional DAGs. We study variable selection consistency of the two penalties when the number of variables grows to infinity with the sample size. We show that although lasso can only consistently estimate the true network under stringent assumptions, adaptive lasso achieves this task under mild regularity conditions. The performance of the proposed methods is compared to alternative methods in simulated, as well as real, data examples.

Submitted 28 November, 2009; originally announced November 2009.

Comments: 19 pages, 8 figures

arXiv:0908.1838 [pdf, ps, other]

doi 10.1214/08-AOS602

Change-point estimation under adaptive sampling

Authors: Yan Lan, Moulinath Banerjee, George Michailidis

Abstract: We consider the problem of locating a jump discontinuity (change-point) in a smooth parametric regression model with a bounded covariate. It is assumed that one can sample the covariate at different values and measure the corresponding responses. Budget constraints dictate that a total of $n$ such measurements can be obtained. A multistage adaptive procedure is proposed, where at each stage an estimate of the change point is obtained and new points are sampled from its appropriately chosen neighborhood. It is shown that such procedures accelerate the rate of convergence of the least squares estimate of the change-point. Further, the asymptotic distribution of the estimate is derived using empirical processes techniques. The latter result provides guidelines on how to choose the tuning parameters of the multistage procedure in practice. The improved efficiency of the procedure is demonstrated using real and synthetic data. This problem is primarily motivated by applications in engineering systems.

Submitted 13 August, 2009; originally announced August 2009.

Comments: Published at http://dx.doi.org/10.1214/08-AOS602 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS602

MSC Class: 62F12; 62K99 (Primary)

Journal ref: Annals of Statistics 2009, Vol. 37, No. 4, 1752-1791

arXiv:math/0609163 [pdf, ps, other]

Estimating heavy-tail exponents through max self-similarity

Authors: Stilian A. Stoev, George Michailidis, Murad S. Taqqu

Abstract: In this paper, a novel approach to the problem of estimating the heavy-tail exponent alpha>0 of a distribution is proposed. It is based on the fact that block-maxima of size m of the independent and identically distributed data scale at a rate of m^{1/alpha}. This scaling rate can be captured well by the max-spectrum plot of the data, which leads to regression-based estimators. Consistency and asymptotic normality of these estimators are established under mild conditions on the behavior of the tail of the distribution. The results are obtained by establishing bounds on the rate of convergence of moment-type functionals of heavy-tailed maxima. Such bounds often yield exact rates of convergence and are of independent interest. Practical issues on the automatic selection of tuning parameters for the estimators and corresponding confidence intervals are also addressed. Extensive numerical simulations show that the proposed method proves competitive for both small and large sample sizes and for a large range of tail exponents. The method is shown to be more robust than the classical Hill plot and is illustrated on two data sets of insurance claims and natural gas field sizes.

Submitted 6 September, 2006; originally announced September 2006.

MSC Class: 62G32; 62G20; 62G05
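
Illustration (not from the paper): the scaling relation stated in the abstract above, that block maxima of size m grow like m^{1/alpha}, can be sketched in a few lines. The sketch below is a simplified reading under stated assumptions: it uses dyadic block sizes m = 2^j, averages log2 of the block maxima at each scale, and fits a plain ordinary least-squares line whose slope approximates 1/alpha. The function names `max_spectrum` and `estimate_alpha`, the scale range, and the use of unweighted least squares are all choices made here for illustration; the paper's own estimators and tuning-parameter selection may differ.

```python
import math
import random


def max_spectrum(data, j_min, j_max):
    """Return [(j, Y_j)], where Y_j is the mean log2 block maximum for block size 2**j.

    Hypothetical helper for illustration; assumes all data values are positive.
    """
    points = []
    for j in range(j_min, j_max + 1):
        m = 2 ** j
        n_blocks = len(data) // m
        if n_blocks == 0:
            break  # not enough data for even one block at this scale
        logs = [math.log2(max(data[k * m:(k + 1) * m])) for k in range(n_blocks)]
        points.append((j, sum(logs) / n_blocks))
    return points


def estimate_alpha(data, j_min, j_max):
    """Unweighted least-squares slope of Y_j on j; the slope approximates 1/alpha."""
    pts = max_spectrum(data, j_min, j_max)
    j_bar = sum(j for j, _ in pts) / len(pts)
    y_bar = sum(y for _, y in pts) / len(pts)
    s_xy = sum((j - j_bar) * (y - y_bar) for j, y in pts)
    s_xx = sum((j - j_bar) ** 2 for j, _ in pts)
    return s_xx / s_xy  # reciprocal of the slope s_xy / s_xx


# Synthetic i.i.d. Pareto sample with a known tail exponent (assumed value).
rng = random.Random(0)
true_alpha = 1.5
sample = [rng.paretovariate(true_alpha) for _ in range(2 ** 16)]
alpha_hat = estimate_alpha(sample, j_min=4, j_max=11)
print(f"true alpha = {true_alpha}, estimate = {alpha_hat:.2f}")
```

The scale range (here j from 4 to 11) acts as a tuning parameter: small scales are biased because the block maxima have not yet reached their limiting behavior, while large scales leave few blocks to average over.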

Showing 1–43 of 43 results for author: Michailidis, G