-
Differentially private multivariate medians
Authors:
Kelly Ramsay,
Aukosh Jagannath,
Shoja'eddin Chenouri
Abstract:
Statistical tools which satisfy rigorous privacy guarantees are necessary for modern data analysis. It is well-known that robustness against contamination is linked to differential privacy. Despite this fact, using multivariate medians for differentially private and robust multivariate location estimation has not been systematically studied. We develop novel finite-sample performance guarantees for differentially private multivariate depth-based medians, which are essentially sharp. Our results cover commonly used depth functions, such as the halfspace (or Tukey) depth, spatial depth, and the integrated dual depth. We show that under Cauchy marginals, the cost of heavy-tailed location estimation outweighs the cost of privacy. We demonstrate our results numerically using a Gaussian contamination model in dimensions up to d = 100, and compare them to a state-of-the-art private mean estimation algorithm. As a by-product of our investigation, we prove concentration inequalities for the output of the exponential mechanism about the maximizer of the population objective function. This bound applies to objective functions that satisfy a mild regularity condition.
Submitted 26 March, 2024; v1 submitted 12 October, 2022;
originally announced October 2022.
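The abstract above relies on the exponential mechanism to release a private median. The following is a minimal univariate sketch of that idea, not the authors' depth-based multivariate procedure: the function name `private_median`, the finite candidate grid, and the rank-based utility are all assumptions made for illustration.

```python
import math
import random


def private_median(data, candidates, epsilon, rng=None):
    """Pick one candidate via the exponential mechanism (illustrative sketch).

    Assumption: utility is the negative distance between a candidate's rank
    in the data and n/2. Replacing one observation changes any rank by at
    most 1, so the utility has sensitivity 1.
    """
    rng = rng or random.Random()
    n = len(data)
    utilities = [-abs(sum(x <= c for x in data) - n / 2) for c in candidates]
    # Shift by the maximum utility before exponentiating to avoid overflow;
    # the shift cancels when the weights are normalised.
    u_max = max(utilities)
    weights = [math.exp(epsilon * (u - u_max) / 2.0) for u in utilities]
    # Sample a candidate with probability proportional to its weight.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Larger values of `epsilon` concentrate the output near the sample median, while smaller values spread it across the candidate grid in exchange for stronger privacy.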
-
Robust changepoint detection in the variability of multivariate functional data
Authors:
Kelly Ramsay,
Shoja'eddin Chenouri
Abstract:
We consider the problem of robustly detecting changepoints in the variability of a sequence of independent multivariate functions. We develop a novel changepoint procedure, called the functional Kruskal--Wallis for covariance (FKWC) changepoint procedure, based on rank statistics and multivariate functional data depth. The FKWC changepoint procedure allows the user to test for at most one changepoint (AMOC) or an epidemic period, or to estimate the number and locations of an unknown number of changepoints in the data. We show that when the ``signal-to-noise'' ratio is bounded below, the changepoint estimates produced by the FKWC procedure attain the minimax localization rate for detecting general changes in distribution in the univariate setting (Theorem 1). We also derive the behavior of the proposed test statistics for the AMOC and epidemic settings under the null hypothesis (Theorem 2) and show, as a simple consequence of our main result, that these tests are consistent (Corollary 1). In simulation, we show that our method is particularly robust when compared to similar changepoint methods. We present an application of the FKWC procedure to intraday asset returns and fMRI scans. As a by-product of Theorem 1, we provide a concentration result for integrated functional depth functions (Lemma 2), which may be of general interest.
Submitted 18 May, 2024; v1 submitted 2 December, 2021;
originally announced December 2021.
-
On Monitoring High-Dimensional Processes with Individual Observations
Authors:
Mohsen Ebadi,
Shojaeddin Chenouri,
Stefan H. Steiner
Abstract:
Modern data collection methods and computation tools have made it possible to monitor high-dimensional processes. In this article, Phase II monitoring of high-dimensional processes is investigated when the available number of samples collected in Phase I is limited in comparison to the number of variables. A new charting statistic for high-dimensional multivariate processes based on the diagonal elements of the underlying covariance matrix is introduced, and a unified procedure for Phase I and II by employing a self-starting control chart is proposed. To remedy the effect of outliers, we adopt a robust procedure for parameter estimation in Phase I and introduce the appropriate consistent estimators. The statistical performance of the proposed method is evaluated in Phase II through the average run length (ARL) criterion in the absence and presence of outliers, and the evaluation reveals that the proposed control chart scheme effectively detects various kinds of shifts in the process mean. Finally, we illustrate the applicability of our proposed method via a real-world example.
Submitted 23 January, 2023; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Phase I Analysis of High-Dimensional Processes in the Presence of Outliers
Authors:
Mohsen Ebadi,
Shojaeddin Chenouri,
Stefan H. Steiner
Abstract:
One of the significant challenges in monitoring the quality of products today is the high dimensionality of quality characteristics. In this paper, we address Phase I analysis of high-dimensional processes with individual observations when the available number of samples collected over time is limited. Using a new charting statistic, we propose a robust procedure for parameter estimation in Phase I. This robust procedure is efficient in parameter estimation in the presence of outliers or contamination in the data. A consistent estimator is proposed for parameter estimation, and a finite sample correction coefficient is derived and evaluated through simulation. We assess the statistical performance of the proposed method in Phase I in terms of the probability of signal criterion. This assessment is carried out in the absence and presence of outliers. We show that, in both phases, the proposed control chart scheme effectively detects various kinds of shifts in the process mean. In addition, we present two real-world examples to illustrate the applicability of our proposed method.
Submitted 29 December, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Functional Boxplots for Outlier Detection in Additive Manufacturing
Authors:
Ahmad Mozaffari,
Shojaeddin Chenouri,
Ehsan Toyserkani,
Usman Ali
Abstract:
Additive manufacturing (AM), also known as 3D printing, is one of the most promising digital manufacturing technologies, thanks to its potential to produce highly complex geometries rapidly. AM has been promoted from a prototyping methodology to a serial production platform for which precise process monitoring and control strategies to guarantee the accuracy of products are required. This need has motivated practitioners to focus on designing process monitoring tools to improve the accuracy of produced geometries. In line with this emerging interest, the current investigation proposes a novel strategy which uses a functional representation of in-plane contours to construct statistical boxplots with the goal of detecting outlying AM products. The method can be used for process monitoring during AM production to automatically detect defective products in an online fashion. To assess the potential of the method, different complex 3D geometries are considered and subjected to different types of stochastic perturbations to collect data for outlier detection. The results of the conducted simulation are very promising and reveal the reliability of the proposed method for detecting products with statistically significant deformation.
Submitted 20 October, 2021;
originally announced October 2021.
-
Robust nonparametric hypothesis tests for differences in the covariance structure of functional data
Authors:
Kelly Ramsay,
Shojaeddin Chenouri
Abstract:
We develop a group of robust, nonparametric hypothesis tests which detect differences between the covariance operators of several populations of functional data. These tests, called FKWC tests, are based on functional data depth ranks. These tests work well even when the data are heavy tailed, which is shown both in simulation and theoretically. These tests offer several other benefits: they have a simple distribution under the null hypothesis, they are computationally cheap, and they possess transformation invariance properties. We show that under general alternative hypotheses these tests are consistent under mild, nonparametric assumptions. As a result of this work, we introduce a new functional depth function called L2-root depth which works well for the purposes of detecting differences in magnitude between covariance kernels. We present an analysis of the FKWC test using L2-root depth under local alternatives. In simulation, when the true covariance kernels have strictly positive eigenvalues, we show that these tests have higher power than their competitors, while still maintaining their nominal size. We also provide methods for computing sample size and performing multiple comparisons.
Submitted 18 June, 2021;
originally announced June 2021.
-
Valid Post-Detection Inference for Change Points Identified Using Trend Filtering
Authors:
Reza Valiollahi Mehrizi,
Shojaeddin Chenouri
Abstract:
Many methods for change point detection exist in the literature. However, only a few provide inference for such change points after they have been estimated. This work mainly focuses on a statistical analysis of change points estimated by the PRUTF algorithm, which incorporates trend filtering to determine change points in piecewise polynomial signals. This paper develops a methodology to perform statistical inference, such as computing p-values and constructing confidence intervals, in the newly developed post-selection inference framework. Our work concerns both cases of known and unknown error variance. As pointed out in the post-selection inference literature, such confidence intervals are undesirably long. To resolve this shortcoming, we also provide two novel strategies, global post-detection and local post-detection, which are based on the intrinsic properties of change points. We run our proposed methods on real as well as simulated data to evaluate their performance.
Submitted 30 July, 2021; v1 submitted 24 April, 2021;
originally announced April 2021.
-
Differentially private depth functions and their associated medians
Authors:
Kelly Ramsay,
Shoja'eddin Chenouri
Abstract:
In this paper, we investigate the differentially private estimation of data depth functions and their associated medians. We introduce several methods for privatizing depth values at a fixed point, and show that for some depth functions, when the depth is computed at an out-of-sample point, privacy can be gained for free when $n\rightarrow \infty$. We also present a method for privately estimating the vector of sample point depth values. Additionally, we introduce estimation methods for depth-based medians for both depth functions with low global sensitivity and depth functions with only highly probable, low local sensitivity. We provide a general result (Lemma 1) which can be used to prove consistency of an estimator produced by the exponential mechanism, provided the limiting cost function is sufficiently smooth at a unique minimizer. We also introduce a general algorithm to privately estimate a minimizer of a cost function which has, with high probability, low local sensitivity. This algorithm combines the propose-test-release algorithm with the exponential mechanism. An application of this algorithm to generate consistent estimates of the projection depth-based median is presented. Thus, for these private depth-based medians, we show that it is possible for privacy to be obtained for free when $n\rightarrow \infty$.
Submitted 7 April, 2021; v1 submitted 7 January, 2021;
originally announced January 2021.
-
A non-alternating graph hashing algorithm for large scale image search
Authors:
Sobhan Hemati,
Mohammad Hadi Mehdizavareh,
Shojaeddin Chenouri,
Hamid R Tizhoosh
Abstract:
In the era of big data, methods for improving memory and computational efficiency have become crucial for successful deployment of technologies. Hashing is one of the most effective approaches to deal with computational limitations that come with big data. One natural way of formulating this problem is spectral hashing, which directly incorporates affinity to learn binary codes. However, due to binary constraints, the optimization becomes intractable. To mitigate this challenge, different relaxation approaches have been proposed to reduce the computational load of obtaining binary codes and still attain a good solution. The problem with all existing relaxation methods is that they resort to one or more additional auxiliary variables to attain high-quality binary codes while relaxing the problem. The existence of auxiliary variables leads to a coordinate descent approach, which increases the computational complexity. We argue that introducing these variables is unnecessary. To this end, we propose a novel relaxed formulation for spectral hashing that adds no additional variables to the problem. Furthermore, instead of solving the problem in the original space, where the number of variables is equal to the number of data points, we solve the problem in a much smaller space and retrieve the binary codes from this solution. This trick reduces both the memory and computational complexity at the same time. We apply two optimization techniques, namely projected gradient and optimization on manifold, to obtain the solution. Using comprehensive experiments on four public datasets, we show that the proposed efficient spectral hashing (ESH) algorithm achieves highly competitive retrieval performance compared with the state of the art at low complexity.
Submitted 19 June, 2021; v1 submitted 24 December, 2020;
originally announced December 2020.
-
Robust multiple change-point detection for multivariate variability using data depth
Authors:
Kelly Ramsay,
Shoja'eddin Chenouri
Abstract:
In this paper, we introduce two robust, nonparametric methods for multiple change-point detection in the variability of a multivariate sequence of observations. We demonstrate that changes in ranks generated from data depth functions can be used to detect changes in the variability of a sequence of multivariate observations. In order to detect more than one change, the first algorithm uses methods similar to those of wild binary segmentation. The second algorithm estimates change-points by maximizing a penalized version of the classical Kruskal-Wallis ANOVA test statistic. We show that this objective function can be maximized via the well-known PELT algorithm. Under mild, nonparametric assumptions, both of these algorithms are shown to be consistent for the correct number of change-points and the correct location(s) of the change-point(s). We demonstrate the efficacy of these methods with a simulation study, where we compare our new methods to several competing methods. We show that our methods outperform existing methods in this problem setting, and that they can estimate changes accurately when the data are heavy tailed or skewed.
Submitted 28 November, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.
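The idea in the abstract above, that ranks derived from a depth function reveal changes in variability, can be sketched for a single change in a univariate sequence. This is an illustrative simplification and not the authors' algorithm: the function names are hypothetical, the outlyingness score `|x - median|` stands in for a multivariate depth function, ties are broken by position instead of mid-ranks, and neither the penalty, PELT, nor wild binary segmentation is included.

```python
import statistics


def depth_ranks(xs):
    """Rank observations by outlyingness |x - median| (rank 1 = most central)."""
    med = statistics.median(xs)
    outlyingness = [abs(x - med) for x in xs]
    # sorted() is stable, so tied scores keep their original order.
    order = sorted(range(len(xs)), key=lambda i: outlyingness[i])
    ranks = [0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def kw_statistic(ranks, k):
    """Kruskal-Wallis statistic for the two segments ranks[:k] and ranks[k:]."""
    n = len(ranks)
    overall_mean = (n + 1) / 2
    total = 0.0
    for segment in (ranks[:k], ranks[k:]):
        segment_mean = sum(segment) / len(segment)
        total += len(segment) * (segment_mean - overall_mean) ** 2
    return 12.0 / (n * (n + 1)) * total


def estimate_changepoint(xs):
    """Return the split index k that maximises the two-sample statistic."""
    ranks = depth_ranks(xs)
    return max(range(1, len(xs)), key=lambda k: kw_statistic(ranks, k))
```

A change in scale pushes observations after the change toward higher outlyingness ranks, so the statistic peaks when the split separates the low-rank segment from the high-rank segment.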
-
Detection of Change Points in Piecewise Polynomial Signals Using Trend Filtering
Authors:
Reza V. Mehrizi,
Shojaeddin Chenouri
Abstract:
While many approaches have been proposed for discovering abrupt changes in piecewise constant signals, few methods are available to capture these changes in piecewise polynomial signals. In this paper, we propose a change point detection method, PRUTF, based on trend filtering. By providing a comprehensive dual solution path for trend filtering, PRUTF allows us to discover change points of the underlying signal for either a given value of the regularization parameter or a specific number of steps of the algorithm. We demonstrate that the dual solution path constitutes a Gaussian bridge process that enables us to derive an exact and efficient stopping rule for terminating the search algorithm. We also prove that the estimates produced by this algorithm are asymptotically consistent in pattern recovery. This result holds even in the case of staircases (consecutive change points of the same sign) in the signal. Finally, we investigate the performance of our proposed method for various signals and then compare its performance against some state-of-the-art methods in the context of change point detection. We apply our method to three real-world datasets including the UK House Price Index (HPI), the GISS Surface Temperature Analysis (GISTEMP) and the Coronavirus disease (COVID-19) pandemic.
Submitted 30 July, 2021; v1 submitted 17 September, 2020;
originally announced September 2020.
-
Statistical Monitoring of the Covariance Matrix in Multivariate Processes: A Literature Review
Authors:
Mohsen Ebadi,
Shoja'eddin Chenouri,
Dennis K. J. Lin,
Stefan H. Steiner
Abstract:
Monitoring several correlated quality characteristics of a process is common in modern manufacturing and service industries. Although a lot of attention has been paid to monitoring the multivariate process mean, not many control charts are available for monitoring the covariance matrix. This paper presents a comprehensive overview of the literature on control charts for monitoring the covariance matrix in a multivariate statistical process monitoring (MSPM) framework. It classifies the research that has previously appeared in the literature. We highlight the challenging areas for research and provide some directions for future research.
Submitted 14 April, 2021; v1 submitted 14 February, 2020;
originally announced February 2020.
-
A New Method for Performance Analysis in Nonlinear Dimensionality Reduction
Authors:
Jiaxi Liang,
Shojaeddin Chenouri,
Christopher G. Small
Abstract:
In this paper, we develop a local rank correlation measure which quantifies the performance of dimension reduction methods. The local rank correlation is easily interpretable, and robust against the extreme skewness of nearest neighbor distributions in high dimensions. Some benchmark datasets are studied. We find that the local rank correlation closely corresponds to our visual interpretation of the quality of the output. In addition, we demonstrate that the local rank correlation is useful in estimating the intrinsic dimensionality of the original data, and in selecting a suitable value of tuning parameters used in some algorithms.
Submitted 16 November, 2017;
originally announced November 2017.
-
Optimal estimation in functional linear regression for sparse noise-contaminated data
Authors:
Behdad Mostafaiy,
MohammadReza FaridRohani,
Shojaeddin Chenouri
Abstract:
In this paper, we propose a novel approach to fit a functional linear regression in which both the response and the predictor are functions of a common variable such as time. We consider the case that the response and the predictor processes are both sparsely sampled on random time points and are contaminated with random errors. In addition, the random times are allowed to be different for the measurements of the predictor and the response functions. This situation often occurs in longitudinal data settings. To estimate the covariance and the cross-covariance functions, we use a regularization method over a reproducing kernel Hilbert space. The estimate of the cross-covariance function is used to obtain an estimate of the regression coefficient function and also functional singular components. We derive the convergence rates of the proposed cross-covariance, regression coefficient and singular component function estimators. Furthermore, we show that, under some regularity conditions, the estimator of the coefficient function has a minimax optimal rate. We conduct a simulation study and demonstrate the merits of the proposed method by comparing it to some other existing methods in the literature. We illustrate the method with an application to the well-known Multicenter AIDS Cohort Study.
Submitted 13 November, 2017;
originally announced November 2017.