-
Online Changepoint Detection via Dynamic Mode Decomposition
Authors:
Victor K. Khamesi,
Niall M. Adams,
Dean A. Bodenham,
Edward A. K. Cohen
Abstract:
Detecting changes in data streams is a vital task in many applications. There is increasing interest in changepoint detection in the online setting, to enable real-time monitoring and support prompt responses and informed decision-making. Many approaches assume stationary sequences before encountering an abrupt change in the mean or variance. Notably less attention has been given to the challenging case where the monitored sequences exhibit trend, periodicity and seasonality. Dynamic mode decomposition is a data-driven dimensionality reduction technique that extracts the essential components of a dynamical system. We propose a changepoint detection method that leverages this technique to sequentially model the dynamics of a moving window of data and produce a low-rank reconstruction. A change is identified when there is a significant difference between this reconstruction and the observed data, and we provide theoretical justification for this approach. Extensive simulations demonstrate that our approach has superior detection performance compared to other methods for detecting small changes in mean, variance, periodicity, and second-order structure, among others, in data that exhibit seasonality. Results on real-world datasets also show excellent performance compared to contemporary approaches.
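The core mechanism described above (fit dynamic mode decomposition to a window, reconstruct it at low rank, and flag a change when the reconstruction departs from the data) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a univariate series delay-embedded into a Hankel matrix with embedding dimension `d`, a fixed truncation rank `r`, standard exact DMD, and a relative Frobenius reconstruction error as the score. The function names are hypothetical, and no detection threshold or theoretical calibration is included.

```python
import numpy as np

def hankel(x, d):
    """Delay-embed a 1-D series: column j is x[j:j+d] (assumed embedding)."""
    return np.column_stack([x[j:j + d] for j in range(len(x) - d + 1)])

def dmd_reconstruction_error(window, d=10, r=2):
    """Hypothetical score: fit rank-r exact DMD to a delay-embedded window
    and return the relative Frobenius error of its reconstruction."""
    H = hankel(np.asarray(window, dtype=float), d)
    X, Y = H[:, :-1], H[:, 1:]                     # snapshot pairs x_k -> x_{k+1}
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T     # rank-r truncation
    A_tilde = U.conj().T @ Y @ V / s               # reduced operator U* Y V S^{-1}
    eigvals, W = np.linalg.eig(A_tilde)
    Phi = Y @ V / s @ W                            # exact DMD modes
    b = np.linalg.lstsq(Phi, H[:, 0], rcond=None)[0]   # amplitudes from first snapshot
    k = np.arange(H.shape[1])
    recon = Phi @ (b[:, None] * eigvals[:, None] ** k)  # x_k ~ Phi Lambda^k b
    return np.linalg.norm(H - recon.real) / np.linalg.norm(H)

t = np.arange(100)
stable = np.sin(2 * np.pi * t / 25)     # seasonal signal, rank-2 Hankel
shifted = stable.copy()
shifted[80:] += 1.0                     # level shift near the end of the window
print(dmd_reconstruction_error(stable), dmd_reconstruction_error(shifted))
```

A pure sinusoid has a rank-2 delay embedding, so a rank-2 DMD reproduces it almost exactly, whereas the level shift cannot be captured by the same two modes and inflates the error. In an online setting one would slide the window forward one observation at a time and compare this score with a threshold.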
Submitted 24 May, 2024;
originally announced May 2024.
-
MMD Two-sample Testing in the Presence of Arbitrarily Missing Data
Authors:
Yi** Zeng,
Niall M. Adams,
Dean A. Bodenham
Abstract:
In many real-world applications, it is common that a proportion of the data may be missing or only partially observed. We develop a novel two-sample testing method based on the Maximum Mean Discrepancy (MMD) which accounts for missing data in both samples, without making assumptions about the missingness mechanism. Our approach is based on deriving the mathematically precise bounds of the MMD test statistic after accounting for all possible missing values. To the best of our knowledge, it is the only two-sample testing method that is guaranteed to control the Type I error for both univariate and multivariate data where data may be arbitrarily missing. Simulation results show that our method has good statistical power, typically for cases where 5% to 10% of the data are missing. We highlight the value of our approach when the data are missing not at random, a context in which either ignoring the missing values or using common imputation methods may not control the Type I error.
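For reference, the statistic being bounded is the MMD between the two samples. The sketch below computes the standard unbiased estimate of MMD² with a Gaussian kernel and a permutation p-value on fully observed data. It does not implement the paper's bounds for missing values; the kernel choice, the fixed bandwidth, and the function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) for every row pair."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between X (n x p) and Y (m x p)."""
    n, m = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))   # drop i == j terms
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * Kxy.mean()

def mmd_permutation_test(X, Y, bandwidth=1.0, n_perm=200, seed=0):
    """Hypothetical helper: permutation p-value for H0: P_X = P_Y."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(X, Y, bandwidth)
    Z, n = np.vstack([X, Y]), len(X)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))              # reshuffle sample labels
        if mmd2_unbiased(Z[idx[:n]], Z[idx[n:]], bandwidth) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
Y = rng.normal(loc=2.0, size=(60, 2))
stat, p = mmd_permutation_test(X, Y)
print(stat, p)
```

With missing entries this statistic cannot be computed directly; the approach described in the abstract instead bounds it over all possible values of the missing data.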
Submitted 24 May, 2024;
originally announced May 2024.
-
On two-sample testing for data with arbitrarily missing values
Authors:
Yi** Zeng,
Niall M. Adams,
Dean A. Bodenham
Abstract:
We develop a new rank-based approach for univariate two-sample testing in the presence of missing data which makes no assumptions about the missingness mechanism. This approach is a theoretical extension of the Wilcoxon-Mann-Whitney test that controls the Type I error by providing exact bounds for the test statistic after accounting for the number of missing values. Greater statistical power is shown when the method is extended to account for a bounded domain. Furthermore, exact bounds are provided on the proportions of data that can be missing in the two samples while yielding a significant result. Simulations demonstrate that our method has good power, typically for cases of 10% to 20% missing data, while standard imputation approaches fail to control the Type I error. We illustrate our method on complex clinical trial data in which patients' withdrawal from the trial leads to missing values.
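The idea of bounding a rank statistic over every possible value of the missing entries can be illustrated with the Mann-Whitney U statistic on an unbounded real domain. This is a minimal sketch under that assumption, not the paper's exact procedure (which also covers a bounded domain); the function names are hypothetical. Because U can only increase when an X value moves up or a Y value moves down, placing the missing X values below everything and the missing Y values above everything gives the smallest possible U, and the reverse placement gives the largest.

```python
import random

def mann_whitney_u(x, y):
    """U = #{(i, j): x_i > y_j} + 0.5 * #{(i, j): x_i == y_j}."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)

def u_bounds_with_missing(x_obs, y_obs, n_x_missing, n_y_missing):
    """Hypothetical helper: range of U over all real values of the missing data."""
    u_obs = mann_whitney_u(x_obs, y_obs)
    n, m = len(x_obs), len(y_obs)
    a, b = n_x_missing, n_y_missing
    # Minimum: missing x at -inf and missing y at +inf add no winning pairs.
    u_min = u_obs
    # Maximum: each missing x beats all m + b y values, and each observed
    # x beats each of the b missing y values placed at -inf.
    u_max = u_obs + a * (m + b) + n * b
    return u_min, u_max

x_obs, y_obs = [1.2, 3.4, 2.2, 5.0], [0.5, 2.2, 4.1]
print(u_bounds_with_missing(x_obs, y_obs, n_x_missing=2, n_y_missing=1))  # (7.5, 19.5)
```

A conservative test then rejects only if the entire interval from `u_min` to `u_max` lies in the rejection region, so the conclusion holds whatever the missing values were.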
Submitted 22 March, 2024;
originally announced March 2024.