-
Real-Time Estimation of COVID-19 Infections: Deconvolution and Sensor Fusion
Authors:
Maria Jahja,
Andrew Chin,
Ryan J. Tibshirani
Abstract:
We propose, implement, and evaluate a method to estimate the daily number of new symptomatic COVID-19 infections, at the level of individual U.S. counties, by deconvolving daily reported COVID-19 case counts using an estimated symptom-onset-to-case-report delay distribution. Importantly, we focus on estimating infections in real-time (rather than retrospectively), which poses numerous challenges. To address these, we develop new methodology for both the distribution estimation and deconvolution steps, and we employ a sensor fusion layer (which fuses together predictions from models that are trained to track infections based on auxiliary surveillance streams) in order to improve accuracy and stability.
Submitted 27 February, 2022; v1 submitted 13 December, 2021;
originally announced December 2021.
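To make the deconvolution idea in this abstract concrete, here is a minimal noise-free sketch. It is not the authors' method: the delay distribution `delay`, the toy counts, and the function names `convolve` and `deconvolve` are all hypothetical, and the inversion assumes `delay[0] > 0` so that plain forward substitution works.

```python
def convolve(infections, delay):
    """Expected reports: cases[t] = sum_k delay[k] * infections[t - k]."""
    cases = []
    for t in range(len(infections)):
        total = 0.0
        for k, p in enumerate(delay):
            if t - k >= 0:
                total += p * infections[t - k]
        cases.append(total)
    return cases


def deconvolve(cases, delay):
    """Invert the convolution by forward substitution.

    Assumption for this sketch: delay[0] > 0, so each day's infections
    can be solved for once the earlier days are known.
    """
    if delay[0] <= 0:
        raise ValueError("delay[0] must be positive for forward substitution")
    infections = []
    for t, y in enumerate(cases):
        # Reports on day t that come from infections on earlier days.
        carried = 0.0
        for k in range(1, min(t, len(delay) - 1) + 1):
            carried += delay[k] * infections[t - k]
        infections.append((y - carried) / delay[0])
    return infections


# Hypothetical onset-to-report delay distribution (0, 1, and 2 days).
delay = [0.5, 0.3, 0.2]
infections = [10.0, 20.0, 40.0, 30.0, 10.0]

cases = convolve(infections, delay)
recovered = deconvolve(cases, delay)
```

With exact inputs, `recovered` matches `infections` up to floating-point error. With noisy case counts, this direct inversion amplifies the noise, so a practical estimator would generally need some form of regularization.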
-
A Penalized Shared-parameter Algorithm for Estimating Optimal Dynamic Treatment Regimens
Authors:
Palash Ghosh,
Trikay Nalamada,
Shruti Agarwal,
Maria Jahja,
Bibhas Chakraborty
Abstract:
A dynamic treatment regimen (DTR) is a set of decision rules to personalize treatments for an individual using their medical history. The Q-learning based Q-shared algorithm has been used to develop DTRs that involve decision rules shared across multiple stages of intervention. We show that the existing Q-shared algorithm can suffer from non-convergence due to the use of linear models in the Q-learning setup, and identify the condition in which Q-shared fails. Leveraging properties from expansion-constrained ordinary least-squares, we give a penalized Q-shared algorithm that not only converges in settings that violate the condition, but can outperform the original Q-shared algorithm even when the condition is satisfied. We give evidence for the proposed method in a real-world application and several synthetic simulations.
Submitted 26 May, 2022; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights
Authors:
Maria Jahja,
David C. Farrow,
Roni Rosenfeld,
Ryan J. Tibshirani
Abstract:
The Kalman filter (KF) is one of the most widely used tools for data assimilation and sequential estimation. In this work, we show that the state estimates from the KF in a standard linear dynamical system setting are equivalent to those given by the KF in a transformed system, with infinite process noise (i.e., a "flat prior") and an augmented measurement space. This reformulation -- which we refer to as augmented measurement sensor fusion (SF) -- is conceptually interesting, because the transformed system here is seemingly static (as there is effectively no process model), but we can still capture the state dynamics inherent to the KF by folding the process model into the measurement space. Further, this reformulation of the KF turns out to be useful in settings in which past states are observed eventually (at some lag). Here, when the measurement noise covariance is estimated by the empirical covariance, we show that the state predictions from SF are equivalent to those from a regression of past states on past measurements, subject to particular linear constraints (reflecting the relationships encoded in the measurement map). This allows us to port standard ideas (say, regularization methods) in regression over to dynamical systems. For example, we can posit multiple candidate process models, fold all of them into the measurement model, transform to the regression perspective, and apply $\ell_1$ penalization to perform process model selection. We give various empirical demonstrations, and focus on an application to nowcasting the weekly incidence of influenza in the US.
Submitted 2 August, 2021; v1 submitted 27 May, 2019;
originally announced May 2019.
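As a small illustration of the "flat prior" limit mentioned in this abstract, the sketch below runs one step of a textbook scalar Kalman filter. It does not implement the augmented measurement reformulation from the paper. The function name `kalman_step`, the parameter names, and all numeric values are assumptions chosen for illustration.

```python
def kalman_step(x, p, z, a=1.0, h=1.0, q=0.1, r=1.0):
    """One predict/update step of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    z    : new measurement
    a, h : process and measurement coefficients
    q, r : process and measurement noise variances
    """
    # Predict: propagate the estimate and inflate its variance by q.
    x_pred = a * x
    p_pred = a * a * p + q
    # Update: blend the prediction with the measurement via the gain.
    gain = p_pred * h / (h * h * p_pred + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


# Moderate process noise: the estimate lands between prediction and measurement.
x_small, _ = kalman_step(x=0.0, p=1.0, z=4.0, q=0.1)

# Very large process noise: the prediction carries almost no weight,
# so the estimate approaches z / h.
x_flat, _ = kalman_step(x=0.0, p=1.0, z=4.0, q=1e12)
```

As `q` grows, the gain tends to `1 / h` and the updated estimate is driven almost entirely by the measurement, which is the sense in which infinite process noise acts like a flat prior on the state.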
-
Sampling Method for Fast Training of Support Vector Data Description
Authors:
Arin Chaudhuri,
Deovrat Kakde,
Maria Jahja,
Wei Xiao,
Hansi Jiang,
Seunghyun Kong,
Sergiy Peredriy
Abstract:
Support Vector Data Description (SVDD) is a popular outlier detection technique that constructs a flexible description of the input data. SVDD computation time is high for large training datasets, which limits its use in big-data process-monitoring applications. We propose a new iterative sampling-based method for SVDD training. The method incrementally learns the training data description at each iteration by computing SVDD on an independent random sample selected with replacement from the training data set. The experimental results indicate that the proposed method is extremely fast and provides a good data description.
Submitted 25 September, 2016; v1 submitted 16 June, 2016;
originally announced June 2016.
-
Peak Criterion for Choosing Gaussian Kernel Bandwidth in Support Vector Data Description
Authors:
Deovrat Kakde,
Arin Chaudhuri,
Seunghyun Kong,
Maria Jahja,
Hansi Jiang,
Jorge Silva
Abstract:
Support Vector Data Description (SVDD) is a machine-learning technique used for single-class classification and outlier detection. The SVDD formulation with a kernel function provides a flexible boundary around the data. The values of the kernel function parameters affect the nature of the data boundary. For example, it is observed that with a Gaussian kernel, as the value of the kernel bandwidth is lowered, the data boundary changes from spherical to wiggly. A spherical data boundary leads to underfitting, and an extremely wiggly data boundary leads to overfitting. In this paper, we propose an empirical criterion to obtain good values of the Gaussian kernel bandwidth parameter. This criterion provides a smooth boundary that captures the essential geometric features of the data.
Submitted 8 August, 2017; v1 submitted 16 February, 2016;
originally announced February 2016.