Search | arXiv e-print repository

arXiv:2405.15740 [pdf, other]

On Flexible Inverse Probability of Treatment and Intensity Weighting: Informative Censoring, Variable Inclusion, and Weight Trimming

Authors: Grace Tompkins, Joel A Dubin, Michael Wallace

Abstract: Many observational studies feature irregular longitudinal data, where the observation times are not common across individuals in the study. Further, the observation times may be related to the longitudinal outcome. In this setting, failing to account for the informative observation process may result in biased causal estimates. This can be coupled with other sources of bias, including non-randomized treatment assignments and informative censoring. This paper provides an overview of a flexible weighting method used to adjust for informative observation processes and non-randomized treatment assignments. We investigate the sensitivity of the flexible weighting method to violations of the noninformative censoring assumption, examine variable selection for the observation process weighting model, known as inverse intensity weighting, and look at the impacts of weight trimming for the flexible weighting model. We show that the flexible weighting method is sensitive to violations of the noninformative censoring assumption and show that a previously proposed extension fails under such violations. We also show that variables confounding the observation and outcome processes should always be included in the observation intensity model. Finally, we show that weight trimming should be applied in the flexible weighting model when the treatment assignment process is highly informative and driving the extreme weights. We conclude with an application of the methodology to a real data set to examine the impacts of household water sources on malaria diagnoses.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 20 pages and 2 figures in main document, 8 pages and 2 figures in Supplemental Material. Submitted to Statistical Methods in Medical Research

arXiv:2402.12555 [pdf, other]

Optimal Dynamic Treatment Regime Estimation in the Presence of Nonadherence

Authors: Dylan Spicker, Michael P. Wallace, Grace Y. Yi

Abstract: Dynamic treatment regimes (DTRs) are sequences of functions that formalize the process of precision medicine. DTRs take as input patient information and output treatment recommendations. A major focus of the DTR literature has been on the estimation of optimal DTRs, the sequences of decision rules that result in the best outcome in expectation, across the complete population were they to be applied. While there is a rich literature on optimal DTR estimation, to date there has been minimal consideration of the impacts of nonadherence on these estimation procedures. Nonadherence refers to any process through which an individual's prescribed treatment does not match their true treatment. We explore the impacts of nonadherence and demonstrate that generally, when nonadherence is ignored, suboptimal regimes will be estimated. In light of these findings we propose a method for estimating optimal DTRs in the presence of nonadherence. The resulting estimators are consistent and asymptotically normal, with a double robustness property. Using simulations we demonstrate the reliability of these results, and illustrate comparable performance between the proposed estimation procedure adjusting for the impacts of nonadherence and estimators that are computed on data without nonadherence.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2311.09338 [pdf, other]

Challenges for Predictive Modeling with Neural Network Techniques using Error-Prone Dietary Intake Data

Authors: Dylan Spicker, Amir Nazemi, Joy Hutchinson, Paul Fieguth, Sharon I. Kirkpatrick, Michael Wallace, Kevin W. Dodd

Abstract: Dietary intake data are routinely drawn upon to explore diet-health relationships. However, these data are often subject to measurement error, distorting the true relationships. Beyond measurement error, there are likely complex synergistic and sometimes antagonistic interactions between different dietary components, complicating the relationships between diet and health outcomes. Flexible models are required to capture the nuance that these complex interactions introduce. This complexity makes research on diet-health relationships an appealing candidate for the application of machine learning techniques, and in particular, neural networks. Neural networks are computational models that are able to capture highly complex, nonlinear relationships so long as sufficient data are available. While these models have been applied in many domains, the impacts of measurement error on the performance of predictive modeling have not been systematically investigated. However, dietary intake data are typically collected using self-report methods and are prone to large amounts of measurement error. In this work, we demonstrate the ways in which measurement error erodes the performance of neural networks, and illustrate the care that is required for leveraging these models in the presence of error. We demonstrate the role that sample size and replicate measurements play on model performance, indicate a motivation for the investigation of transformations to additivity, and illustrate the caution required to prevent model overfitting.
While the past performance of neural networks across various domains makes them an attractive candidate for examining diet-health relationships, our work demonstrates that substantial care and further methodological development are both required to observe increased predictive performance when applying these techniques, compared to more traditional statistical procedures.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2306.12865 [pdf, other]

Estimating dynamic treatment regimes for ordinal outcomes with household interference: Application in household smoking cessation

Authors: Cong Jiang, Mary Thompson, Michael Wallace

Abstract: The focus of precision medicine is on decision support, often in the form of dynamic treatment regimes (DTRs), which are sequences of decision rules. At each decision point, the decision rules determine the next treatment according to the patient's baseline characteristics, the information on treatments and responses accrued by that point, and the patient's current health status, including symptom severity and other measures. However, DTR estimation with ordinal outcomes is rarely studied, and rarer still in the context of interference - where one patient's treatment may affect another's outcome. In this paper, we introduce the weighted proportional odds model (WPOM): a regression-based, approximate doubly-robust approach to single-stage DTR estimation for ordinal outcomes. This method also accounts for the possibility of interference between individuals sharing a household through the use of covariate balancing weights derived from joint propensity scores. Examining different types of balancing weights, we verify the approximate double robustness of WPOM with our adjusted weights via simulation studies. We further extend WPOM to multi-stage DTR estimation with household interference, namely dWPOM (dynamic WPOM). Lastly, we demonstrate our proposed methodology in the analysis of longitudinal survey data from the Population Assessment of Tobacco and Health study, which motivates this work. Furthermore, considering interference, we provide optimal treatment strategies for households to achieve smoking cessation of the pair in the household.

Submitted 20 December, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

arXiv:2206.11465 [pdf, ps, other]

Quantifying Distances Between Clusters with Elliptical or Non-Elliptical Shapes

Authors: Meredith L. Wallace, Lisa McTeague, Jessica L. Graves, Nicholas Kissel, Cristina Tortora, Bradley Wheeler, Satish Iyengar

Abstract: Finite mixture models that allow for a broad range of potentially non-elliptical cluster distributions are an emerging methodological field. Such methods allow for the shape of the clusters to match the natural heterogeneity of the data, rather than forcing a series of elliptical clusters. These methods are highly relevant for clustering continuous non-normal data - a common occurrence with objective data that are now routinely captured in health research. However, interpreting and comparing such models - especially with regard to whether they produce meaningful clusters that are reasonably well separated - is non-trivial. We summarize several measures that can succinctly quantify the multivariate distance between two clusters, regardless of the cluster distribution, and suggest practical computational tools. Through a simulation study, we evaluate these measures across three scenarios that allow for clusters to differ in mean, scale, and rotation. We then demonstrate our approaches using physiological responses to emotional imagery captured as part of the Transdiagnostic Anxiety Study, a large-scale study of anxiety disorder spectrum patients and control participants. Finally, we synthesize findings to provide guidance on how to use distance measures in clustering applications.

Submitted 22 June, 2022; originally announced June 2022.

arXiv:2203.08269 [pdf, other]

Doubly-Robust Dynamic Treatment Regimen Estimation for Binary Outcomes

Authors: Cong Jiang, Michael Wallace, Mary Thompson

Abstract: In precision medicine, Dynamic Treatment Regimes (DTRs) are treatment protocols that adapt over time in response to a patient's observed characteristics. A DTR is a set of decision functions that takes an individual patient's information as arguments and outputs an action to be taken. Building on observed data, the aim is to identify the DTR that optimizes expected patient outcomes. Multiple methods have been proposed for optimal DTR estimation with continuous outcomes. However, optimal DTR estimation with binary outcomes is more complicated and has received comparatively little attention. Solving a system of weighted generalized estimating equations, we propose a new balancing weight criterion to overcome the misspecification of generalized linear models' nuisance components. We construct binary pseudo-outcomes, and develop a doubly-robust and easy-to-use method to estimate an optimal DTR with binary outcomes. We also outline the underlying theory, which relies on the balancing property of the weights; provide simulation studies that verify the double-robustness of our method; and illustrate the method in studying the effects of e-cigarette usage on smoking cessation, using observational data from the Population Assessment of Tobacco and Health (PATH) study.

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2203.02204 [pdf, other]

Sharper Bounds for Proximal Gradient Algorithms with Errors

Authors: Anis Hamadouche, Yun Wu, Andrew M. Wallace, Joao F. C. Mota

Abstract: We analyse the convergence of the proximal gradient algorithm for convex composite problems in the presence of gradient and proximal computational inaccuracies. We derive new tighter deterministic and probabilistic bounds that we use to verify a simulated (MPC) and a synthetic (LASSO) optimization problem solved on a reduced-precision machine in combination with an inaccurate proximal operator. We also show how the probabilistic bounds are more robust for algorithm verification and more accurate for application performance guarantees. Under some statistical assumptions, we also prove that some cumulative error terms follow a martingale property. And conforming to observations, e.g., in \cite{schmidt2011convergence}, we also show how the acceleration of the algorithm amplifies the gradient and proximal computational errors.

Submitted 4 March, 2022; originally announced March 2022.

arXiv:2111.02863 [pdf, other]

doi 10.1002/cjs.11777

Nonparametric Simulation Extrapolation for Measurement Error Models

Authors: Dylan Spicker, Michael Wallace, Grace Yi

Abstract: The presence of measurement error is a widespread issue which, when ignored, can render the results of an analysis unreliable. Numerous corrections for the effects of measurement error have been proposed and studied, often under the assumption of a normally distributed, additive measurement error model. One such method is simulation extrapolation, or SIMEX. In many situations observed data are non-symmetric, heavy-tailed, or otherwise highly non-normal. In these settings, correction techniques relying on the assumption of normality are undesirable. We propose an extension to the simulation extrapolation method which is nonparametric in the sense that no specific distributional assumptions are required on the error terms. The technique is implemented when either validation data or replicate measurements are available, and is designed to be immediately accessible for those familiar with simulation extrapolation.

Submitted 20 March, 2023; v1 submitted 4 November, 2021; originally announced November 2021.

MSC Class: 62G05; 62P10

arXiv:2106.07401 [pdf, other]

Generalizations to Corrections for the Effects of Measurement Error in Approximately Consistent Methodologies

Authors: Dylan Spicker, Michael P Wallace, Grace Y Yi

Abstract: Measurement error is a pervasive issue which renders the results of an analysis unreliable. The measurement error literature contains numerous correction techniques, which can be broadly divided into those which aim to produce exactly consistent estimators, and those which are only approximately consistent. While consistency is a desirable property, it is typically attained only under specific model assumptions. Two techniques, regression calibration and simulation extrapolation, are used frequently in a wide variety of parametric and semiparametric settings. However, in many settings these methods are only approximately consistent. We generalize these corrections, relaxing assumptions placed on replicate measurements. Under regularity conditions, the estimators are shown to be asymptotically normal, with a sandwich estimator for the asymptotic variance. Through simulation, we demonstrate the improved performance of the modified estimators, over the standard techniques, when these assumptions are violated. We motivate these corrections using the Framingham Heart Study, and apply the generalized techniques to an analysis of these data.

Submitted 5 November, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

arXiv:2001.03305 [pdf, other]

Diagnosing Colorectal Polyps in the Wild with Capsule Networks

Authors: Rodney LaLonde, Pujan Kandel, Concetto Spampinato, Michael B. Wallace, Ulas Bagci

Abstract: Colorectal cancer, largely arising from precursor lesions called polyps, remains one of the leading causes of cancer-related death worldwide. Current clinical standards require the resection and histopathological analysis of polyps due to test accuracy and sensitivity of optical biopsy methods falling substantially below recommended levels. In this study, we design a novel capsule network architecture (D-Caps) to improve the viability of optical biopsy of colorectal polyps. Our proposed method introduces several technical novelties including a novel capsule architecture with a capsule-average pooling (CAP) method to improve efficiency in large-scale image classification. We demonstrate improved results over the previous state-of-the-art convolutional neural network (CNN) approach by as much as 43%. This work provides an important benchmark on the new Mayo Polyp dataset, a significantly more challenging and larger dataset than previous polyp studies, with results stratified across all available categories, imaging devices and modalities, and focus modes to promote future direction into AI-driven colorectal cancer screening systems. Code is publicly available at https://github.com/lalonderodney/D-Caps .

Submitted 9 January, 2020; originally announced January 2020.

Comments: Accepted for publication at ISBI 2020 (IEEE International Symposium on Biomedical Imaging). Code is publicly available at https://github.com/lalonderodney/D-Caps

arXiv:1907.11659 [pdf, other]

doi 10.1002/sim.8690

Measurement error and precision medicine: error-prone tailoring covariates in dynamic treatment regimes

Authors: Dylan Spicker, Michael Wallace

Abstract: Precision medicine incorporates patient-level covariates to tailor treatment decisions, seeking to improve outcomes. In longitudinal studies with time-varying covariates and sequential treatment decisions, precision medicine can be formalized with dynamic treatment regimes (DTRs): sequences of covariate-dependent treatment rules. To date, the precision medicine literature has not addressed a ubiquitous concern in health research - measurement error - where observed data deviate from the truth. We discuss the consequences of ignoring measurement error in the context of DTRs, focusing on challenges unique to precision medicine. We show - through simulation and theoretical results - that relatively simple measurement error correction techniques can lead to substantial improvements over uncorrected analyses, and apply these findings to the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study.

Submitted 15 June, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

arXiv:1907.00437 [pdf, other]

INN: Inflated Neural Networks for IPMN Diagnosis

Authors: Rodney LaLonde, Irene Tanner, Katerina Nikiforaki, Georgios Z. Papadakis, Pujan Kandel, Candice W. Bolan, Michael B. Wallace, Ulas Bagci

Abstract: Intraductal papillary mucinous neoplasm (IPMN) is a precursor to pancreatic ductal adenocarcinoma. While over half of patients are diagnosed with pancreatic cancer at a distant stage, patients who are diagnosed early enjoy a much higher 5-year survival rate of 34% compared to 3% in the former; hence, early diagnosis is key. Unique challenges in the medical imaging domain such as extremely limited annotated data sets and typically large 3D volumetric data have made it difficult for deep learning to secure a strong foothold. In this work, we construct two novel "inflated" deep network architectures, InceptINN and DenseINN, for the task of diagnosing IPMN from multisequence (T1 and T2) MRI. These networks inflate their 2D layers to 3D and bootstrap weights from their 2D counterparts (Inceptionv3 and DenseNet121 respectively) trained on ImageNet to the new 3D kernels. We also extend the inflation process by further expanding the pre-trained kernels to handle any number of input modalities and different fusion strategies. This is one of the first studies to train an end-to-end deep network on multisequence MRI for IPMN diagnosis, and shows that our proposed novel inflated network architectures are able to handle the extremely limited training data (139 MRI scans), while providing an absolute improvement of 8.76% in accuracy for diagnosing IPMN over the current state-of-the-art. Code is publicly available at https://github.com/lalonderodney/INN-Inflated-Neural-Nets.

Submitted 30 June, 2019; originally announced July 2019.

Comments: Accepted for publication at MICCAI 2019 (22nd International Conference on Medical Image Computing and Computer Assisted Intervention). Code is publicly available at https://github.com/lalonderodney/INN-Inflated-Neural-Nets

arXiv:1704.08229 [pdf, ps, other]

Generalized G-estimation and Model Selection

Authors: M. P. Wallace, E. E. M. Moodie, D. A. Stephens

Abstract: Dynamic treatment regimes (DTRs) aim to formalize personalized medicine by tailoring treatment decisions to individual patient characteristics. G-estimation for DTR identification targets the parameters of a structural nested mean model known as the blip function from which the optimal DTR is derived. Despite considerable work deriving such estimation methods, there has been little focus on extending G-estimation to the case of non-additive effects, non-continuous outcomes or on model selection. We demonstrate how G-estimation can be more widely applied through the use of iteratively-reweighted least squares procedures, and illustrate this for log-linear models. We then derive a quasi-likelihood function for G-estimation within the DTR framework, and show how it can be used to form an information criterion for blip model selection. These developments are demonstrated through application to a variety of simulation studies as well as data from the Sequenced Treatment Alternatives to Relieve Depression study.

Submitted 26 April, 2017; originally announced April 2017.

arXiv:1602.05264 [pdf, ps, other]

Anomaly Detection in Clutter using Spectrally Enhanced Ladar

Authors: Puneet S Chhabra, Andrew M Wallace, James R Hopgood

Abstract: Discrete return (DR) Laser Detection and Ranging (Ladar) systems provide a series of echoes that reflect from objects in a scene. These can be first, last or multi-echo returns. In contrast, Full-Waveform (FW)-Ladar systems measure the intensity of light reflected from objects continuously over a period of time. In a camouflaged scenario, e.g., objects hidden behind dense foliage, a FW-Ladar penetrates such foliage and returns a sequence of echoes including buried faint echoes. The aim of this paper is to learn local-patterns of co-occurring echoes characterised by their measured spectra. A deviation from such patterns defines an abnormal event in a forest/tree depth profile. As far as the authors know, neither DR nor FW-Ladar, along with several spectral measurements, has been applied to anomaly detection. This work presents an algorithm that allows detection of spectral and temporal anomalies in FW-Multi Spectral Ladar (FW-MSL) data samples. An anomaly is defined as a full waveform temporal and spectral signature that does not conform to a prior expectation, represented using a learnt subspace (dictionary) and set of coefficients that capture co-occurring local-patterns using an overlapping temporal window. A modified optimization scheme is proposed for subspace learning based on stochastic approximations. The objective function is augmented with a discriminative term that represents the subspace's separability properties and supports anomaly characterisation.
The algorithm detects several man-made objects and anomalous spectra hidden in a dense clutter of vegetation and also allows tree species classification.

Submitted 16 February, 2016; originally announced February 2016.

Showing 1–14 of 14 results for author: Wallace, M