-
Substitute adjustment via recovery of latent variables
Authors:
Jeffrey Adams,
Niels Richard Hansen
Abstract:
The deconfounder was proposed as a method for estimating causal parameters in a context with multiple causes and unobserved confounding. It is based on recovery of a latent variable from the observed causes. We disentangle the causal interpretation from the statistical estimation problem and show that the deconfounder in general estimates adjusted regression target parameters. It does so by outcome regression adjusted for the recovered latent variable termed the substitute. We refer to the general algorithm, stripped of causal assumptions, as substitute adjustment. We give theoretical results to support that substitute adjustment estimates adjusted regression parameters when the regressors are conditionally independent given the latent variable. We also introduce a variant of our substitute adjustment algorithm that estimates an assumption-lean target parameter with minimal model assumptions. We then give finite sample bounds and asymptotic results supporting substitute adjustment estimation in the case where the latent variable takes values in a finite set. A simulation study illustrates finite sample properties of substitute adjustment. Our results support that when the latent variable model of the regressors holds, substitute adjustment is a viable method for adjusted regression.
Submitted 29 February, 2024;
originally announced March 2024.
-
Efficient adjustment for complex covariates: Gaining efficiency with DOPE
Authors:
Alexander Mangulad Christgau,
Niels Richard Hansen
Abstract:
Covariate adjustment is a ubiquitous method used to estimate the average treatment effect (ATE) from observational data. Assuming a known graphical structure of the data generating model, recent results give graphical criteria for optimal adjustment, which enables efficient estimation of the ATE. However, graphical approaches are challenging for high-dimensional and complex data, and it is not straightforward to specify a meaningful graphical model of non-Euclidean data such as texts. We propose a general framework that accommodates adjustment for any subset of information expressed by the covariates. We generalize prior works and leverage these results to identify the optimal covariate information for efficient adjustment. This information is minimally sufficient for prediction of the outcome conditionally on treatment.
Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of the ATE, and we provide asymptotic results for the DOPE under general conditions. Compared to the augmented inverse propensity weighted (AIPW) estimator, the DOPE can retain its efficiency even when the covariates are highly predictive of treatment. We illustrate this with a single-index model, and with an implementation of the DOPE based on neural networks, we demonstrate its performance on simulated and real data. Our results show that the DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
Submitted 20 February, 2024;
originally announced February 2024.
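For reference, the AIPW estimator that the abstract uses as a point of comparison can be sketched as follows. This is a minimal illustration of the standard AIPW formula, not of the DOPE itself; the function name `aipw_ate` and the plain-list inputs (fitted outcome regressions `m1`, `m0` and fitted propensity scores `e`) are assumptions made for the example.

```python
def aipw_ate(y, t, m1, m0, e):
    """AIPW estimate of the ATE (hypothetical helper, standard formula).

    y:  observed outcomes
    t:  binary treatment indicators (0 or 1)
    m1: fitted E[Y | T=1, X] for each unit
    m0: fitted E[Y | T=0, X] for each unit
    e:  fitted propensity scores P(T=1 | X), strictly between 0 and 1
    """
    terms = []
    for yi, ti, m1i, m0i, ei in zip(y, t, m1, m0, e):
        # Plug-in contrast of the two outcome regressions.
        plug_in = m1i - m0i
        # Inverse-propensity-weighted residual correction.
        correction = ti * (yi - m1i) / ei - (1 - ti) * (yi - m0i) / (1 - ei)
        terms.append(plug_in + correction)
    return sum(terms) / len(terms)
```

The weights `1 / e` and `1 / (1 - e)` grow large when fitted propensities approach 0 or 1, which is the regime the abstract refers to when the covariates are highly predictive of treatment.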
-
Identifiability in Continuous Lyapunov Models
Authors:
Philipp Dettling,
Roser Homs,
Carlos Améndola,
Mathias Drton,
Niels Richard Hansen
Abstract:
The recently introduced graphical continuous Lyapunov models provide a new approach to statistical modeling of correlated multivariate data. The models view each observation as a one-time cross-sectional snapshot of a multivariate dynamic process in equilibrium. The covariance matrix for the data is obtained by solving a continuous Lyapunov equation that is parametrized by the drift matrix of the dynamic process. In this context, different statistical models postulate different sparsity patterns in the drift matrix, and it becomes a crucial problem to clarify whether a given sparsity assumption allows one to uniquely recover the drift matrix parameters from the covariance matrix of the data. We study this identifiability problem by representing sparsity patterns by directed graphs. Our main result proves that the drift matrix is globally identifiable if and only if the graph for the sparsity pattern is simple (i.e., does not contain directed two-cycles). Moreover, we present a necessary condition for generic identifiability and provide a computational classification of small graphs with up to 5 nodes.
Submitted 15 November, 2023; v1 submitted 8 September, 2022;
originally announced September 2022.
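In symbols, the setup described in the abstract can be written as below. The notation is assumed here for illustration and is not taken from the abstract: $M$ is the drift matrix, $C$ a positive definite volatility matrix, $\Sigma$ the covariance matrix of the data, and $p$ the dimension.

```latex
M \Sigma + \Sigma M^{\top} + C = 0
```

Since $\Sigma$ is symmetric, it has $p(p+1)/2$ free entries. If every node carries a self-loop (an assumption here), a graph without directed two-cycles has at most $p + p(p-1)/2 = p(p+1)/2$ edges, so the number of free drift parameters never exceeds the number of covariance entries. This is only a dimension count that is consistent with the stated result, not an argument for it.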
-
Nonparametric Conditional Local Independence Testing
Authors:
Alexander Mangulad Christgau,
Lasse Petersen,
Niels Richard Hansen
Abstract:
Conditional local independence is an asymmetric independence relation among continuous time stochastic processes. It describes whether the evolution of one process is directly influenced by another process given the histories of additional processes, and it is important for the description and learning of causal relations among processes. We develop a model-free framework for testing the hypothesis that a counting process is conditionally locally independent of another process. To this end, we introduce a new functional parameter called the Local Covariance Measure (LCM), which quantifies deviations from the hypothesis. Following the principles of double machine learning, we propose an estimator of the LCM and a test of the hypothesis using nonparametric estimators and sample splitting or cross-fitting. We call this test the (cross-fitted) Local Covariance Test ((X)-LCT), and we show that its level and power can be controlled uniformly, provided that the nonparametric estimators are consistent with modest rates. We illustrate the theory by an example based on a marginalized Cox model with time-dependent covariates, and we show in simulations that when double machine learning is used in combination with cross-fitting, then the test works well without restrictive parametric assumptions.
Submitted 24 February, 2023; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Graphical modeling of stochastic processes driven by correlated errors
Authors:
Søren Wengel Mogensen,
Niels Richard Hansen
Abstract:
We study a class of graphs that represent local independence structures in stochastic processes allowing for correlated error processes. Several graphs may encode the same local independencies and we characterize such equivalence classes of graphs. In the worst case, the number of conditions in our characterizations grows superpolynomially as a function of the size of the node set in the graph. We show that deciding Markov equivalence is coNP-complete which suggests that our characterizations cannot be improved upon substantially. We prove a global Markov property in the case of a multivariate Ornstein-Uhlenbeck process which is driven by correlated Brownian motions.
Submitted 11 September, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Testing Conditional Independence via Quantile Regression Based Partial Copulas
Authors:
Lasse Petersen,
Niels Richard Hansen
Abstract:
The partial copula provides a method for describing the dependence between two random variables $X$ and $Y$ conditional on a third random vector $Z$ in terms of nonparametric residuals $U_1$ and $U_2$. This paper develops a nonparametric test for conditional independence by combining the partial copula with a quantile regression based method for estimating the nonparametric residuals. We consider a test statistic based on generalized correlation between $U_1$ and $U_2$ and derive its large sample properties under consistency assumptions on the quantile regression procedure. We demonstrate through a simulation study that the resulting test is sound under complicated data generating distributions. Moreover, in the examples considered the test is competitive to other state-of-the-art conditional independence tests in terms of level and power, and it has superior power in cases with conditional variance heterogeneity of $X$ and $Y$ given $Z$.
Submitted 29 April, 2021; v1 submitted 29 March, 2020;
originally announced March 2020.
-
Markov equivalence of marginalized local independence graphs
Authors:
Søren Wengel Mogensen,
Niels Richard Hansen
Abstract:
Symmetric independence relations are often studied using graphical representations. Ancestral graphs or acyclic directed mixed graphs with $m$-separation provide classes of symmetric graphical independence models that are closed under marginalization. Asymmetric independence relations appear naturally for multivariate stochastic processes, for instance in terms of local independence. However, no class of graphs representing such asymmetric independence relations, which is also closed under marginalization, has been developed. We develop the theory of directed mixed graphs with $μ$-separation and show that this provides a graphical independence model class which is closed under marginalization and which generalizes previously considered graphical representations of local independence.
For statistical applications, it is pivotal to characterize graphs that induce the same independence relations as such a Markov equivalence class of graphs is the object that is ultimately identifiable from observational data. Our main result is that for directed mixed graphs with $μ$-separation each Markov equivalence class contains a maximal element which can be constructed from the independence relations alone. Moreover, we introduce the directed mixed equivalence graph as the maximal graph with edge markings. This graph encodes all the information about the edges that is identifiable from the independence relations, and furthermore it can be computed efficiently from the maximal graph.
Submitted 11 February, 2019; v1 submitted 27 February, 2018;
originally announced February 2018.
-
Learning Large Scale Ordinary Differential Equation Systems
Authors:
Frederik Vissing Mikkelsen,
Niels Richard Hansen
Abstract:
Learning large scale nonlinear ordinary differential equation (ODE) systems from data is known to be computationally and statistically challenging. We present a framework together with the adaptive integral matching (AIM) algorithm for learning polynomial or rational ODE systems with a sparse network structure. The framework allows for time course data sampled from multiple environments representing e.g. different interventions or perturbations of the system. The algorithm AIM combines an initial penalised integral matching step with an adapted least squares step based on solving the ODE numerically. The R package episode implements AIM together with several other algorithms and is available from CRAN. It is shown that AIM achieves state-of-the-art network recovery for the in silico phosphoprotein abundance data from the eighth DREAM challenge with an AUROC of 0.74, and it is demonstrated via a range of numerical examples that AIM has good statistical properties while being computationally feasible even for large systems.
Submitted 26 October, 2017; v1 submitted 25 October, 2017;
originally announced October 2017.
-
A comment on Stein's unbiased risk estimate for reduced rank estimators
Authors:
Niels Richard Hansen
Abstract:
In the framework of matrix valued observables with low rank means, Stein's unbiased risk estimate (SURE) can be useful for risk estimation and for tuning the amount of shrinkage towards low rank matrices. This was demonstrated by Candès et al. (2013) for singular value soft thresholding, which is a Lipschitz continuous estimator. SURE provides an unbiased risk estimate for an estimator whenever the differentiability requirements for Stein's lemma are satisfied. Lipschitz continuity of the estimator is sufficient, but it is emphasized that differentiability Lebesgue almost everywhere is not. The reduced rank estimator, which gives the best approximation of the observation with a fixed rank, is an example of a discontinuous estimator for which Stein's lemma actually applies. This was observed by Mukherjee et al. (2015), but the proof was incomplete. This brief note gives a sufficient condition for Stein's lemma to hold for estimators with discontinuities, which is then shown to be fulfilled for a class of spectral function estimators including the reduced rank estimator. Singular value hard thresholding does, however, not satisfy the condition, and Stein's lemma does not apply to this estimator.
Submitted 31 August, 2017;
originally announced August 2017.
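For orientation, the standard form of SURE in the Gaussian case can be written as below. The notation is assumed for illustration: $Y \sim \mathcal{N}(\mu, \sigma^2 I_n)$ with known $\sigma^2$, and $\hat{\mu}$ is an estimator for which Stein's lemma holds.

```latex
\mathrm{SURE}(\hat{\mu}) = \lVert Y - \hat{\mu}(Y) \rVert^2 - n\sigma^2 + 2\sigma^2 \, \nabla \cdot \hat{\mu}(Y),
\qquad
\mathbb{E}\bigl[\mathrm{SURE}(\hat{\mu})\bigr] = \mathbb{E}\,\lVert \hat{\mu}(Y) - \mu \rVert^2 .
```

The unbiasedness identity on the right relies on Stein's lemma in the form $\mathbb{E}[(Y - \mu)^{\top} \hat{\mu}(Y)] = \sigma^2 \, \mathbb{E}[\nabla \cdot \hat{\mu}(Y)]$, which is why the regularity conditions on $\hat{\mu}$ discussed in the note matter.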
-
Degrees of Freedom for Piecewise Lipschitz Estimators
Authors:
Frederik Riis Mikkelsen,
Niels Richard Hansen
Abstract:
A representation of the degrees of freedom akin to Stein's lemma is given for a class of estimators of a mean value parameter in $\mathbb{R}^n$. Contrary to previous results, our representation holds for a range of discontinuous estimators. It shows that even though the discontinuities form a Lebesgue null set, they cannot be ignored when computing degrees of freedom. Estimators with discontinuities arise naturally in regression if data driven variable selection is used. Two such examples, namely best subset selection and lasso-OLS, are considered in detail in this paper. For lasso-OLS the general representation leads to an estimate of the degrees of freedom based on the lasso solution path, which in turn can be used for estimating the risk of lasso-OLS. A similar estimate is proposed for best subset selection. The usefulness of the risk estimates for selecting the number of variables is demonstrated via simulations with a particular focus on lasso-OLS.
Submitted 10 February, 2017; v1 submitted 14 January, 2016;
originally announced January 2016.
-
Degrees of freedom for nonlinear least squares estimation
Authors:
Niels Richard Hansen,
Alexander Sokol
Abstract:
We give a general result on the effective degrees of freedom for nonlinear least squares estimation, which relates the degrees of freedom to the divergence of the estimator. We show that in a general framework, the divergence of the least squares estimator is a well defined but potentially negatively biased estimate of the degrees of freedom, and we give an exact representation of the bias. This implies that if we use the divergence as a plug-in estimate of the degrees of freedom in Stein's unbiased risk estimate (SURE), we generally underestimate the true risk. Our result applies, for instance, to model searching problems, yielding a finite sample characterization of how much the search contributes to the degrees of freedom. Motivated by the problem of fitting ODE models in systems biology, the general results are illustrated by the estimation of systems of linear ODEs. In this example the divergence turns out to be a useful estimate of degrees of freedom for $\ell_1$-constrained models.
Submitted 12 December, 2014; v1 submitted 12 February, 2014;
originally announced February 2014.
-
Causal interpretation of stochastic differential equations
Authors:
Alexander Sokol,
Niels Richard Hansen
Abstract:
We give a causal interpretation of stochastic differential equations (SDEs) by defining the postintervention SDE resulting from an intervention in an SDE. We show that under Lipschitz conditions, the solution to the postintervention SDE is equal to a uniform limit in probability of postintervention structural equation models based on the Euler scheme of the original SDE, thus relating our definition to mainstream causal concepts. We prove that when the driving noise in the SDE is a Lévy process, the postintervention distribution is identifiable from the generator of the SDE.
Submitted 27 October, 2014; v1 submitted 31 March, 2013;
originally announced April 2013.
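The Euler-scheme view of an intervention mentioned in the abstract can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the paper's construction: the driving noise is taken to be Brownian with a scalar `noise_scale`, the intervention holds selected coordinates at constant values, and the names `euler_path` and `intervene` are hypothetical.

```python
import random


def euler_path(drift, x0, step, n_steps, noise_scale=1.0, intervene=None, rng=None):
    """Euler scheme for dX = drift(X) dt + noise_scale dB, with an optional intervention.

    intervene: dict mapping coordinate index -> constant value. At every step the
    update of those coordinates is replaced by the constant (assumed intervention type).
    """
    if rng is None:
        rng = random.Random(0)
    intervene = intervene or {}
    x = list(x0)
    for j, value in intervene.items():
        x[j] = value
    path = [tuple(x)]
    for _ in range(n_steps):
        d = drift(x)
        # One Euler step: drift increment plus a Gaussian increment with variance `step`.
        new_x = [
            x[i] + d[i] * step + noise_scale * rng.gauss(0.0, step ** 0.5)
            for i in range(len(x))
        ]
        # Intervened coordinates ignore their own equation and stay fixed.
        for j, value in intervene.items():
            new_x[j] = value
        x = new_x
        path.append(tuple(x))
    return path
```

For example, with `drift = lambda x: [-x[0], x[0] - x[1]]`, calling `euler_path(drift, [1.0, 0.0], 0.5, 2, intervene={0: 2.0})` simulates the second coordinate while the first is held at 2.0; the second coordinate still responds to the first through its own drift term.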
-
Lasso and probabilistic inequalities for multivariate point processes
Authors:
Niels Richard Hansen,
Patricia Reynaud-Bouret,
Vincent Rivoirard
Abstract:
Due to its low computational cost, Lasso is an attractive regularization method for high-dimensional statistical settings. In this paper, we consider multivariate counting processes depending on an unknown function parameter to be estimated by linear combinations of a fixed dictionary. To select coefficients, we propose an adaptive $\ell_1$-penalization methodology, where data-driven weights of the penalty are derived from new Bernstein type inequalities for martingales. Oracle inequalities are established under assumptions on the Gram matrix of the dictionary. Nonasymptotic probabilistic results for multivariate Hawkes processes are proven, which allows us to check these assumptions by considering general dictionaries based on histograms, Fourier or wavelet bases. Motivated by problems of neuronal activity inference, we finally carry out a simulation study for multivariate Hawkes processes and compare our methodology with the adaptive Lasso procedure proposed by Zou in (J. Amer. Statist. Assoc. 101 (2006) 1418-1429). We observe an excellent behavior of our procedure. We rely on theoretical aspects for the essential question of tuning our methodology. Unlike adaptive Lasso of (J. Amer. Statist. Assoc. 101 (2006) 1418-1429), our tuning procedure is proven to be robust with respect to all the parameters of the problem, revealing its potential for concrete purposes, in particular in neuroscience.
Submitted 7 April, 2015; v1 submitted 2 August, 2012;
originally announced August 2012.
-
Exponential martingales and changes of measure for counting processes
Authors:
Alexander Sokol,
Niels Richard Hansen
Abstract:
We give sufficient criteria for the Doléans-Dade exponential of a stochastic integral with respect to a counting process local martingale to be a true martingale. The criteria are adapted particularly to the case of counting processes and are sufficiently weak to be useful and verifiable, as we illustrate by several examples. In particular, the criteria allow for the construction of, for example, nonexplosive Hawkes processes as well as counting processes with stochastic intensities depending on diffusion processes.
Submitted 13 December, 2014; v1 submitted 10 May, 2012;
originally announced May 2012.
-
Penalized maximum likelihood estimation for generalized linear point processes
Authors:
Niels Richard Hansen
Abstract:
A generalized linear point process is specified in terms of an intensity that depends upon a linear predictor process through a fixed non-linear function. We present a framework where the linear predictor is parametrized by a Banach space and give results on Gateaux differentiability of the log-likelihood. Of particular interest is when the intensity is expressed in terms of a linear filter parametrized by a Sobolev space. Using that the Sobolev spaces are reproducing kernel Hilbert spaces we derive results on the representation of the penalized maximum likelihood estimator in a special case and the gradient of the negative log-likelihood in general. The latter is used to develop a descent algorithm in the Sobolev space. We conclude the paper by extensions to multivariate and additive model specifications. The methods are implemented in the R-package ppstat.
Submitted 2 April, 2013; v1 submitted 3 March, 2010;
originally announced March 2010.
-
Local alignment of Markov chains
Authors:
Niels Richard Hansen
Abstract:
We consider local alignments without gaps of two independent Markov chains from a finite alphabet, and we derive sufficient conditions for the number of essentially different local alignments with a score exceeding a high threshold to be asymptotically Poisson distributed. From the Poisson approximation a Gumbel approximation of the maximal local alignment score is obtained. The results extend those obtained by Dembo, Karlin and Zeitouni [Ann. Probab. 22 (1994) 2022--2039] for independent sequences of i.i.d. variables.
Submitted 5 October, 2006;
originally announced October 2006.
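The maximal local alignment score without gaps, whose distribution the abstract approximates, can be computed with a short dynamic program along the diagonals. The sketch below is illustrative only: the function names and the +1/-1 scoring in `match_score` are assumptions, not taken from the paper.

```python
def max_gapless_local_score(x, y, score):
    """Maximal score over all gapless local alignments of sequences x and y.

    t[i][j] = max(0, t[i-1][j-1] + score(x[i-1], y[j-1])) is the best score of a
    gapless alignment ending at the pair (i, j); only one previous row is kept.
    """
    best = 0
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            # Extend the alignment on the same diagonal, or restart at zero.
            curr[j] = max(0, prev[j - 1] + score(a, b))
            best = max(best, curr[j])
        prev = curr
    return best


def match_score(a, b):
    """Hypothetical scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1
```

With this scoring, `max_gapless_local_score("ACGTT", "TACGA", match_score)` picks up the shared segment `ACG`.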
-
The maximum of a random walk reflected at a general barrier
Authors:
Niels Richard Hansen
Abstract:
We define the reflection of a random walk at a general barrier and derive, in case the increments are light tailed and have negative mean, a necessary and sufficient criterion for the global maximum of the reflected process to be finite a.s. If it is finite a.s., we show that the tail of the distribution of the global maximum decays exponentially fast and derive the precise rate of decay. Finally, we discuss an example from structural biology that motivated the interest in the reflection at a general barrier.
Submitted 9 March, 2006;
originally announced March 2006.
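As a point of reference, the classical special case of reflection at the constant barrier zero can be sketched in a few lines. This is not the general-barrier construction of the paper, which the abstract does not spell out; the function name `reflected_walk` is hypothetical.

```python
def reflected_walk(increments):
    """Random walk reflected at zero: w[n] = max(w[n-1] + x[n], 0), with w[0] = 0."""
    path = [0.0]
    for x in increments:
        # Take the ordinary random-walk step, then push back up to the barrier.
        path.append(max(path[-1] + x, 0.0))
    return path


def global_maximum(increments):
    """Maximum of the reflected walk over the given finite horizon."""
    return max(reflected_walk(increments))
```

With increments of negative mean, the unreflected walk drifts downwards and the reflection repeatedly restarts it at the barrier, so the global maximum is driven by rare upward excursions.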