Search | arXiv e-print repository

Learning from landmarks, curves, surfaces, and shapes in Geomstats

Authors: Luís F. Pereira, Alice Le Brigant, Adele Myers, Emmanuel Hartman, Amil Khan, Malik Tuerkoen, Trey Dold, Mengyang Gu, Pablo Suárez-Serrato, Nina Miolane

Abstract: We introduce the shape module of the Python package Geomstats to analyze shapes of objects represented as landmarks, curves and surfaces across fields of natural sciences and engineering. The shape module first implements widely used shape spaces, such as the Kendall shape space, as well as elastic spaces of discrete curves and surfaces. The shape module further implements the abstract mathematical structures of group actions, fiber bundles, quotient spaces and associated Riemannian metrics which allow users to build their own shape spaces. The Riemannian geometry tools enable users to compare, average, interpolate between shapes inside a given shape space. These essential operations can then be leveraged to perform statistics and machine learning on shape data. We present the object-oriented implementation of the shape module along with illustrative examples and show how it can be used to perform statistics and machine learning on shape spaces.

Submitted 14 June, 2024; originally announced June 2024.

Report number: MPIM-Bonn-2024

arXiv:2312.05199 [pdf, other]

Low-Temperature Multi-Mode Microwave Spectroscopy of Paramagnetic and Rare-Earth Ion Spin Impurities in Single Crystal Calcium Tungstate

Authors: Elrina Hartman, Michael E Tobar, Ben T McAllister, Jeremy Bourhill, Maxim Goryachev

Abstract: We present experimental observations of dilute ion spin ensembles in an undoped low-loss single crystal cylindrical sample of CaWO$_4$ cooled to 30 mK in temperature. Crystal field perturbations were elucidated by constructing a dielectrically loaded microwave cavity resonator from the crystal. The resonator exhibited numerous whispering gallery modes with high Q-factors of up to $3\times 10^7$, equivalent to a low loss tangent of $\sim 3\times 10^{-8}$. The low loss allowed precision multi-mode spectroscopy of numerous high Q-factor photon-spin interactions. Measurements between 7 and 22 GHz revealed the presence of Gd$^{3+}$, Fe$^{3+}$, and another trace species, inferred to be rare-earth, at concentrations on the order of parts per billion. These findings motivate further exploration of prospective uses of this low-loss dielectric material for applications regarding precision and quantum metrology, as well as tests for beyond standard model physics.

Submitted 17 April, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.04382 [pdf, other]

Basis restricted elastic shape analysis on the space of unregistered surfaces

Authors: Emmanuel Hartman, Emery Pierson, Martin Bauer, Mohamed Daoudi, Nicolas Charon

Abstract: This paper introduces a new mathematical and numerical framework for surface analysis derived from the general setting of elastic Riemannian metrics on shape spaces. Traditionally, those metrics are defined over the infinite dimensional manifold of immersed surfaces and satisfy specific invariance properties enabling the comparison of surfaces modulo shape preserving transformations such as reparametrizations. The specificity of the approach we develop is to restrict the space of allowable transformations to predefined finite dimensional bases of deformation fields. These are estimated in a data-driven way so as to emulate specific types of surface transformations observed in a training set. The use of such bases allows us to simplify the representation of the corresponding shape space to a finite dimensional latent space. However, in sharp contrast with methods involving e.g. mesh autoencoders, the latent space is here equipped with a non-Euclidean Riemannian metric precisely inherited from the family of aforementioned elastic metrics. We demonstrate how this basis restricted model can then be effectively implemented to perform a variety of tasks on surface meshes which, importantly, does not assume these to be pre-registered (i.e. with given point correspondences) or to even have a consistent mesh structure. We specifically validate our approach on human body shape and pose data as well as human face scans, and show how it generally outperforms state-of-the-art methods on problems such as shape registration, interpolation, motion transfer or random pose generation.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 18 pages, 10 figures, 8 tables

MSC Class: I.4.0; I.5.1; I.4.9

arXiv:2307.03553 [pdf, other]

VariGrad: A Novel Feature Vector Architecture for Geometric Deep Learning on Unregistered Data

Authors: Emmanuel Hartman, Emery Pierson

Abstract: We present a novel geometric deep learning layer that leverages the varifold gradient (VariGrad) to compute feature vector representations of 3D geometric data. These feature vectors can be used in a variety of downstream learning tasks such as classification, registration, and shape reconstruction. Our model's use of parameterization independent varifold representations of geometric data allows our model to be both trained and tested on data independent of the given sampling or parameterization. We demonstrate the efficiency, generalizability, and robustness to resampling of the proposed VariGrad layer.

Submitted 21 August, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: 6 pages, 5 figures, 3 tables

MSC Class: I.4.0; I.5.1; I.4.5

arXiv:2303.12506 [pdf, ps, other]

Leximin Approximation: From Single-Objective to Multi-Objective

Authors: Eden Hartman, Avinatan Hassidim, Yonatan Aumann, Erel Segal-Halevi

Abstract: Leximin is a common approach to multi-objective optimization, frequently employed in fair division applications. In leximin optimization, one first aims to maximize the smallest objective value; subject to this, one maximizes the second-smallest objective; and so on. Often, even the single-objective problem of maximizing the smallest value cannot be solved accurately. What can we hope to accomplish for leximin optimization in this situation? Recently, Henzinger et al. (2022) defined a notion of approximate leximin optimality. Their definition, however, considers only an additive approximation. In this work, we first define the notion of approximate leximin optimality, allowing both multiplicative and additive errors. We then show how to compute, in polynomial time, such an approximate leximin solution, using an oracle that finds an approximation to a single-objective problem. The approximation factors of the algorithms are closely related: an $(α,ε)$-approximation for the single-objective problem (where $α\in (0,1]$ and $ε\geq 0$ are the multiplicative and additive factors respectively) translates into an $\left(\frac{α^2}{1-α+α^2}, \frac{ε}{1-α+α^2}\right)$-approximation for the multi-objective leximin problem, regardless of the number of objectives.

Submitted 28 September, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.07116 [pdf, ps, other]

Low Frequency (100-600 MHz) Searches with Axion Cavity Haloscopes

Authors: S. Chakrabarty, J. R. Gleason, Y. Han, A. T. Hipp, M. Solano, P. Sikivie, N. S. Sullivan, D. B. Tanner, M. Goryachev, E. Hartman, B. T. McAllister, A. Quiskamp, C. Thomson, M. E. Tobar, M. H. Awida, A. S. Chou, M. Hollister, S. Knirck, A. Sonnenschein, W. Wester, T. Braine, M. Guzzetti, C. Hanretty, G. Leum, L. J Rosenberg, et al. (22 additional authors not shown)

Abstract: We investigate reentrant and dielectric loaded cavities for the purpose of extending the range of axion cavity haloscopes to lower masses, below the range where the Axion Dark Matter eXperiment (ADMX) has already searched. Reentrant and dielectric loaded cavities were simulated numerically to calculate and optimize their form factors and quality factors. A prototype reentrant cavity was built and its measured properties were compared with the simulations. We estimate the sensitivity of axion dark matter searches using reentrant and dielectric loaded cavities inserted in the existing ADMX magnet at the University of Washington and a large magnet being installed at Fermilab.

Submitted 28 March, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

Comments: 33 pages, 24 figures

arXiv:2303.06282 [pdf, other]

doi 10.1103/PhysRevLett.131.101002

Search for a dark-matter induced Cosmic Axion Background with ADMX

Authors: ADMX Collaboration, T. Nitta, T. Braine, N. Du, M. Guzzetti, C. Hanretty, G. Leum, L. J Rosenberg, G. Rybka, J. Sinnis, John Clarke, I. Siddiqi, M. H. Awida, A. S. Chou, M. Hollister, S. Knirck, A. Sonnenschein, W. Wester, J. R. Gleason, A. T. Hipp, P. Sikivie, N. S. Sullivan, D. B. Tanner, R. Khatiwada, G. Carosi, et al. (23 additional authors not shown)

Abstract: We report the first result of a direct search for a Cosmic ${\it axion}$ Background (C$a$B) - a relativistic background of axions that is not dark matter - performed with the axion haloscope, the Axion Dark Matter eXperiment (ADMX). Conventional haloscope analyses search for a signal with a narrow bandwidth, as predicted for dark matter, whereas the C$a$B will be broad. We introduce a novel analysis strategy, which searches for a C$a$B induced daily modulation in the power measured by the haloscope. Using this, we repurpose data collected to search for dark matter to set a limit on the axion photon coupling of a C$a$B originating from dark matter cascade decay via a mediator in the 800-995 MHz frequency range. We find that the present sensitivity is limited by fluctuations in the cavity readout as the instrument scans across dark matter masses. Nevertheless, we suggest that these challenges can be surmounted using superconducting qubits as single photon counters, and allow ADMX to operate as a telescope searching for axions emerging from the decay of dark matter. The daily modulation analysis technique we introduce can be deployed for various broadband RF signals, such as other forms of a C$a$B or even high-frequency gravitational waves.

Submitted 3 October, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: 9 pages, 4 figures

Journal ref: Phys. Rev. Lett., 131, 101002 (2023)

arXiv:2301.00284 [pdf, ps, other]

Square Root Normal Fields for Lipschitz surfaces and the Wasserstein Fisher Rao metric

Authors: Emmanuel Hartman, Martin Bauer, Eric Klassen

Abstract: The Square Root Normal Field (SRNF) framework is a method in the area of shape analysis that defines a (pseudo) distance between unparametrized surfaces. For piecewise linear (PL) surfaces it was recently proved that the SRNF distance between unparametrized surfaces is equivalent to the Wasserstein Fisher Rao (WFR) metric on the space of finitely supported measures on $S^2$. In the present article we extend this point of view to a much larger set of surfaces; we show that the SRNF distance on the space of Lipschitz surfaces is equivalent to the WFR distance between Borel measures on $S^2$. For the space of spherical surfaces this result directly allows us to characterize the non-injectivity and the (closure of the) image of the SRNF transform. In the last part of the paper we further generalize this result by showing that the WFR metric for general measure spaces can be interpreted as an optimization problem over the diffeomorphism group of an independent background space.

Submitted 11 October, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: 19 pages

arXiv:2211.13185 [pdf, other]

BaRe-ESA: A Riemannian Framework for Unregistered Human Body Shapes

Authors: Emmanuel Hartman, Emery Pierson, Martin Bauer, Nicolas Charon, Mohamed Daoudi

Abstract: We present Basis Restricted Elastic Shape Analysis (BaRe-ESA), a novel Riemannian framework for human body scan representation, interpolation and extrapolation. BaRe-ESA operates directly on unregistered meshes, i.e., without the need to establish prior point to point correspondences or to assume a consistent mesh structure. Our method relies on a latent space representation, which is equipped with a Riemannian (non-Euclidean) metric associated to an invariant higher-order metric on the space of surfaces. Experimental results on the FAUST and DFAUST datasets show that BaRe-ESA brings significant improvements with respect to previous solutions in terms of shape registration, interpolation and extrapolation. The efficiency and strength of our model are further demonstrated in applications such as motion transfer and random generation of body shape and pose.

Submitted 21 August, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: 13 pages, 7 figures, 3 tables

MSC Class: I.4.0; I.5.1; I.4.9

arXiv:2206.07119 [pdf, other]

Sensitivity Analysis for Survey Weights

Authors: Erin Hartman, Melody Huang

Abstract: Survey weighting allows researchers to account for bias in survey samples, due to unit nonresponse or convenience sampling, using measured demographic covariates. Unfortunately, in practice, it is impossible to know whether the estimated survey weights are sufficient to alleviate concerns about bias due to unobserved confounders or incorrect functional forms used in weighting. In the following paper, we propose two sensitivity analyses for the exclusion of important covariates: (1) a sensitivity analysis for partially observed confounders (i.e., variables measured across the survey sample, but not the target population), and (2) a sensitivity analysis for fully unobserved confounders (i.e., variables not measured in either the survey or the target population). We provide graphical and numerical summaries of the potential bias that arises from such confounders, and introduce a benchmarking approach that allows researchers to quantitatively reason about the sensitivity of their results. We demonstrate our proposed sensitivity analyses using state-level 2020 U.S. Presidential Election polls.

Submitted 6 March, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

arXiv:2204.04238 [pdf, other]

Elastic shape analysis of surfaces with second-order Sobolev metrics: a comprehensive numerical framework

Authors: Emmanuel Hartman, Yashil Sukurdeep, Eric Klassen, Nicolas Charon, Martin Bauer

Abstract: This paper introduces a set of numerical methods for Riemannian shape analysis of 3D surfaces within the setting of invariant (elastic) second-order Sobolev metrics. More specifically, we address the computation of geodesics and geodesic distances between parametrized or unparametrized immersed surfaces represented as 3D meshes. Building on this, we develop tools for the statistical shape analysis of sets of surfaces, including methods for estimating Karcher means and performing tangent PCA on shape populations, and for computing parallel transport along paths of surfaces. Our proposed approach fundamentally relies on a relaxed variational formulation for the geodesic matching problem via the use of varifold fidelity terms, which enable us to enforce reparametrization independence when computing geodesics between unparametrized surfaces, while also yielding versatile algorithms that allow us to compare surfaces with varying sampling or mesh structures. Importantly, we demonstrate how our relaxed variational framework can be extended to tackle partially observed data. The different benefits of our numerical pipeline are illustrated over various examples, synthetic and real.

Submitted 5 December, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

Comments: 28 pages, 16 figures, 2 tables

MSC Class: 68U05; 49Q10; 58D10

arXiv:2111.01357 [pdf, other]

Leveraging Population Outcomes to Improve the Generalization of Experimental Results

Authors: Melody Huang, Naoki Egami, Erin Hartman, Luke Miratrix

Abstract: Generalizing causal estimates in randomized experiments to a broader target population is essential for guiding decisions by policymakers and practitioners in the social and biomedical sciences. While recent papers developed various weighting estimators for the population average treatment effect (PATE), many of these methods result in large variance because the experimental sample often differs substantially from the target population, and estimated sampling weights are extreme. To improve efficiency in practice, we propose post-residualized weighting in which we use the outcome measured in the observational population data to build a flexible predictive model (e.g., machine learning methods) and residualize the outcome in the experimental data before using conventional weighting methods. We show that the proposed PATE estimator is consistent under the same assumptions required for existing weighting methods, importantly without assuming the correct specification of the predictive model. We demonstrate the efficiency gains from this approach through simulations and our application based on a set of job training experiments.

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2107.08075 [pdf, other]

Kpop: A kernel balancing approach for reducing specification assumptions in survey weighting

Authors: Erin Hartman, Chad Hazlett, Ciara Sterbenz

Abstract: With the precipitous decline in response rates, researchers and pollsters have been left with highly non-representative samples, relying on constructed weights to make these samples representative of the desired target population. Though practitioners employ valuable expert knowledge to choose which variables $X$ must be adjusted for, they rarely defend particular functional forms relating these variables to the response process or the outcome. Unfortunately, commonly-used calibration weights -- which make the weighted mean of $X$ in the sample equal that of the population -- only ensure correct adjustment when the portions of the outcome and the response process left unexplained by linear functions of $X$ are independent. To alleviate this functional form dependency, we describe kernel balancing for population weighting (kpop). This approach replaces the design matrix $\mathbf{X}$ with a kernel matrix $\mathbf{K}$ encoding high-order information about $\mathbf{X}$. Weights are then found to make the weighted average row of $\mathbf{K}$ among sampled units approximately equal that of the target population. This produces good calibration on a wide range of smooth functions of $X$, without relying on the user to decide which $X$ or what functions of them to include. We describe the method and illustrate it by application to polling data from the 2016 U.S. presidential election.

Submitted 2 March, 2024; v1 submitted 16 July, 2021; originally announced July 2021.

arXiv:2105.06510 [pdf, other]

The Square Root Normal Field Distance and Unbalanced Optimal Transport

Authors: Martin Bauer, Emmanuel Hartman, Eric Klassen

Abstract: This paper explores a novel connection between two areas: shape analysis of surfaces and unbalanced optimal transport. Specifically, we characterize the square root normal field (SRNF) shape distance as the pullback of the Wasserstein-Fisher-Rao (WFR) unbalanced optimal transport distance. In addition, we propose a new algorithm for computing the WFR distance and present numerical results that highlight the effectiveness of this algorithm. As a consequence of our results we obtain a precise method for computing the SRNF shape distance directly on piecewise linear surfaces and gain new insights about the degeneracy of this distance.

Submitted 20 February, 2022; v1 submitted 13 May, 2021; originally announced May 2021.

Comments: 36 pages, 6 figures, 1 table

arXiv:2102.09052 [pdf, other]

Multilevel calibration weighting for survey data

Authors: Eli Ben-Michael, Avi Feller, Erin Hartman

Abstract: In the November 2016 U.S. presidential election, many state level public opinion polls, particularly in the Upper Midwest, incorrectly predicted the winning candidate. One leading explanation for this polling miss is that the precipitous decline in traditional polling response rates led to greater reliance on statistical methods to adjust for the corresponding bias -- and that these methods failed to adjust for important interactions between key variables like education, race, and geographic region. Finding calibration weights that account for important interactions remains challenging with traditional survey methods: raking typically balances the margins alone, while post-stratification, which exactly balances all interactions, is only feasible for a small number of variables. In this paper, we propose multilevel calibration weighting, which enforces tight balance constraints for marginal balance and looser constraints for higher-order interactions. This incorporates some of the benefits of post-stratification while retaining the guarantees of raking. We then correct for the bias due to the relaxed constraints via a flexible outcome model; we call this approach Double Regression with Post-stratification (DRP). We characterize the asymptotic properties of these estimators and show that the proposed calibration approach has a dual representation as a multilevel model for survey response. We then use these tools to re-assess a large-scale survey of voter intention in the 2016 U.S. presidential election, finding meaningful gains from the proposed methods. The approach is available in the multical R package.

Submitted 12 November, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

arXiv:2101.04929 [pdf, other]

Supervised deep learning of elastic SRV distances on the shape space of curves

Authors: Emmanuel Hartman, Yashil Sukurdeep, Nicolas Charon, Eric Klassen, Martin Bauer

Abstract: Motivated by applications from computer vision to bioinformatics, the field of shape analysis deals with problems where one wants to analyze geometric objects, such as curves, while ignoring actions that preserve their shape, such as translations, rotations, or reparametrizations. Mathematical tools have been developed to define notions of distances, averages, and optimal deformations for geometric objects. One such framework, which has proven to be successful in many applications, is based on the square root velocity (SRV) transform, which allows one to define a computable distance between spatial curves regardless of how they are parametrized. This paper introduces a supervised deep learning framework for the direct computation of SRV distances between curves, which usually requires an optimization over the group of reparametrizations that act on the curves. The benefits of our approach in terms of computational speed and accuracy are illustrated via several numerical experiments.

Submitted 18 April, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: 8 pages, 7 figures, 3 tables. Accepted to DiffCVML

MSC Class: 68T10; ACM Class: I.5.1

arXiv:1909.02669 [pdf, other]

Covariate Selection for Generalizing Experimental Results: Application to Large-Scale Development Program in Uganda

Authors: Naoki Egami, Erin Hartman

Abstract: Generalizing estimates of causal effects from an experiment to a target population is of interest to scientists. However, researchers are usually constrained by available covariate information. Analysts can often collect far fewer variables from population samples than from experimental samples, which has limited the applicability of existing approaches that assume rich covariate data from both experimental and population samples. In this article, we examine how to select covariates necessary for generalizing experimental results under such data constraints. In our concrete context of a large-scale development program in Uganda, although more than 40 pre-treatment covariates are available in the experiment, only 8 of them were also measured in a target population. We propose a method to estimate a separating set -- a set of variables affecting both the sampling mechanism and treatment effect heterogeneity -- and show that the population average treatment effect (PATE) can be identified by adjusting for estimated separating sets. Our algorithm only requires a rich set of covariates in the experimental data, not in the target population, by incorporating researcher-specific constraints on what variables are measured in the population data. Analyzing the development experiment in Uganda, we show that the proposed algorithm can allow for the PATE estimation in situations where conventional methods fail due to data requirements.

Submitted 15 January, 2021; v1 submitted 5 September, 2019; originally announced September 2019.

Showing 1–17 of 17 results for author: Hartman, E