-
GALExtin: An alternative online tool to determine the interstellar extinction in the Milky Way
Authors:
Eduardo B. Amores,
Ricardo M. Jesus,
Andre Moitinho,
Vladan Arsenijevic,
Ronaldo S. Levenhagen,
Douglas J. Marshall,
Leandro O. Kerber,
Roseli Kunzel,
Rodrigo A. Moura
Abstract:
Estimates of interstellar extinction are essential in a broad range of astronomical research. In recent decades, several maps and models of the large-scale interstellar extinction in the Galaxy have been published. However, these maps and models have been developed in different programming languages, with different user interfaces and input/output formats, which makes using them and comparing their results difficult. To address this issue, we have developed a tool called GALExtin (http://www.galextin.org) that estimates interstellar extinction based on the available 3D models/maps and 2D maps. The user only needs to provide a list of coordinates (and distances) and to choose a model/map. GALExtin then provides an output list with extinction estimates. It can be integrated into any other portal or model that requires interstellar extinction estimates. Here, a general overview of GALExtin is presented, along with its capabilities, validation, performance and some results.
Submitted 1 August, 2021;
originally announced August 2021.
-
Linear Aggregation in Tree-based Estimators
Authors:
Sören R. Künzel,
Theo F. Saarinen,
Edward W. Liu,
Jasjeet S. Sekhon
Abstract:
Regression trees and their ensemble methods are popular methods for nonparametric regression: they combine strong predictive performance with interpretable estimators. To improve their utility for locally smooth response surfaces, we study regression trees and random forests with linear aggregation functions. We introduce a new algorithm that finds the best axis-aligned split to fit linear aggregation functions on the corresponding nodes, and we offer a quasilinear time implementation. We demonstrate the algorithm's favorable performance on real-world benchmarks and in an extensive simulation study, and we demonstrate its improved interpretability using a large get-out-the-vote experiment. We provide an open-source software package that implements several tree-based estimators with linear aggregation functions.
Submitted 9 September, 2021; v1 submitted 15 June, 2019;
originally announced June 2019.
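The split search described in this abstract can be illustrated with a brute-force sketch. This is not the authors' quasilinear algorithm or their software package: it simply tries every axis-aligned threshold, fits a linear model on each side, and keeps the split with the smallest total squared error. The function names, the small ridge penalty (added here for numerical stability) and the `min_leaf` parameter are assumptions made for illustration.

```python
import numpy as np

def ridge_sse(X, y, lam=1e-3):
    """Sum of squared residuals of a ridge fit with intercept.

    Assumption: the penalty is applied to all coefficients, including
    the intercept, purely to keep the linear solve well conditioned.
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    resid = y - A @ coef
    return float(resid @ resid)

def best_linear_split(X, y, min_leaf=5, lam=1e-3):
    """Exhaustively search axis-aligned splits; return (sse, feature, threshold)."""
    best = (np.inf, None, None)
    for feat in range(X.shape[1]):
        values = np.unique(X[:, feat])
        # Candidate thresholds are midpoints between consecutive unique values.
        for thr in (values[:-1] + values[1:]) / 2:
            left = X[:, feat] <= thr
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            sse = (ridge_sse(X[left], y[left], lam)
                   + ridge_sse(X[~left], y[~left], lam))
            if sse < best[0]:
                best = (sse, feat, float(thr))
    return best

# Toy data: the response is piecewise linear in feature 0 with a break
# just above x = 0; feature 1 is pure noise.
rng = np.random.default_rng(0)
x0 = np.arange(-20, 21) / 20.0
X_demo = np.column_stack([x0, rng.normal(size=x0.size)])
y_demo = np.where(x0 <= 0, x0, 5.0 - 2.0 * x0)

sse, feat, thr = best_linear_split(X_demo, y_demo)
```

Refitting both linear models for every candidate threshold is far more expensive than the quasilinear implementation the abstract mentions; the sketch only makes the split criterion concrete.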
-
Causaltoolbox---Estimator Stability for Heterogeneous Treatment Effects
Authors:
Sören R. Künzel,
Simon J. S. Walter,
Jasjeet S. Sekhon
Abstract:
Estimating heterogeneous treatment effects has become increasingly important in many fields, and life-and-death decisions are now based on these estimates: for example, selecting a personalized course of medical treatment. Recently, a variety of procedures relying on different assumptions have been suggested for estimating heterogeneous treatment effects. Unfortunately, there are no compelling approaches for identifying the procedure whose assumptions hew closest to the process generating the data set under study, and researchers often select one arbitrarily. This approach risks making inferences that rely on incorrect assumptions and gives the experimenter too much scope for p-hacking. A single estimator will also tend to overlook patterns other estimators could have picked up. We believe that the conclusions of many published papers might change had a different estimator been chosen, and we suggest that practitioners should evaluate many estimators and assess their similarity when investigating heterogeneous treatment effects. We demonstrate this by applying 28 different estimation procedures to an emulated observational data set; this analysis shows that different estimation procedures may give starkly different estimates. We also provide an extensible R package which makes it straightforward for practitioners to follow our recommendations.
Submitted 28 March, 2019; v1 submitted 7 November, 2018;
originally announced November 2018.
-
Transfer Learning for Estimating Causal Effects using Neural Networks
Authors:
Sören R. Künzel,
Bradly C. Stadie,
Nikita Vemuri,
Varsha Ramakrishnan,
Jasjeet S. Sekhon,
Pieter Abbeel
Abstract:
We develop new algorithms for estimating heterogeneous treatment effects, combining recent developments in transfer learning for neural networks with insights from the causal inference literature. By taking advantage of transfer learning, we are able to efficiently use different data sources that are related to the same underlying causal mechanisms. We compare our algorithms with those in the extant literature using extensive simulation studies based on large-scale voter persuasion experiments and the MNIST database. Our methods can perform an order of magnitude better than existing benchmarks while using a fraction of the data.
Submitted 23 August, 2018;
originally announced August 2018.
-
Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning
Authors:
Sören R. Künzel,
Jasjeet S. Sekhon,
Peter J. Bickel,
Bin Yu
Abstract:
There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of meta-algorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the Conditional Average Treatment Effect (CATE) function. Meta-algorithms build on base algorithms, such as Random Forests (RF), Bayesian Additive Regression Trees (BART) or neural networks, to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a new meta-algorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other, and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the meta-learners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our new X-learner can be used to target treatment regimes and to shed light on underlying mechanisms. A software package is provided that implements our methods.
Submitted 23 April, 2019; v1 submitted 12 June, 2017;
originally announced June 2017.
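The abstract does not spell out the X-learner's steps, so the following is a minimal sketch of the procedure as it is usually described: fit a response model per group, impute individual treatment effects by crossing the models over the groups, regress the imputed effects on the covariates, and blend the two resulting CATE estimates. It is not the authors' software package. Ordinary least squares stands in for the RF or BART base learners, and the names `fit_linear`, `x_learner` and the constant weight `g` are illustrative assumptions (in practice the weight is often an estimated propensity score).

```python
import numpy as np

def fit_linear(X, y):
    """Illustrative base learner: least squares with intercept.

    Returns a function that predicts for new covariate rows.
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def x_learner(X, y, w, g=0.5, fit=fit_linear):
    """Return a CATE predictor built with the X-learner recipe.

    X: (n, d) covariates, y: (n,) outcomes, w: (n,) 0/1 treatment flags.
    g: assumed constant weight in [0, 1] for blending the two estimates.
    """
    X0, y0 = X[w == 0], y[w == 0]
    X1, y1 = X[w == 1], y[w == 1]
    # Stage 1: response functions for control and treatment.
    mu0 = fit(X0, y0)
    mu1 = fit(X1, y1)
    # Stage 2: imputed treatment effects, using the other group's model.
    d1 = y1 - mu0(X1)
    d0 = mu1(X0) - y0
    tau1 = fit(X1, d1)
    tau0 = fit(X0, d0)
    # Stage 3: weighted combination of the two CATE estimates.
    return lambda Xn: g * tau0(Xn) + (1 - g) * tau1(Xn)

# Toy, noise-free data with a strongly unbalanced design:
# 200 control units, 10 treated units, true CATE(x) = 3 + x.
rng = np.random.default_rng(0)
n = 210
X = rng.uniform(-1, 1, size=(n, 1))
w = np.zeros(n, dtype=int)
w[:10] = 1
y = 1 + 2 * X[:, 0] + w * (3 + X[:, 0])

cate = x_learner(X, y, w)
estimates = cate(np.array([[0.0], [0.5]]))
```

Because the toy CATE is linear and the data are noise-free, the linear base learner recovers it exactly. With real data the choice of base learner and of the weight `g` matters, and both are left as parameters here.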
-
A Comprehensive Statistical Description of Radio-Through-$γ$-Ray Spectral Energy Distributions of All Known Blazars
Authors:
Peiyuan Mao,
C. Megan Urry,
Francesco Massaro,
Alessandro Paggi,
Joe Cauteruccio,
Soren R. Künzel
Abstract:
We combined multi-wavelength data for blazars from the Roma-BZCAT catalog and analyzed hundreds of X-ray spectra. We present the fluxes and Spectral Energy Distributions (SEDs), in 12 frequency bands from radio to $γ$-rays, for a final sample of 2214 blazars. Using a model-independent statistical approach, we looked for systematic trends in the SEDs; the most significant trends involved the radio luminosities and X-ray spectral indices of the blazars. We used Principal Component Analysis (PCA) to determine the basis vectors of the blazar SEDs and, in order to maximize the size of the sample, imputed missing fluxes using the K-nearest neighbors method. Using more than an order of magnitude more data than was available when Fossati et al. (1997, 1998) first reported trends of SED shape with blazar luminosity, we confirmed the anti-correlation between radio luminosity and synchrotron peak frequency, although with greater scatter than was seen in the smaller sample. The same trend can be seen between bolometric luminosity and synchrotron peak frequency. Finally, we used all available blazar data to determine an empirical SED description that depends only on the radio luminosity at 1.4 GHz and the redshift. We verified that this statistically significant relation was not a result of the luminosity-luminosity correlations that are natural in flux-limited samples (i.e., where the correlation is actually caused by the redshift rather than the luminosity).
Submitted 13 April, 2016;
originally announced April 2016.
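The two-step analysis mentioned in this abstract, imputing missing values with K-nearest neighbors and then extracting basis vectors with PCA, can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' pipeline: the distance (root-mean-square difference over the features two rows both have), the donor rule, the function names and the toy data are all assumptions.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill each NaN with the mean of that feature over the k nearest rows.

    Distance between two rows uses only features observed in both rows.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    observed = ~np.isnan(X)
    for i in np.where(~observed.all(axis=1))[0]:
        dists = np.full(len(X), np.inf)
        for j in range(len(X)):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        order = np.argsort(dists)
        for f in np.where(~observed[i])[0]:
            # Donors are the closest comparable rows that observe feature f.
            donors = [j for j in order
                      if observed[j, f] and np.isfinite(dists[j])][:k]
            if donors:
                filled[i, f] = X[donors, f].mean()
    return filled

def pca(X, n_components=2):
    """Principal axes (rows) and explained-variance fractions via SVD."""
    centered = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return vt[:n_components], explained[:n_components]

# Toy table: 20 "sources" x 3 "bands" lying along one direction,
# with two entries removed to mimic missing fluxes.
t = np.linspace(-1.0, 1.0, 20)
table = np.outer(t, [1.0, 2.0, 3.0])
table[5, 2] = np.nan
table[12, 0] = np.nan

completed = knn_impute(table, k=2)
components, explained = pca(completed, n_components=2)
```

Imputing before the decomposition keeps every row in the sample, which is the motivation the abstract gives; the cost is that imputed entries inherit the structure of their neighbours and can make the leading components look slightly stronger than they are.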
-
Remarks on Kneip's linear smoothers
Authors:
Sören R. Künzel,
David Pollard,
Dana Yang
Abstract:
We were trying to understand the analysis provided by Kneip (1994, Ordered Linear Smoothers). In particular we wanted to persuade ourselves that his results imply the oracle inequality stated by Tsybakov (2014, Lecture 8). This note contains our reworking of Kneip's ideas.
Submitted 7 May, 2014;
originally announced May 2014.
-
Kinematics and chemical abundances of the B star HD 28248
Authors:
Ronaldo S. Levenhagen,
Roseli Künzel,
Nelson V. Leister
Abstract:
We perform a detailed elemental abundance study of the early-type B star HD 28248 and estimate its orbital path in the Galaxy. By comparing spectroscopic observations performed at the European Southern Observatory at La Silla on 2001 October 7 with non-LTE synthetic spectra, using a new wrapper for the simultaneous fitting of several lines of a given atomic species, the abundances of He, C, N, O, Mg, Al, Si, P, S, Ar and Fe were determined for the first time. The radial velocity of HD 28248 has also been estimated from the positions of the centroids of nine neutral helium lines and Mg II 4481 Å, allowing us to calculate its right-handed Galactic space-velocity components U, V and W and to estimate its orbital path in the Galaxy for the first time. Our chemical analysis revealed an outstanding enrichment of several atomic species, particularly [Fe/H] = +0.25 dex and [O/Fe] = +0.32 dex. The kinematic parameters show that its orbit is confined to the Galactic disk with a scale height of 400 pc and that the star has moved about 4 kpc from its birthplace to its current position. The elemental abundances do not follow the predicted [Fe/H] and [O/Fe] gradients currently established for the Galaxy. One hypothetical scenario for the contamination is mass transfer in a binary system during previous evolutionary phases.
Submitted 24 October, 2012; v1 submitted 18 October, 2012;
originally announced October 2012.