-
High-energy Neutrino Source Cross-correlations with Nearest Neighbor Distributions
Authors:
Zhuoyang Zhou,
Jessi Cisewski-Kehe,
Ke Fang,
Arka Banerjee
Abstract:
The astrophysical origins of the majority of the IceCube neutrinos remain unknown. Effectively characterizing the spatial distribution of the neutrino samples and associating the events with astrophysical source catalogs can be challenging given the large atmospheric neutrino background and underlying non-Gaussian spatial features in the neutrino and source samples. In this paper, we investigate a framework for identifying and statistically evaluating the cross-correlations between IceCube data and an astrophysical source catalog based on the $k$-Nearest Neighbor Cumulative Distribution Functions ($k$NN-CDFs). We propose a maximum likelihood estimation procedure for inferring the true proportions of astrophysical neutrinos in the point-source data. We conduct a statistical power analysis of an associated likelihood ratio test with estimations of its sensitivity and discovery potential with synthetic neutrino data samples and a WISE-2MASS galaxy sample. We apply the method to IceCube's public ten-year point-source data and find no statistically significant evidence for spatial cross-correlations with the selected galaxy sample. We discuss possible extensions to the current method and explore the method's potential to identify the cross-correlation signals in data sets with different sample sizes.
Submitted 2 June, 2024;
originally announced June 2024.
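As a rough illustration of the $k$NN-CDF summary named in the abstract above, the sketch below computes, for a set of query points, the distance to the $k$-th nearest point of a second sample and then the fraction of query points whose distance falls within each radius. It is not the authors' code: it assumes flat 2D Euclidean coordinates instead of angular separations on the sky, uses brute-force distances, and the names `knn_distances` and `knn_cdf` are hypothetical.

```python
import math

def knn_distances(queries, data, k):
    """Distance from each query point to its k-th nearest data point (flat 2D, brute force)."""
    out = []
    for qx, qy in queries:
        dists = sorted(math.hypot(qx - x, qy - y) for x, y in data)
        out.append(dists[k - 1])  # k is 1-based: k=1 is the nearest neighbor
    return out

def knn_cdf(queries, data, k, radii):
    """Empirical CDF of the k-th nearest-neighbor distance, evaluated at each radius."""
    dists = knn_distances(queries, data, k)
    n = len(dists)
    return [sum(d <= r for d in dists) / n for r in radii]

# Toy example: three "source" positions and two "event" positions (made-up numbers).
sources = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
events = [(0.0, 1.0), (2.0, 0.0)]
cdf_k2 = knn_cdf(events, sources, k=2, radii=[0.5, 1.5, 2.5])
```

Comparing such a curve for the observed events against curves from uncorrelated mock samples is one way to look for an excess of close event-source pairs; the likelihood procedure in the paper itself is not reproduced here.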
-
Confidence regions for a persistence diagram of a single image with one or more loops
Authors:
Susan Glenn,
Jessi Cisewski-Kehe,
Jun Zhu,
William M. Bement
Abstract:
Topological data analysis (TDA) uses persistent homology to quantify loops and higher-dimensional holes in data, making it particularly relevant for examining the characteristics of images of cells in the field of cell biology. In the context of a cell injury, as time progresses, a wound in the form of a ring emerges in the cell image and then gradually vanishes. Performing statistical inference on this ring-like pattern in a single image is challenging due to the absence of repeated samples. In this paper, we develop a novel framework leveraging TDA to estimate underlying structures within individual images and quantify associated uncertainties through confidence regions. Our proposed method partitions the image into the background and the damaged cell regions. Then pixels within the affected cell region are used to establish confidence regions in the space of persistence diagrams (topological summary statistics). The method establishes estimates on the persistence diagrams which correct the bias of traditional TDA approaches. A simulation study is conducted to evaluate the coverage probabilities of the proposed confidence regions in comparison to an alternative approach proposed in this paper. We also illustrate our methodology with a real-world example from cell repair.
Submitted 2 May, 2024;
originally announced May 2024.
-
The Weighted Euler Characteristic Transform for Image Shape Classification
Authors:
Jessi Cisewski-Kehe,
Brittany Terese Fasy,
Dhanush Giriyan,
Eli Quist
Abstract:
The weighted Euler characteristic transform (WECT) is a new tool for extracting shape information from data equipped with a weight function. Image data may benefit from the WECT where the pixel intensities are used to define the weight function. In this work, an empirical assessment of the WECT's ability to distinguish shapes on images with different pixel intensity distributions is considered, along with visualization techniques to improve the intuition and understanding of what is captured by the WECT. Additionally, the expected weighted Euler characteristic and the expected WECT are derived.
Submitted 25 July, 2023;
originally announced July 2023.
-
Practical Guidance for Bayesian Inference in Astronomy
Authors:
Gwendolyn M. Eadie,
Joshua S. Speagle,
Jessi Cisewski-Kehe,
Daniel Foreman-Mackey,
Daniela Huppenkothen,
David E. Jones,
Aaron Springford,
Hyungsuk Tak
Abstract:
In the last two decades, Bayesian inference has become commonplace in astronomy. At the same time, the choice of algorithms, terminology, notation, and interpretation of Bayesian inference varies from one sub-field of astronomy to the next, which can lead to confusion to both those learning and those familiar with Bayesian statistics. Moreover, the choice varies between the astronomy and statistics literature, too. In this paper, our goal is two-fold: (1) provide a reference that consolidates and clarifies terminology and notation across disciplines, and (2) outline practical guidance for Bayesian inference in astronomy. Highlighting both the astronomy and statistics literature, we cover topics such as notation, specification of the likelihood and prior distributions, inference using the posterior distribution, and posterior predictive checking. It is not our intention to introduce the entire field of Bayesian data analysis -- rather, we present a series of useful practices for astronomers who already have an understanding of the Bayesian "nuts and bolts" and wish to increase their expertise and extend their knowledge. Moreover, as the field of astrostatistics and astroinformatics continues to grow, we hope this paper will serve as both a helpful reference and as a jumping-off point for deeper dives into the statistics and astrostatistics literature.
Submitted 9 February, 2023;
originally announced February 2023.
-
Accounting for stellar activity signals in radial-velocity data by using Change Point Detection techniques
Authors:
U. Simola,
A. Bonfanti,
X. Dumusque,
J. Cisewski-Kehe,
S. Kaski,
J. Corander
Abstract:
Active regions on the photosphere of a star have been the major obstacle for detecting Earth-like exoplanets using the radial velocity (RV) method. A commonly employed solution for addressing stellar activity is to assume a linear relationship between the RV observations and the activity indicators along the entire time series, and then remove the estimated contribution of activity from the variation in RV data (overall correction method). However, since active regions evolve on the photosphere over time, correlations between the RV observations and the activity indicators will correspondingly be anisotropic. We present an approach that recognizes the RV locations where the correlations between the RV and the activity indicators significantly change in order to better account for variations in RV caused by stellar activity. The proposed approach uses a general family of statistical breakpoint methods, often referred to as change point detection (CPD) algorithms, several implementations of which are available in R and Python. A thorough comparison is made between the breakpoint-based approach and the overall correction method. To ensure wide representativity, we use measurements from real stars that have different levels of stellar activity and whose spectra have different signal-to-noise ratios. When the corrections for stellar activity are applied separately to each temporal segment identified by the breakpoint method, the corresponding residuals in the RV time series are typically much smaller than those obtained by the overall correction method. Consequently, the generalized Lomb-Scargle periodogram contains a smaller number of peaks caused by active regions. The CPD algorithm is particularly effective when focusing on active stars with long time series, such as alpha Cen B.
Submitted 31 May, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
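The contrast drawn in the abstract above, between one linear activity correction over the whole time series and separate corrections per temporal segment, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's pipeline: the breakpoints are supplied by hand as indices rather than found by a CPD algorithm, a single activity indicator is used, the data are made up, and the function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(x, y):
    """RV residuals after removing a single linear fit (the overall correction)."""
    slope, intercept = fit_line(x, y)
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]

def segmented_residuals(x, y, breakpoints):
    """RV residuals after fitting a separate line within each temporal segment."""
    edges = [0] + list(breakpoints) + [len(x)]
    out = []
    for start, stop in zip(edges[:-1], edges[1:]):
        out.extend(residuals(x[start:stop], y[start:stop]))
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

# Made-up series in time order: the RV-indicator relation flips sign at index 4.
indicator = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
rv = [0.0, 1.0, 2.0, 3.0, 0.0, -1.0, -2.0, -3.0]

rms_overall = rms(residuals(indicator, rv))
rms_segmented = rms(segmented_residuals(indicator, rv, breakpoints=[4]))
```

In this contrived case the single fit removes nothing because the two regimes cancel, while the per-segment fits remove the activity term entirely; real data would sit somewhere between these extremes.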
-
Differentiating small-scale subhalo distributions in CDM and WDM models using persistent homology
Authors:
Jessi Cisewski-Kehe,
Brittany Terese Fasy,
Wojciech Hellwing,
Mark R. Lovell,
Pawel Drozda,
Mike Wu
Abstract:
The spatial distribution of galaxies at sufficiently small scales will encode information about the identity of the dark matter. We develop a novel description of the halo distribution using persistent homology summaries, in which collections of points are decomposed into clusters, loops and voids. We apply these methods, together with a set of hypothesis tests, to dark matter haloes in MW-analog environment regions of the cold dark matter (CDM) and warm dark matter (WDM) Copernicus Complexio $N$-body cosmological simulations. The results of the hypothesis tests find statistically significant differences (p-values $\leq$ 0.001) between the CDM and WDM structures, and the functional summaries of persistence diagrams detect differences at scales that are distinct from the comparison spatial point process functional summaries considered (including the two-point correlation function). The differences between the models are driven most strongly at filtration scales $\sim100$~kpc, where CDM generates larger numbers of unconnected halo clusters while WDM instead generates loops. This study was conducted on dark matter haloes generally; future work will involve applying the same methods to realistic galaxy catalogues.
Submitted 1 April, 2022;
originally announced April 2022.
-
The EXPRES Stellar Signals Project II. State of the Field in Disentangling Photospheric Velocities
Authors:
Lily L. Zhao,
Debra A. Fischer,
Eric B. Ford,
Alex Wise,
Michaël Cretignier,
Suzanne Aigrain,
Oscar Barragan,
Megan Bedell,
Lars A. Buchhave,
João D. Camacho,
Heather M. Cegla,
Jessi Cisewski-Kehe,
Andrew Collier Cameron,
Zoe L. de Beurs,
Sally Dodson-Robinson,
Xavier Dumusque,
João P. Faria,
Christian Gilbertson,
Charlotte Haley,
Justin Harrell,
David W. Hogg,
Parker Holzer,
Ancy Anna John,
Baptiste Klein,
Marina Lafarga
, et al. (18 additional authors not shown)
Abstract:
Measured spectral shifts due to intrinsic stellar variability (e.g., pulsations, granulation) and activity (e.g., spots, plages) are the largest source of error for extreme precision radial velocity (EPRV) exoplanet detection. Several methods are designed to disentangle stellar signals from true center-of-mass shifts due to planets. The EXPRES Stellar Signals Project (ESSP) presents a self-consistent comparison of 22 different methods tested on the same extreme-precision spectroscopic data from EXPRES. Methods derived new activity indicators, constructed models for mapping an indicator to the needed RV correction, or separated out shape- and shift-driven RV components. Since no ground truth is known when using real data, relative method performance is assessed using the total and nightly scatter of returned RVs and agreement between the results of different methods. Nearly all submitted methods return a lower RV RMS than classic linear decorrelation, but no method is yet consistently reducing the RV RMS to sub-meter-per-second levels. There is a concerning lack of agreement between the RVs returned by different methods. These results suggest that continued progress in this field necessitates increased interpretability of methods, high-cadence data to capture stellar signals at all timescales, and continued tests like the ESSP using consistent data sets with more advanced metrics for method performance. Future comparisons should make use of various well-characterized data sets -- such as solar data or data with known injected planetary and/or stellar signals -- to better understand method performance and whether planetary signals are preserved.
Submitted 25 January, 2022;
originally announced January 2022.
-
Sidestepping the inversion of the weak-lensing covariance matrix with Approximate Bayesian Computation
Authors:
Martin Kilbinger,
Emille E. O. Ishida,
Jessi Cisewski-Kehe
Abstract:
Weak gravitational lensing is one of the few direct methods to map the dark-matter distribution on large scales in the Universe, and to estimate cosmological parameters. We study a Bayesian inference problem where the data covariance $\mathbf{C}$, estimated from a number $n_{\textrm{s}}$ of numerical simulations, is singular. In a cosmological context of large-scale structure observations, the creation of a large number of such $N$-body simulations is often prohibitively expensive. Inference based on a likelihood function often includes a precision matrix, $Ψ= \mathbf{C}^{-1}$. The covariance matrix corresponding to a $p$-dimensional data vector is singular for $p \ge n_{\textrm{s}}$, in which case the precision matrix is unavailable. We propose the likelihood-free inference method Approximate Bayesian Computation (ABC) as a solution that circumvents the inversion of the singular covariance matrix. We present examples of increasing degree of complexity, culminating in a realistic cosmological scenario of the determination of the weak-gravitational lensing power spectrum for the upcoming European Space Agency satellite Euclid. While we found the ABC parameter estimate variances to be mildly larger compared to likelihood-based approaches, which are restricted to settings with $p < n_{\textrm{s}}$, we obtain unbiased parameter estimates with ABC even in extreme cases where $p / n_{\textrm{s}} \gg 1$. The code has been made publicly available to ensure the reproducibility of the results.
Submitted 28 February, 2023; v1 submitted 6 December, 2021;
originally announced December 2021.
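To make concrete why a likelihood-free method needs no precision matrix, here is a minimal sketch of the basic ABC rejection sampler on a toy problem. It is not the paper's code and does not model weak lensing: it assumes a Gaussian with unknown mean and unit variance, a uniform prior on $[-5, 5]$, the sample mean as summary statistic, and a plain absolute-difference distance. The function names, tolerance, and sample sizes are illustrative choices.

```python
import random

def simulate_summary(theta, n, rng):
    """Simulate n draws from N(theta, 1) and return the sample mean as the summary."""
    draws = [rng.gauss(theta, 1.0) for _ in range(n)]
    return sum(draws) / n

def abc_rejection(observed_summary, n, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated summary lands within `tolerance` of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # assumed uniform prior
        simulated = simulate_summary(theta, n, rng)
        # The accept step compares summaries directly: no likelihood is evaluated
        # and no covariance matrix is estimated or inverted.
        if abs(simulated - observed_summary) <= tolerance:
            accepted.append(theta)
    return accepted

posterior_draws = abc_rejection(observed_summary=1.0, n=30, n_draws=2000, tolerance=0.2)
posterior_mean = sum(posterior_draws) / len(posterior_draws)
```

The accepted values approximate draws from the posterior, with the approximation tightening as the tolerance shrinks at the cost of a lower acceptance rate.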
-
A Stellar Activity F-statistic for Exoplanet Surveys (SAFE)
Authors:
Parker H. Holzer,
Jessi Cisewski-Kehe,
Lily Zhao,
Eric B. Ford,
Christian Gilbertson,
Debra A. Fischer
Abstract:
In the search for planets orbiting distant stars the presence of stellar activity in the atmospheres of observed stars can obscure the radial velocity signal used to detect such planets. Furthermore, this stellar activity contamination is set by the star itself and cannot simply be avoided with better instrumentation. Various stellar activity indicators have been developed that may correlate with this contamination. We introduce a new stellar activity indicator called the Stellar Activity F-statistic for Exoplanet surveys (SAFE) that has higher statistical power (i.e., probability of detecting a true stellar activity signal) than many traditional stellar activity indicators in a simulation study of an active region on a Sun-like star with moderate to high signal-to-noise. Also through simulation, the SAFE is demonstrated to be associated with the projected area on the visible side of the star covered by active regions. We also demonstrate that the SAFE detects statistically significant stellar activity in most of the spectra for HD 22049, a star known to have high stellar variability. Additionally, the SAFE is calculated for recent observations of the three low-variability stars HD 34411, HD 10700, and HD 3651, the latter of which is known to have a planetary companion. As expected, the SAFE for these three only occasionally detects activity. Furthermore, initial exploration appears to indicate that the SAFE may be useful for disentangling stellar activity signals from planet-induced Doppler shifts.
Submitted 10 April, 2021;
originally announced April 2021.
-
A Hermite-Gaussian Based Radial Velocity Estimation Method
Authors:
Parker Holzer,
Jessi Cisewski-Kehe,
Debra Fischer,
Lily Zhao
Abstract:
As the first successful technique used to detect exoplanets orbiting distant stars, the Radial Velocity Method aims to detect a periodic Doppler shift in a star's spectrum. We introduce a new, mathematically rigorous, approach to detect such a signal that accounts for functional relationships of neighboring wavelengths, minimizes the role of wavelength interpolation, accounts for heteroskedastic noise, and easily allows for statistical inference. Using Hermite-Gaussian functions, we show that the problem of detecting a Doppler shift in the spectrum can be reduced to linear regression in many settings. A simulation study demonstrates that the proposed method is able to accurately estimate an individual spectrum's radial velocity with precision below 0.3 m/s. Furthermore, the new method outperforms the traditional Cross-Correlation Function approach by reducing the root mean squared error up to 15 cm/s. The proposed method is also demonstrated on a new set of observations from the EXtreme PREcision Spectrometer (EXPRES) for the star 51 Pegasi, and successfully recovers estimates that agree well with previous studies of this planetary system. Data and Python3 code associated with this work can be found at https://github.com/parkerholzer/hgrv_method. The method is also implemented in the open source R package rvmethod.
Submitted 28 May, 2020;
originally announced May 2020.
-
Trend Filtering -- II. Denoising Astronomical Signals with Varying Degrees of Smoothness
Authors:
Collin A. Politsch,
Jessi Cisewski-Kehe,
Rupert A. C. Croft,
Larry Wasserman
Abstract:
Trend filtering---first introduced into the astronomical literature in Paper I of this series---is a state-of-the-art statistical tool for denoising one-dimensional signals that possess varying degrees of smoothness. In this work, we demonstrate the broad utility of trend filtering to observational astronomy by discussing how it can contribute to a variety of spectroscopic and time-domain studies. The observations we discuss are (1) the Lyman-$α$ forest of quasar spectra; (2) more general spectroscopy of quasars, galaxies, and stars; (3) stellar light curves with planetary transits; (4) eclipsing binary light curves; and (5) supernova light curves. We study the Lyman-$α$ forest in the greatest detail---using trend filtering to map the large-scale structure of the intergalactic medium along quasar-observer lines of sight. The remaining studies share broad themes of: (1) estimating observable parameters of light curves and spectra; and (2) constructing observational spectral/light-curve templates. We also briefly discuss the utility of trend filtering as a tool for one-dimensional data reduction and compression.
Submitted 10 January, 2020;
originally announced January 2020.
-
Realizing the potential of astrostatistics and astroinformatics
Authors:
Gwendolyn Eadie,
Thomas J. Loredo,
Ashish A. Mahabal,
Aneta Siemiginowska,
Eric Feigelson,
Eric B. Ford,
S. G. Djorgovski,
Matthew Graham,
Zeljko Ivezic,
Kirk Borne,
Jessi Cisewski-Kehe,
J. E. G. Peek,
Chad Schafer,
Padma A. Yanamandra-Fisher,
C. Alex Young
Abstract:
This Astro2020 State of the Profession Consideration White Paper highlights the growth of astrostatistics and astroinformatics in astronomy, identifies key issues hampering the maturation of these new subfields, and makes recommendations for structural improvements at different levels that, if acted upon, will make significant positive impacts across astronomy.
Submitted 25 September, 2019;
originally announced September 2019.
-
Trend Filtering -- I. A Modern Statistical Tool for Time-Domain Astronomy and Astronomical Spectroscopy
Authors:
Collin A. Politsch,
Jessi Cisewski-Kehe,
Rupert A. C. Croft,
Larry Wasserman
Abstract:
The problem of denoising a one-dimensional signal possessing varying degrees of smoothness is ubiquitous in time-domain astronomy and astronomical spectroscopy. For example, in the time domain, an astronomical object may exhibit a smoothly varying intensity that is occasionally interrupted by abrupt dips or spikes. Likewise, in the spectroscopic setting, a noiseless spectrum typically contains intervals of relative smoothness mixed with localized higher frequency components such as emission peaks and absorption lines. In this work, we present trend filtering, a modern nonparametric statistical tool that yields significant improvements in this broad problem space of denoising $spatially$ $heterogeneous$ signals. When the underlying signal is spatially heterogeneous, trend filtering is superior to any statistical estimator that is a linear combination of the observed data---including kernel smoothers, LOESS, smoothing splines, Gaussian process regression, and many other popular methods. Furthermore, the trend filtering estimate can be computed with practical and scalable efficiency via a specialized convex optimization algorithm, e.g. handling sample sizes of $n\gtrsim10^7$ within a few minutes. In a companion paper, we explicitly demonstrate the broad utility of trend filtering to observational astronomy by carrying out a diverse set of spectroscopic and time-domain analyses.
Submitted 10 January, 2020; v1 submitted 19 August, 2019;
originally announced August 2019.
-
Adaptive Approximate Bayesian Computation Tolerance Selection
Authors:
Umberto Simola,
Jessica Cisewski-Kehe,
Michael U. Gutmann,
Jukka Corander
Abstract:
Approximate Bayesian Computation (ABC) methods are increasingly used for inference in situations in which the likelihood function is either computationally costly or intractable to evaluate. Extensions of the basic ABC rejection algorithm have improved the computational efficiency of the procedure and broadened its applicability. The ABC-Population Monte Carlo (ABC-PMC) approach of Beaumont et al. (2009) has become a popular choice for approximate sampling from the posterior. ABC-PMC is a sequential sampler with an iteratively decreasing value of the tolerance, which specifies how close the simulated data need to be to the real data for acceptance. We propose a method for adaptively selecting a sequence of tolerances that improves the computational efficiency of the algorithm over other common techniques. In addition we define a stopping rule as a by-product of the adaptation procedure, which assists in automating termination of sampling. The proposed automatic ABC-PMC algorithm can be easily implemented and we present several examples demonstrating its benefits in terms of computational efficiency.
Submitted 30 April, 2020; v1 submitted 21 June, 2019;
originally announced July 2019.
-
A Preferential Attachment Model for the Stellar Initial Mass Function
Authors:
Jessi Cisewski-Kehe,
Grant Weller,
Chad Schafer
Abstract:
Accurate specification of a likelihood function is becoming increasingly difficult in many inference problems in astronomy. As sample sizes resulting from astronomical surveys continue to grow, deficiencies in the likelihood function lead to larger biases in key parameter estimates. These deficiencies result from the oversimplification of the physical processes that generated the data, and from the failure to account for observational limitations. Unfortunately, realistic models often do not yield an analytical form for the likelihood. The estimation of a stellar initial mass function (IMF) is an important example. The stellar IMF is the mass distribution of stars initially formed in a given cluster of stars, a population which is not directly observable due to stellar evolution and other disruptions and observational limitations of the cluster. There are several difficulties with specifying a likelihood in this setting since the physical processes and observational challenges result in measurable masses that cannot legitimately be considered independent draws from an IMF. This work improves inference of the IMF by using an approximate Bayesian computation approach that both accounts for observational and astrophysical effects and incorporates a physically-motivated model for star cluster formation. The methodology is illustrated via a simulation study, demonstrating that the proposed approach can recover the true posterior in realistic situations, and applied to observations from astrophysical simulation data.
Submitted 25 April, 2019;
originally announced April 2019.
-
Modeling the Echelle Spectra Continuum with Alpha Shapes and Local Regression Fitting
Authors:
Xin Xu,
Jessi Cisewski-Kehe,
Allen B. Davis,
Debra A. Fischer,
John M. Brewer
Abstract:
Continuum normalization of echelle spectra is an important data analysis step that is difficult to automate. Polynomial fitting requires a reasonably high order model to follow the steep slope of the blaze function. However, in the presence of deep spectral lines, a high order polynomial fit can result in ripples in the normalized continuum that increase errors in spectral analysis. Here, we present two algorithms for flattening the spectrum continuum. The Alpha-shape Fitting to Spectrum algorithm (AFS) is completely data-driven, using an alpha shape to obtain an initial estimate of the blaze function. The Alpha-shape and Lab Source Fitting to Spectrum algorithm (ALSFS) incorporates a continuum constraint from a lab source reference spectrum for the blaze function estimation. These algorithms are tested on a simulated spectrum, where we demonstrate improved normalization compared to polynomial regression for continuum fitting. We show an additional application, using the algorithms for mitigation of spatially correlated quantum efficiency variations and fringing in the CCD detector of the EXtreme PREcision Spectrometer (EXPRES).
Submitted 22 April, 2019;
originally announced April 2019.
-
Astro2020 Science White Paper: The Next Decade of Astroinformatics and Astrostatistics
Authors:
A. Siemiginowska,
G. Eadie,
I. Czekala,
E. Feigelson,
E. B. Ford,
V. Kashyap,
M. Kuhn,
T. Loredo,
M. Ntampaka,
A. Stevens,
A. Avelino,
K. Borne,
T. Budavari,
B. Burkhart,
J. Cisewski-Kehe,
F. Civano,
I. Chilingarian,
D. A. van Dyk,
G. Fabbiano,
D. P. Finkbeiner,
D. Foreman-Mackey,
P. Freeman,
A. Fruscione,
A. A. Goodman,
M. Graham
, et al. (27 additional authors not shown)
Abstract:
Over the past century, major advances in astronomy and astrophysics have been largely driven by improvements in instrumentation and data collection. With the amassing of high quality data from new telescopes, and especially with the advent of deep and large astronomical surveys, it is becoming clear that future advances will also rely heavily on how those data are analyzed and interpreted. New methodologies derived from advances in statistics, computer science, and machine learning are beginning to be employed in sophisticated investigations that are not only bringing forth new discoveries, but are placing them on a solid footing. Progress in wide-field sky surveys, interferometric imaging, precision cosmology, exoplanet detection and characterization, and many subfields of stellar, Galactic and extragalactic astronomy, has resulted in complex data analysis challenges that must be solved to perform scientific inference. Research in astrostatistics and astroinformatics will be necessary to develop the state-of-the-art methodology needed in astronomy. Overcoming these challenges requires dedicated, interdisciplinary research. We recommend: (1) increasing funding for interdisciplinary projects in astrostatistics and astroinformatics; (2) dedicating space and time at conferences for interdisciplinary research and promotion; (3) developing sustainable funding for long-term astrostatistics appointments; and (4) funding infrastructure development for data archives and archive support, state-of-the-art algorithms, and efficient computing.
Submitted 15 March, 2019;
originally announced March 2019.
-
The Role of Machine Learning in the Next Decade of Cosmology
Authors:
Michelle Ntampaka,
Camille Avestruz,
Steven Boada,
Joao Caldeira,
Jessi Cisewski-Kehe,
Rosanne Di Stefano,
Cora Dvorkin,
August E. Evrard,
Arya Farahi,
Doug Finkbeiner,
Shy Genel,
Alyssa Goodman,
Andy Goulding,
Shirley Ho,
Arthur Kosowsky,
Paul La Plante,
Francois Lanusse,
Michelle Lochner,
Rachel Mandelbaum,
Daisuke Nagai,
Jeffrey A. Newman,
Brian Nord,
J. E. G. Peek,
Austin Peel,
Barnabas Poczos
, et al. (5 additional authors not shown)
Abstract:
In recent years, machine learning (ML) methods have remarkably improved how cosmologists can interpret data. The next decade will bring new opportunities for data-driven cosmological discovery, but will also present new challenges for adopting ML methodologies and understanding the results. ML could transform our field, but this transformation will require the astronomy community to both foster and promote interdisciplinary research endeavors.
Submitted 14 January, 2021; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Measuring precise radial velocities and cross-correlation function line-profile variations using a Skew Normal density
Authors:
Umberto Simola,
Xavier Dumusque,
Jessi Cisewski-Kehe
Abstract:
Stellar activity is one of the primary limitations to the detection of low-mass exoplanets using the radial-velocity (RV) technique. We propose to estimate the variations in shape of the CCF by fitting a Skew Normal (SN) density which, unlike the commonly employed Normal density, includes a skewness parameter to capture the asymmetry of the CCF induced by stellar activity and the convective blueshift. The performances of the proposed method are compared to the commonly employed Normal density using both simulations and real observations, with different levels of activity and signal-to-noise ratio. When considering real observations, the correlation between the RV and the asymmetry of the CCF and between the RV and the width of the CCF are stronger when using the parameters estimated with the SN density rather than the ones obtained with the commonly employed Normal density. Using the proposed SN approach, the uncertainties estimated on the RV defined as the median of the SN are on average 10% smaller than the uncertainties calculated on the mean of the Normal. The uncertainties estimated on the asymmetry parameter of the SN are on average 15% smaller than the uncertainties measured on the Bisector Inverse Slope Span (BIS SPAN), which is the commonly used parameter to evaluate the asymmetry of the CCF. We also propose a new model to account for stellar activity when fitting a planetary signal to RV data. Based on simple simulations, we were able to demonstrate that this new model improves the planetary detection limits by 12% compared to the model commonly used to account for stellar activity. The SN density is a better model than the Normal density for characterizing the CCF since the correlations used to probe stellar activity are stronger and the uncertainties of the RV estimate and the asymmetry of the CCF are both smaller.
Submitted 30 November, 2018;
originally announced November 2018.
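The Skew Normal density named in the abstract above can be written down in a few lines. The following is a minimal sketch, assuming the standard location/scale/shape parameterization (here called `xi`, `omega`, and `alpha`, names chosen for illustration); it is not the authors' fitting code, and it only evaluates the density rather than fitting a CCF.

```python
import math


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, xi=0.0, omega=1.0, alpha=0.0):
    """Skew Normal density with location xi, scale omega > 0, shape alpha.

    The parameter names are assumptions for this sketch. With alpha = 0
    the expression reduces to the Normal density with mean xi and
    standard deviation omega.
    """
    z = (x - xi) / omega
    return 2.0 / omega * normal_pdf(z) * normal_cdf(alpha * z)


def trapezoid_integral(f, lo, hi, n=20000):
    """Plain trapezoid rule, used only to sanity-check normalization."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h
```

A nonzero `alpha` tilts the profile to one side, which is the asymmetry the abstract attributes to stellar activity and convective blueshift; `alpha = 0` recovers the symmetric Normal profile used as the comparison model.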
-
Finding cosmic voids and filament loops using topological data analysis
Authors:
Xin Xu,
Jessi Cisewski-Kehe,
Sheridan B. Green,
Daisuke Nagai
Abstract:
(abridged) We present the Significant Cosmic Holes in Universe (SCHU) method for identifying cosmic voids and loops of filaments in cosmological datasets and assigning their statistical significance using techniques from topological data analysis. Persistent homology is used to find different dimensional holes. For dark matter halo catalogs and galaxy surveys, the 0-, 1-, and 2-dimensional holes can be identified with clusters, loops of filaments, and voids. The procedure overlays halos/galaxies on a 3D grid, and a distance-to-measure (DTM) function is calculated at each point of the grid. A filtration is generated over the lower-level sets of the DTM across increasing threshold values. The filtered simplicial complex can be used to summarize the birth/death times of the different dimension homology group generators (i.e., the holes). Persistence diagrams are produced from the dimension and birth/death times of each homology group generator. Using the persistence diagrams and bootstrap sampling, we explain how $p$-values can be assigned to each homology group generator. The homology group generators on a persistence diagram are not, in general, uniquely located back in the original dataset volume so we propose a method for finding a representation of the homology group generators. This method provides a novel, statistically rigorous approach for locating informative generators in cosmological datasets, which may be useful for providing complementary cosmological constraints on the effects of, for example, the sum of the neutrino masses. The method is tested on a Voronoi foam simulation, and then applied to a subset of the SDSS galaxy survey and a cosmological simulation. Lastly, we calculate Betti functions for two of the MassiveNuS simulations and discuss implications for using the persistent homology of the density field to help break degeneracy in the cosmological parameters.
Submitted 16 March, 2019; v1 submitted 20 November, 2018;
originally announced November 2018.
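The distance-to-measure (DTM) function evaluated on the grid in the abstract above can be illustrated with its usual empirical form: the root mean squared distance from a query point to its k nearest data points. This is a minimal sketch under that assumption; the function name `dtm`, the mass parameter `m0`, and the rule k = ceil(m0 * n) are illustrative choices and are not taken from the SCHU implementation.

```python
import math


def dtm(query, points, m0=0.1):
    """Empirical distance-to-measure at `query`.

    points : list of coordinate tuples (e.g. halo or galaxy positions)
    m0     : fraction of the sample treated as nearest neighbors
             (hypothetical name; k = ceil(m0 * n) neighbors are used)
    """
    n = len(points)
    k = max(1, math.ceil(m0 * n))
    # Squared Euclidean distances from the query to every data point.
    squared = sorted(
        sum((q - p) ** 2 for q, p in zip(query, pt)) for pt in points
    )
    # Root of the mean squared distance to the k nearest points.
    return math.sqrt(sum(squared[:k]) / k)
```

Evaluating `dtm` at every node of a 3D grid gives the scalar field whose lower-level sets form the filtration. Because it averages over k neighbors, a single isolated point changes the value less than it would change a plain nearest-neighbor distance.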
-
Incorporating Uncertainties in Atomic Data Into the Analysis of Solar and Stellar Observations: A Case Study in Fe XIII
Authors:
Xixi Yu,
Giulio Del Zanna,
David C. Stenning,
Jessi Cisewski-Kehe,
Vinay L. Kashyap,
Nathan Stein,
David A. van Dyk,
Harry P. Warren,
Mark A. Weber
Abstract:
Information about the physical properties of astrophysical objects cannot be measured directly but is inferred by interpreting spectroscopic observations in the context of atomic physics calculations. Ratios of emission lines, for example, can be used to infer the electron density of the emitting plasma. Similarly, the relative intensities of emission lines formed over a wide range of temperatures yield information on the temperature structure. A critical component of this analysis is understanding how uncertainties in the underlying atomic physics propagate to the uncertainties in the inferred plasma parameters. At present, however, atomic physics databases do not include uncertainties on the atomic parameters and there is no established methodology for using them even if they did. In this paper we develop simple models for the uncertainties in the collision strengths and decay rates for Fe XIII and apply them to the interpretation of density sensitive lines observed with the EUV Imaging Spectrometer (EIS) on Hinode. We incorporate these uncertainties in a Bayesian framework. We consider both a pragmatic Bayesian method where the atomic physics information is unaffected by the observed data, and a fully Bayesian method where the data can be used to probe the physics. The former generally increases the uncertainty in the inferred density by about a factor of 5 compared with models that incorporate only statistical uncertainties. The latter reduces the uncertainties on the inferred densities, but identifies areas of possible systematic problems with either the atomic physics or the observed intensities.
Submitted 17 September, 2018;
originally announced September 2018.
-
Statistical challenges in the search for dark matter
Authors:
Sara Algeri,
Melissa van Beekveld,
Nassim Bozorgnia,
Alyson Brooks,
J. Alberto Casas,
Jessi Cisewski-Kehe,
Francis-Yan Cyr-Racine,
Thomas D. P. Edwards,
Fabio Iocco,
Bradley J. Kavanagh,
Judita Mamužić,
Siddharth Mishra-Sharma,
Wolfgang Rau,
Roberto Ruiz de Austri,
Benjamin R. Safdi,
Pat Scott,
Tracy R. Slatyer,
Yue-Lin Sming Tsai,
Aaron C. Vincent,
Christoph Weniger,
Jennifer Rittenhouse West,
Robert L. Wolpert
Abstract:
The search for the particle nature of dark matter has given rise to a number of experimental, theoretical and statistical challenges. Here, we report on a number of these statistical challenges and new techniques to address them, as discussed in the DMStat workshop held Feb 26 - Mar 3 2018 at the Banff International Research Station for Mathematical Innovation and Discovery (BIRS) in Banff, Alberta.
Submitted 24 July, 2018;
originally announced July 2018.
-
Functional Summaries of Persistence Diagrams
Authors:
Eric Berry,
Yen-Chi Chen,
Jessi Cisewski-Kehe,
Brittany Terese Fasy
Abstract:
One of the primary areas of interest in applied algebraic topology is persistent homology, and, more specifically, the persistence diagram. Persistence diagrams have also become objects of interest in topological data analysis. However, persistence diagrams do not naturally lend themselves to statistical goals, such as inferring certain population characteristics, because their complicated structure makes common algebraic operations--such as addition, division, and multiplication--challenging (e.g., the mean might not be unique). To bypass these issues, several functional summaries of persistence diagrams have been proposed in the literature (e.g., landscape and silhouette functions). The problem of analyzing a set of persistence diagrams then becomes the problem of analyzing a set of functions, which is a topic that has been studied for decades in statistics. First, we review the various functional summaries in the literature and propose a unified framework for the functional summaries. Then, we generalize the definition of persistence landscape functions, establish several theoretical properties of the persistence functional summaries, and demonstrate and discuss their performance in the context of classification using simulated prostate cancer histology data, and two-sample hypothesis tests comparing human and monkey fibrin images, after developing a simulation study using a new data generator we call the Pickup Sticks Simulator (STIX).
Submitted 4 April, 2018;
originally announced April 2018.
-
Approximate Bayesian Computation for Finite Mixture Models
Authors:
Umberto Simola,
Jessi Cisewski-Kehe,
Robert L. Wolpert
Abstract:
Finite mixture models are used in statistics and other disciplines, but inference for mixture models is challenging due, in part, to the multimodality of the likelihood function and the so-called label switching problem. We propose extensions of the Approximate Bayesian Computation-Population Monte Carlo (ABC-PMC) algorithm as an alternative framework for inference on finite mixture models. There are several decisions to make when implementing an ABC-PMC algorithm for finite mixture models, including the selection of the kernels used for moving the particles through the iterations, how to address the label switching problem, and the choice of informative summary statistics. Examples are presented to demonstrate the performance of the proposed ABC-PMC algorithm for mixture modeling. The performance of the proposed method is evaluated in a simulation study and for the popular recessional velocity galaxy data.
Submitted 2 November, 2020; v1 submitted 27 March, 2018;
originally announced March 2018.
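The basic idea behind Approximate Bayesian Computation, on which the ABC-PMC algorithm in the abstract above builds, can be shown with plain rejection sampling. The sketch below is deliberately simpler than the paper's method: it is not ABC-PMC and does not model a mixture. It infers a single Normal mean with known unit variance, and the uniform prior on [-5, 5], the sample-mean summary statistic, the tolerance `eps`, and all function names are assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, n_draws=5000, eps=0.2, seed=0):
    """Plain ABC rejection sampler for the mean of a unit-variance Normal.

    A draw theta from the prior is kept when data simulated under theta
    have a summary statistic (the sample mean) within eps of the
    observed summary. All settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    obs_summary = statistics.fmean(observed)
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the assumed prior
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]
        if abs(statistics.fmean(simulated) - obs_summary) <= eps:
            accepted.append(theta)
    return accepted


def make_observed(n=50, true_mean=2.0, seed=1):
    """Synthetic 'observed' data, used only to exercise the sampler."""
    rng = random.Random(seed)
    return [rng.gauss(true_mean, 1.0) for _ in range(n)]
```

The accepted values approximate the posterior for the mean. ABC-PMC replaces the single fixed tolerance with a decreasing sequence and proposes new particles from the previously accepted ones, which is where the kernel and summary-statistic choices mentioned in the abstract come in.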