Showing 1–36 of 36 results for author: Lee, A B

  1. arXiv:2402.05330  [pdf, other]

    stat.ML cs.LG

    Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference

    Authors: Luca Masserano, Alex Shen, Michele Doro, Tommaso Dorigo, Rafael Izbicki, Ann B. Lee

    Abstract: An open scientific challenge is how to classify events with reliable measures of uncertainty, when we have a mechanistic model of the data-generating process but the distribution over both labels and latent nuisance parameters is different between train and target data. We refer to this type of distributional shift as generalized label shift (GLS). Direct classification using observed data…

    Submitted 1 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 26 pages, 19 figures, code available at https://github.com/lee-group-cmu/lf2i

  2. Structural Forecasting for Short-term Tropical Cyclone Intensity Guidance

    Authors: Trey McNeely, Pavel Khokhlov, Niccolo Dalmasso, Kimberly M. Wood, Ann B. Lee

    Abstract: Because geostationary satellite (Geo) imagery provides a high temporal resolution window into tropical cyclone (TC) behavior, we investigate the viability of its application to short-term probabilistic forecasts of TC convective structure to subsequently predict TC intensity. Here, we present a prototype model which is trained solely on two inputs: Geo infrared imagery leading up to the synoptic t…

    Submitted 8 April, 2023; v1 submitted 31 May, 2022; originally announced June 2022.

  3. arXiv:2205.15680  [pdf, other]

    stat.ML cs.LG

    Simulator-Based Inference with Waldo: Confidence Regions by Leveraging Prediction Algorithms and Posterior Estimators for Inverse Problems

    Authors: Luca Masserano, Tommaso Dorigo, Rafael Izbicki, Mikael Kuusela, Ann B. Lee

    Abstract: Prediction algorithms, such as deep neural networks (DNNs), are used in many domain sciences to directly estimate internal parameters of interest in simulator-based models, especially in settings where the observations include images or complex high-dimensional data. In parallel, modern neural density estimators, such as normalizing flows, are becoming increasingly popular for uncertainty quantifi…

    Submitted 13 November, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: 15 pages, 10 figures, code available at https://github.com/lee-group-cmu/lf2i

  4. arXiv:2205.14568  [pdf, other]

    stat.ML astro-ph.IM cs.LG stat.ME

    Conditionally Calibrated Predictive Distributions by Probability-Probability Map: Application to Galaxy Redshift Estimation and Probabilistic Forecasting

    Authors: Biprateep Dey, David Zhao, Jeffrey A. Newman, Brett H. Andrews, Rafael Izbicki, Ann B. Lee

    Abstract: Uncertainty quantification is crucial for assessing the predictive ability of AI algorithms. Much research has been devoted to describing the predictive distribution (PD) $F(y|\mathbf{x})$ of a target variable $y \in \mathbb{R}$ given complex input features $\mathbf{x} \in \mathcal{X}$. However, off-the-shelf PDs (from, e.g., normalizing flows and Bayesian neural networks) often lack conditional c…

    Submitted 17 July, 2023; v1 submitted 28 May, 2022; originally announced May 2022.

    Comments: 21 pages, 11 figures. Under review. Code available as a Python package https://github.com/lee-group-cmu/Cal-PIT

  5. arXiv:2202.02253  [pdf, other]

    stat.AP stat.ME stat.ML

    Detecting Distributional Differences in Labeled Sequence Data with Application to Tropical Cyclone Satellite Imagery

    Authors: Trey McNeely, Galen Vincent, Kimberly M. Wood, Rafael Izbicki, Ann B. Lee

    Abstract: Our goal is to quantify whether, and if so how, spatio-temporal patterns in tropical cyclone (TC) satellite imagery signal an upcoming rapid intensity change event. To address this question, we propose a new nonparametric test of association between a time series of images and a series of binary event labels. We ask whether there is a difference in distribution between (dependent but identically d…

    Submitted 27 June, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: 27 pages, 11 figures

  6. arXiv:2110.15209  [pdf, other]

    astro-ph.IM cs.LG stat.ME stat.ML

    Re-calibrating Photometric Redshift Probability Distributions Using Feature-space Regression

    Authors: Biprateep Dey, Jeffrey A. Newman, Brett H. Andrews, Rafael Izbicki, Ann B. Lee, David Zhao, Markus Michael Rau, Alex I. Malz

    Abstract: Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF)…

    Submitted 27 January, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)

  7. arXiv:2109.12029  [pdf, other]

    stat.ML cs.LG stat.AP

    Identifying Distributional Differences in Convective Evolution Prior to Rapid Intensification in Tropical Cyclones

    Authors: Trey McNeely, Galen Vincent, Rafael Izbicki, Kimberly M. Wood, Ann B. Lee

    Abstract: Tropical cyclone (TC) intensity forecasts are issued by human forecasters who evaluate spatio-temporal observations (e.g., satellite imagery) and model output (e.g., numerical weather prediction, statistical models) to produce forecasts every 6 hours. Within these time constraints, it can be challenging to draw insight from such data. While high-capacity machine learning methods are well suited fo…

    Submitted 30 November, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: 7 pages, 4 figures, Tackling Climate Change with Machine Learning: workshop at NeurIPS 2021

  8. arXiv:2107.03920  [pdf, other]

    stat.ML cs.LG

    Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning for Reliable Simulator-Based Inference

    Authors: Niccolò Dalmasso, Luca Masserano, David Zhao, Rafael Izbicki, Ann B. Lee

    Abstract: Many areas of science make extensive use of computer simulators that implicitly encode intractable likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, especially outside asymptotic and low-dimensional regimes. At the same time, traditional LFI methods - such as Approximate Bayesian Computation or mor…

    Submitted 19 November, 2023; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: 45 pages, 6 figures, code available at https://github.com/lee-group-cmu/lf2i, supplementary material available at https://lucamasserano.github.io/data/LF2I_supplementary_material.pdf

  9. arXiv:2102.10473  [pdf, other]

    stat.ME

    Diagnostics for Conditional Density Models and Bayesian Inference Algorithms

    Authors: David Zhao, Niccolò Dalmasso, Rafael Izbicki, Ann B. Lee

    Abstract: There has been growing interest in the AI community for precise uncertainty quantification. Conditional density models f(y|x), where x represents potentially high-dimensional features, are an integral part of uncertainty quantification in prediction and Bayesian inference. However, it is challenging to assess conditional density estimates and gain insight into modes of failure. While existing diag…

    Submitted 23 July, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

    Comments: Appearing in 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021), Spotlight Talk; camera-ready version

  10. arXiv:2012.15130  [pdf, other]

    stat.AP

    Spatio-temporal methods for estimating subsurface ocean thermal response to tropical cyclones

    Authors: Addison J. Hu, Mikael Kuusela, Ann B. Lee, Donata Giglio, Kimberly M. Wood

    Abstract: Tropical cyclones (TCs), driven by heat exchange between the air and sea, pose a substantial risk to many communities around the world. Accurate characterization of the subsurface ocean thermal response to TC passage is crucial for accurate TC intensity forecasts and for an understanding of the role that TCs play in the global climate system. However, that characterization is complicated by the hi…

    Submitted 14 March, 2024; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: 39 pages, 14 figures; supplement and code at https://github.com/huisaddison/tc-ocean-methods

  11. arXiv:2010.05783  [pdf, other]

    cs.LG stat.AP

    Structural Forecasting for Tropical Cyclone Intensity Prediction: Providing Insight with Deep Learning

    Authors: Trey McNeely, Niccolò Dalmasso, Kimberly M. Wood, Ann B. Lee

    Abstract: Tropical cyclone (TC) intensity forecasts are ultimately issued by human forecasters. The human in-the-loop pipeline requires that any forecasting guidance must be easily digestible by TC experts if it is to be adopted at operational centers like the National Hurricane Center. Our proposed framework leverages deep learning to provide forecasters with something neither end-to-end prediction models…

    Submitted 7 December, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: To appear in the Tackling Climate Change with Machine Learning workshop at NeurIPS 2020 (Proposals Track) 3 pages, 1 figure

  12. arXiv:2010.04651  [pdf, ps, other]

    stat.AP stat.ML

    Wildfire Smoke and Air Quality: How Machine Learning Can Guide Forest Management

    Authors: Lorenzo Tomaselli, Coty Jen, Ann B. Lee

    Abstract: Prescribed burns are currently the most effective method of reducing the risk of widespread wildfires, but a largely missing component in forest management is knowing which fuels one can safely burn to minimize exposure to toxic smoke. Here we show how machine learning, such as spectral clustering and manifold learning, can provide interpretable representations and powerful tools for differentiati…

    Submitted 7 December, 2020; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Spotlight talk at the Tackling Climate Change with Machine Learning workshop at NeurIPS 2020 (Proposals Track), 5 pages, 2 figures

  13. arXiv:2010.04051  [pdf, other]

    stat.AP stat.ML

    HECT: High-Dimensional Ensemble Consistency Testing for Climate Models

    Authors: Niccolò Dalmasso, Galen Vincent, Dorit Hammerling, Ann B. Lee

    Abstract: Climate models play a crucial role in understanding the effect of environmental and man-made changes on climate to help mitigate climate risks and inform governmental decisions. Large global climate models such as the Community Earth System Model (CESM), developed by the National Center for Atmospheric Research, are very complex with millions of lines of code describing interactions of the atmosph…

    Submitted 30 November, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: Accepted at the Tackling Climate Change with Machine Learning workshop at NeurIPS 2020, 6 pages, 1 figure

  14. arXiv:2002.10399  [pdf, other]

    stat.ME cs.LG stat.ML

    Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting

    Authors: Niccolò Dalmasso, Rafael Izbicki, Ann B. Lee

    Abstract: Parameter estimation, statistical tests and confidence sets are the cornerstones of classical statistics that allow scientists to make inferences about the underlying process that generated the observed data. A key question is whether one can still construct hypothesis tests and confidence sets with proper coverage and high power in a so-called likelihood-free inference (LFI) setting; that is, a s…

    Submitted 13 August, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: 20 pages, 8 figures, 6 tables, 4 algorithm boxes

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2323-2334, 2020

  15. Unlocking GOES: A Statistical Framework for Quantifying the Evolution of Convective Structure in Tropical Cyclones

    Authors: Trey McNeely, Ann B. Lee, Kimberly M. Wood, Dorit Hammerling

    Abstract: Tropical cyclones (TCs) rank among the most costly natural disasters in the United States, and accurate forecasts of track and intensity are critical for emergency response. Intensity guidance has improved steadily but slowly, as processes which drive intensity change are not fully understood. Because most TCs develop far from land-based observing networks, geostationary satellite imagery is criti…

    Submitted 3 August, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: 19 pages, 14 figures, Submitted to the Journal of Applied Meteorology and Climatology

    Journal ref: Journal of Applied Meteorology and Climatology 59.10 (2020): 1671-1689

  16. arXiv:1908.11523  [pdf, other]

    astro-ph.IM stat.CO stat.ML

    Conditional Density Estimation Tools in Python and R with Applications to Photometric Redshifts and Likelihood-Free Cosmological Inference

    Authors: Niccolò Dalmasso, Taylor Pospisil, Ann B. Lee, Rafael Izbicki, Peter E. Freeman, Alex I. Malz

    Abstract: It is well known in astronomy that propagating non-Gaussian prediction uncertainty in photometric redshift estimates is key to reducing bias in downstream cosmological analyses. Similarly, likelihood-free inference approaches, which are beginning to emerge as a tool for cosmological analysis, require a characterization of the full uncertainty landscape of the parameters of interest given observed…

    Submitted 20 December, 2019; v1 submitted 29 August, 2019; originally announced August 2019.

    Comments: 27 pages, 7 figures, 4 tables

  17. arXiv:1906.07177  [pdf, other]

    stat.CO stat.ME

    (f)RFCDE: Random Forests for Conditional Density Estimation and Functional Data

    Authors: Taylor Pospisil, Ann B. Lee

    Abstract: Random forests is a common non-parametric regression technique which performs well for mixed-type unordered data and irrelevant features, while being robust to monotonic variable transformations. Standard random forests, however, do not efficiently handle functional data and run into a curse of dimensionality when presented with high-resolution curves and surfaces. Furthermore, in settings with h…

    Submitted 16 June, 2019; originally announced June 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1804.05753

  18. arXiv:1905.11505  [pdf, other]

    stat.ME stat.ML

    Validation of Approximate Likelihood and Emulator Models for Computationally Intensive Simulations

    Authors: Niccolò Dalmasso, Ann B. Lee, Rafael Izbicki, Taylor Pospisil, Ilmun Kim, Chieh-An Lin

    Abstract: Complex phenomena in engineering and the sciences are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to estimate an approximate likelihood or fit a fast emulator model for efficient statistical inference; such surrogate models include Gaussian synthetic likelihoods and more re…

    Submitted 2 December, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: 22 pages, 9 Figures, 2 Tables

    Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108, 3349-3361, 2020

  19. arXiv:1812.08927  [pdf, other]

    stat.ME

    Global and Local Two-Sample Tests via Regression

    Authors: Ilmun Kim, Ann B. Lee, Jing Lei

    Abstract: Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature, there have been recent methodological developments such as classification accuracy tests. The goal of this work is to present a regression approach to comparing mu…

    Submitted 18 November, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

  20. ABC-CDE: Towards Approximate Bayesian Computation with Complex High-Dimensional Data and Limited Simulations

    Authors: Rafael Izbicki, Ann B. Lee, Taylor Pospisil

    Abstract: Approximate Bayesian Computation (ABC) is typically used when the likelihood is either unavailable or intractable but where data can be simulated under different parameter settings using a forward model. Despite the recent interest in ABC, high-dimensional data and costly simulations still remain a bottleneck in some applications. There is also no consensus as to how to best assess the performance…

    Submitted 20 October, 2018; v1 submitted 14 May, 2018; originally announced May 2018.

    Journal ref: Journal of Computational and Graphical Statistics, 2019 (https://www.tandfonline.com/doi/abs/10.1080/10618600.2018.1546594)

  21. arXiv:1804.05753  [pdf, other]

    stat.ML cs.LG

    RFCDE: Random Forests for Conditional Density Estimation

    Authors: Taylor Pospisil, Ann B. Lee

    Abstract: Random forests is a common non-parametric regression technique which performs well for mixed-type data and irrelevant covariates, while being robust to monotonic variable transformations. Existing random forest implementations target regression or classification. We introduce the RFCDE package for fitting random forest models optimized for nonparametric conditional density estimation, including jo…

    Submitted 2 May, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

    Comments: Fix URL in Arxiv abstract

  22. arXiv:1704.08095  [pdf, other]

    stat.ME stat.ML

    Converting High-Dimensional Regression to High-Dimensional Conditional Density Estimation

    Authors: Rafael Izbicki, Ann B. Lee

    Abstract: There is a growing demand for nonparametric conditional density estimators (CDEs) in fields such as astronomy and economics. In astronomy, for example, one can dramatically improve estimates of the parameters that dictate the evolution of the Universe by working with full conditional densities instead of regression (i.e., conditional mean) estimates. More generally, standard regression falls short…

    Submitted 26 April, 2017; originally announced April 2017.

  23. arXiv:1604.01339  [pdf, other]

    stat.AP astro-ph.IM

    Photo-z Estimation: An Example of Nonparametric Conditional Density Estimation under Selection Bias

    Authors: Rafael Izbicki, Ann B. Lee, Peter E. Freeman

    Abstract: Redshift is a key quantity for inferring cosmological model parameters. In photometric redshift estimation, cosmologists use the coarse data collected from the vast majority of galaxies to predict the redshift of individual galaxies. To properly quantify the uncertainty in the predictions, however, one needs to go beyond standard regression and instead estimate the full conditional density f(z|x)…

    Submitted 5 April, 2016; originally announced April 2016.

  24. Nonparametric Conditional Density Estimation in a High-Dimensional Regression Setting

    Authors: Rafael Izbicki, Ann B. Lee

    Abstract: In some applications (e.g., in cosmology and economics), the regression E[Z|x] is not adequate to represent the association between a predictor x and a response Z because of multi-modality and asymmetry of f(z|x); using the full density instead of a single-point estimate can then lead to less bias in subsequent analysis. As of now, there are no effective ways of estimating f(z|x) when x represents…

    Submitted 2 April, 2016; originally announced April 2016.

  25. arXiv:1602.00355  [pdf, other]

    stat.ME stat.ML

    A Spectral Series Approach to High-Dimensional Nonparametric Regression

    Authors: Ann B. Lee, Rafael Izbicki

    Abstract: A key question in modern statistics is how to make fast and reliable inferences for complex, high-dimensional data. While there has been much interest in sparse techniques, current methods do not generalize well to data with nonlinear structure. In this work, we present an orthogonal series estimator for predictors that are complex aggregate objects, such as natural images, galaxy spectra, traject…

    Submitted 31 January, 2016; originally announced February 2016.

    Journal ref: Electron. J. Statist. Volume 10, Number 1 (2016), 423-463

  26. arXiv:1404.7063  [pdf, other]

    stat.ME

    High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation

    Authors: Rafael Izbicki, Ann B. Lee, Chad M. Schafer

    Abstract: The ratio between two probability density functions is an important component of various tasks, including selection bias correction, novelty detection and classification. Recently, several estimators of this ratio have been proposed. Most of these methods fail if the sample space is high-dimensional, and hence require a dimension reduction step, the result of which can be a significant loss of inf…

    Submitted 29 April, 2014; v1 submitted 28 April, 2014; originally announced April 2014.

    Comments: With supplementary material

    MSC Class: 62G; 62M15

    Journal ref: JMLR W&CP 33 :420-429, 2014

  27. Refining genetically inferred relationships using treelet covariance smoothing

    Authors: Andrew Crossett, Ann B. Lee, Lambertus Klei, Bernie Devlin, Kathryn Roeder

    Abstract: Recent technological advances coupled with large sample sets have uncovered many factors underlying the genetic basis of traits and the predisposition to complex disease, but much is left to discover. A common thread to most genetic investigations is familial relationships. Close relatives can be identified from family records, and more distant relatives can be inferred from large panels of geneti…

    Submitted 10 December, 2013; v1 submitted 10 August, 2012; originally announced August 2012.

    Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/12-AOAS598

    Report number: IMS-AOAS-AOAS598

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 2, 669-690

  28. arXiv:1111.0911  [pdf, other]

    stat.AP astro-ph.IM

    Exploiting Non-Linear Structure in Astronomical Data for Improved Statistical Inference

    Authors: Ann B. Lee, Peter E. Freeman

    Abstract: Many estimation problems in astrophysics are highly complex, with high-dimensional, non-standard data objects (e.g., images, spectra, entire distributions, etc.) that are not amenable to formal statistical analysis. To utilize such data and make accurate inferences, it is crucial to transform the data into a simpler, reduced form. Spectral kernel methods are non-linear data transformation methods…

    Submitted 3 November, 2011; originally announced November 2011.

    Comments: Invited talk at SCMA V, Penn State University, June 2011, PA. To appear in the Proceedings of "Statistical Challenges in Modern Astronomy V"

  29. arXiv:1106.0545  [pdf, other]

    stat.AP

    Assessment of Aortic Aneurysm Rupture Risk

    Authors: Rafael Izbicki, Ann B. Lee, Ender A. Finol

    Abstract: The rupture of an abdominal aortic aneurysm (AAA) is associated with a high mortality. When an AAA ruptures, 50% of the patients die before reaching the hospital. Of the patients that are able to reach the operating room, only 50% have it successfully repaired (Fillinger et al, 2003). Therefore, it is important to find good predictors for immediate risk of rupture. Clinically, the size of the aneu…

    Submitted 19 August, 2015; v1 submitted 2 June, 2011; originally announced June 2011.

  30. arXiv:1105.6344  [pdf, ps, other]

    stat.AP astro-ph.IM

    Prototype selection for parameter estimation in complex models

    Authors: Joseph W. Richards, Ann B. Lee, Chad M. Schafer, Peter E. Freeman

    Abstract: Parameter estimation in astrophysics often requires the use of complex physical models. In this paper we study the problem of estimating the parameters that describe star formation history (SFH) in galaxies. Here, high-dimensional spectral data from galaxies are appropriately modeled as linear combinations of physical components, called simple stellar populations (SSPs), plus some nonlinear distor…

    Submitted 20 March, 2012; v1 submitted 31 May, 2011; originally announced May 2011.

    Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/11-AOAS500

    Report number: IMS-AOAS-AOAS500

    Journal ref: Annals of Applied Statistics 2012, Vol. 6, No. 1, 383-408

  31. arXiv:0908.2409  [pdf, other]

    stat.AP

    A Spectral Graph Approach to Discovering Genetic Ancestry

    Authors: Ann B. Lee, Diana Luca, Kathryn Roeder

    Abstract: Mapping human genetic variation is fundamentally interesting in fields such as anthropology and forensic inference. At the same time patterns of genetic diversity confound efforts to determine the genetic basis of complex disease. Due to technological advances it is now possible to measure hundreds of thousands of genetic variants per individual across the genome. Principal component analysis (P…

    Submitted 17 August, 2009; originally announced August 2009.

    Comments: 6 figures

    Journal ref: Annals of Applied Statistics, 4(1), 179-201, 2010

  32. arXiv:0907.0199  [pdf, ps, other]

    stat.AP

    High-Dimensional Density Estimation via SCA: An Example in the Modelling of Hurricane Tracks

    Authors: Susan M. Buchman, Ann B. Lee, Chad M. Schafer

    Abstract: We present nonparametric techniques for constructing and verifying density estimates from high-dimensional data whose irregular dependence structure cannot be modelled by parametric multivariate distributions. A low-dimensional representation of the data is critical in such situations because of the curse of dimensionality. Our proposed methodology consists of three main parts: (1) data reparame…

    Submitted 1 July, 2009; originally announced July 2009.

    Comments: 13 pages, 5 figures

  33. arXiv:0811.0121  [pdf, other]

    stat.ME

    Spectral Connectivity Analysis

    Authors: Ann B. Lee, Larry Wasserman

    Abstract: Spectral kernel methods are techniques for transforming data into a coordinate system that efficiently reveals the geometric structure - in particular, the "connectivity" - of the data. These methods depend on certain tuning parameters. We analyze the dependence of the method on these tuning parameters. We focus on one particular technique - diffusion maps - but our analysis can be used for othe…

    Submitted 1 November, 2008; originally announced November 2008.

  34. Rejoinder of: Treelets--An adaptive multi-scale basis for sparse unordered data

    Authors: Ann B. Lee, Boaz Nadler, Larry Wasserman

    Abstract: Rejoinder of "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481]

    Submitted 25 July, 2008; originally announced July 2008.

    Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-AOAS137REJ

    Report number: IMS-AOAS-AOAS137REJ

    Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 2, 494-500

  35. Exploiting Low-Dimensional Structure in Astronomical Spectra

    Authors: Joseph W. Richards, Peter E. Freeman, Ann B. Lee, Chad M. Schafer

    Abstract: Dimension-reduction techniques can greatly improve statistical inference in astronomy. A standard approach is to use Principal Components Analysis (PCA). In this work we apply a recently-developed technique, diffusion maps, to astronomical spectra for data parameterization and dimensionality reduction, and develop a robust, eigenmode-based framework for regression. We show how our framework prov…

    Submitted 18 July, 2008; originally announced July 2008.

    Comments: 24 pages, 8 figures

    Journal ref: Astrophys.J.691:32-42,2009

  36. Treelets--An adaptive multi-scale basis for sparse unordered data

    Authors: Ann B. Lee, Boaz Nadler, Larry Wasserman

    Abstract: In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered--with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this pape…

    Submitted 25 July, 2008; v1 submitted 3 July, 2007; originally announced July 2007.

    Comments: This paper is commented on in: [arXiv:0807.4011], [arXiv:0807.4016], [arXiv:0807.4018], [arXiv:0807.4019], [arXiv:0807.4023], [arXiv:0807.4024]. Rejoinder in [arXiv:0807.4028]. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-AOAS137

    Report number: IMS-AOAS-AOAS137

    Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 2, 435-471