Showing 1–26 of 26 results for author: Joseph, V R
  1. arXiv:2401.00800  [pdf, other]

    stat.ME stat.ML

    Factor Importance Ranking and Selection using Total Indices

    Authors: Chaofan Huang, V. Roshan Joseph

    Abstract: Factor importance measures the impact of each feature on output prediction accuracy. Many existing works focus on the model-based importance, but an important feature in one learning algorithm may hold little significance in another model. Hence, a factor importance measure ought to characterize the feature's predictive potential without relying on a specific prediction algorithm. Such algorithm-a…

    Submitted 11 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  2. arXiv:2312.05372  [pdf, other]

    stat.ME stat.ML

    Rational Kriging

    Authors: V. Roshan Joseph

    Abstract: This article proposes a new kriging that has a rational form. It is shown that the generalized least squares estimate of the mean from rational kriging is much more well behaved than that from ordinary kriging. Parameter estimation and uncertainty quantification for rational kriging are proposed using a Gaussian process framework. Its potential applications in emulation and calibration of computer…

    Submitted 8 December, 2023; originally announced December 2023.

  3. arXiv:2310.07953  [pdf, other]

    stat.ME stat.CO

    Enhancing Sample Quality through Minimum Energy Importance Weights

    Authors: Chaofan Huang, V. Roshan Joseph

    Abstract: Importance sampling is a powerful tool for correcting the distributional mismatch in many statistical and machine learning problems, but in practice its performance is limited by the usage of simple proposals whose importance weights can be computed analytically. To address this limitation, Liu and Lee (2017) proposed a Black-Box Importance Sampling (BBIS) algorithm that computes the importance we…

    Submitted 31 December, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  4. arXiv:2310.07016  [pdf, other]

    stat.ME

    Discovering the Unknowns: A First Step

    Authors: V. Roshan Joseph, William E. Lewis, Henry S. Yuchi, Kathryn A. Maupin

    Abstract: This article aims at discovering the unknown variables in the system through data analysis. The main idea is to use the time of data collection as a surrogate variable and try to identify the unknown variables by modeling gradual and sudden changes in the data. We use Gaussian process modeling and a sparse representation of the sudden changes to efficiently estimate the large number of parameters…

    Submitted 10 October, 2023; originally announced October 2023.

  5. arXiv:2309.16492  [pdf, other]

    stat.ME cs.AI stat.AP stat.ML

    Asset Bundling for Wind Power Forecasting

    Authors: Hanyu Zhang, Mathieu Tanneau, Chaofan Huang, V. Roshan Joseph, Shangkun Wang, Pascal Van Hentenryck

    Abstract: The growing penetration of intermittent, renewable generation in US power grids, especially wind and solar generation, results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (…

    Submitted 28 September, 2023; originally announced September 2023.

  6. Sequential Designs for Filling Output Spaces

    Authors: Shangkun Wang, Adam P. Generale, Surya R. Kalidindi, V. Roshan Joseph

    Abstract: Space-filling designs are commonly used in computer experiments to fill the space of inputs so that the input-output relationship can be accurately estimated. However, in certain applications such as inverse design or feature-based modeling, the aim is to fill the response or feature space. In this article, we propose a new experimental design framework that aims to fill the space of the outputs (…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: 36 pages, 12 figures

    Journal ref: Technometrics (2023)

  7. Adaptive Exploration and Optimization of Materials Crystal Structures

    Authors: Arvind Krishna, Huan Tran, Chaofan Huang, Rampi Ramprasad, V. Roshan Joseph

    Abstract: A central problem of materials science is to determine whether a hypothetical material is stable without being synthesized, which is mathematically equivalent to a global optimization problem on a highly non-linear and multi-modal potential energy surface (PES). This optimization problem poses multiple outstanding challenges, including the exceedingly high dimensionality of the PES and that PES mu…

    Submitted 1 December, 2022; originally announced December 2022.

    Journal ref: INFORMS Journal on Data Science, 2023

  8. arXiv:2209.13748  [pdf, other]

    stat.ME

    Conglomerate Multi-Fidelity Gaussian Process Modeling, with Application to Heavy-Ion Collisions

    Authors: Yi Ji, Henry Shaowu Yuchi, Derek Soeder, J. -F. Paquet, Steffen A. Bass, V. Roshan Joseph, C. F. Jeff Wu, Simon Mak

    Abstract: In an era where scientific experimentation is often costly, multi-fidelity emulation provides a powerful tool for predictive scientific computing. While there has been notable work on multi-fidelity modeling, existing models do not incorporate an important "conglomerate" property of multi-fidelity simulators, where the accuracies of different simulator components are controlled by different fideli…

    Submitted 28 September, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

  9. arXiv:2202.03326  [pdf, other]

    stat.ML cs.LG

    Optimal Ratio for Data Splitting

    Authors: V. Roshan Joseph

    Abstract: It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article we show that the optimal splitting ratio is $\sqrt{p}:1$, where $p$ is the number of parameters in a linear regression model that explains the data well.

    Submitted 7 February, 2022; originally announced February 2022.

    Journal ref: Statistical Analysis and Data Mining: The ASA Data Science Journal, 2022

  10. arXiv:2110.02927  [pdf, other]

    stat.ML cs.LG

    Data Twinning

    Authors: Akhil Vakayil, V. Roshan Joseph

    Abstract: In this work, we develop a method named Twinning, for partitioning a dataset into statistically similar twin sets. Twinning is based on SPlit, a recently proposed model-independent method for optimally splitting a dataset into training and testing sets. Twinning is orders of magnitude faster than the SPlit algorithm, which makes it applicable to Big Data problems such as data compression. Twinning…

    Submitted 6 October, 2021; originally announced October 2021.

  11. Constrained Minimum Energy Designs

    Authors: Chaofan Huang, V. Roshan Joseph, Douglas M. Ray

    Abstract: Space-filling designs are important in computer experiments, which are critical for building a cheap surrogate model that adequately approximates an expensive computer code. Many design construction techniques in the existing literature are only applicable for rectangular bounded space, but in real world applications, the input space can often be non-rectangular because of constraints on the input…

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: Submitted to Statistics and Computing

    Journal ref: Stat Comput 31, 80 (2021)

  12. Population Quasi-Monte Carlo

    Authors: Chaofan Huang, V. Roshan Joseph, Simon Mak

    Abstract: Monte Carlo methods are widely used for approximating complicated, multidimensional integrals for Bayesian inference. Population Monte Carlo (PMC) is an important class of Monte Carlo methods, which utilizes a population of proposals to generate weighted samples that approximate the target distribution. The generic PMC framework iterates over three steps: samples are simulated from a set of propos…

    Submitted 26 December, 2020; originally announced December 2020.

    Comments: Submitted to Journal of Computational and Graphical Statistics

    Journal ref: Journal of Computational and Graphical Statistics (2022)

  13. SPlit: An Optimal Method for Data Splitting

    Authors: V. Roshan Joseph, Akhil Vakayil

    Abstract: In this article we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of Support Points (SP), which was initially developed for finding the optimal representative points of a continuous distribution. We adapt SP for subsampling from a dataset using a sequential nearest neighbor algorithm. We also extend SP to deal wit…

    Submitted 19 March, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

  14. Robust Experimental Designs for Model Calibration

    Authors: Arvind Krishna, V. Roshan Joseph, Shan Ba, William A. Brenneman, William R. Myers

    Abstract: A computer model can be used for predicting an output only after specifying the values of some unknown physical constants known as calibration parameters. The unknown calibration parameters can be estimated from real data by conducting physical experiments. This paper presents an approach to optimally design such a physical experiment. The problem of optimally designing a physical experiment, using…

    Submitted 2 August, 2020; originally announced August 2020.

    Comments: 25 pages, 10 figures

  15. arXiv:1910.05452  [pdf, other]

    stat.ME

    Adaptive design for Gaussian process regression under censoring

    Authors: Jialei Chen, Simon Mak, V. Roshan Joseph, Chuck Zhang

    Abstract: A key objective in engineering problems is to predict an unknown experimental surface over an input domain. In complex physical experiments, this may be hampered by response censoring, which results in a significant loss of information. For such problems, experimental design is paramount for maximizing predictive power using a small number of expensive experimental runs. To tackle this, we propose…

    Submitted 25 June, 2021; v1 submitted 11 October, 2019; originally announced October 2019.

    Journal ref: Annals of Applied Statistics, 2021

  16. Function-on-function kriging, with applications to 3D printing of aortic tissues

    Authors: Jialei Chen, Simon Mak, V. Roshan Joseph, Chuck Zhang

    Abstract: 3D-printed medical prototypes, which use synthetic metamaterials to mimic biological tissue, are becoming increasingly important in urgent surgical applications. However, the mimicking of tissue mechanical properties via 3D-printed metamaterial can be difficult and time-consuming, due to the functional nature of both inputs (metamaterial structure) and outputs (mechanical response curve). To deal…

    Submitted 1 July, 2020; v1 submitted 3 October, 2019; originally announced October 2019.

    Journal ref: Technometrics, 2020

  17. Space-Filling Designs for Robustness Experiments

    Authors: V. Roshan Joseph, Li Gu, Shan Ba, William R. Myers

    Abstract: To identify the robust settings of the control factors, it is very important to understand how they interact with the noise factors. In this article, we propose space-filling designs for computer experiments that are more capable of accurately estimating the control-by-noise interactions. Moreover, the existing space-filling designs focus on uniformly distributing the points in the design space, w…

    Submitted 25 December, 2017; originally announced December 2017.

    MSC Class: 62K25

  18. Deterministic Sampling of Expensive Posteriors Using Minimum Energy Designs

    Authors: V. Roshan Joseph, Dianpeng Wang, Li Gu, Shiji Lv, Rui Tuo

    Abstract: Markov chain Monte Carlo (MCMC) methods require a large number of samples to approximate a posterior distribution, which can be costly when the likelihood or prior is expensive to evaluate. The number of samples can be reduced if we can avoid repeated samples and those that are close to each other. This is the idea behind deterministic sampling methods such as Quasi-Monte Carlo (QMC). However, the…

    Submitted 24 December, 2017; originally announced December 2017.

    MSC Class: 62K99

  19. arXiv:1708.06897  [pdf, other]

    stat.ME

    Projected support points: a new method for high-dimensional data reduction

    Authors: Simon Mak, V. Roshan Joseph

    Abstract: In an era where big and high-dimensional data is readily available, data scientists are inevitably faced with the challenge of reducing this data for expensive downstream computation or analysis. To this end, we present here a new method for reducing high-dimensional big data into a representative point set, called projected support points (PSPs). A key ingredient in our method is the so-called sp…

    Submitted 2 June, 2018; v1 submitted 23 August, 2017; originally announced August 2017.

  20. arXiv:1611.07911  [pdf, other]

    stat.AP

    An efficient surrogate model for emulation and physics extraction of large eddy simulations

    Authors: Simon Mak, Chih-Li Sung, Xingjian Wang, Shiang-Ting Yeh, Yu-Hung Chang, V. Roshan Joseph, Vigor Yang, C. F. Jeff Wu

    Abstract: In the quest for advanced propulsion and power-generation systems, high-fidelity simulations are too computationally expensive to survey the desired design space, and a new design methodology is needed that combines engineering physics, computer simulations and statistical modeling. In this paper, we propose a new surrogate model that provides efficient prediction and uncertainty quantification of…

    Submitted 26 May, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: Submitted to JASA A&CS

  21. arXiv:1611.00203  [pdf, ps, other]

    stat.ME

    Orthogonal Gaussian process models

    Authors: Matthew Plumlee, V. Roshan Joseph

    Abstract: Gaussian process models are widely adopted for nonparametric/semi-parametric modeling. Identifiability issues occur when the mean model contains polynomials with unknown coefficients. Though resulting prediction is unaffected, this leads to poor estimation of the coefficients in the mean model, and thus the estimated mean model loses interpretability. This paper introduces a new Gaussian proces…

    Submitted 1 November, 2016; originally announced November 2016.

  22. arXiv:1609.01811  [pdf, other]

    math.ST stat.ME

    Support points

    Authors: Simon Mak, V. Roshan Joseph

    Abstract: This paper introduces a new way to compact a continuous probability distribution $F$ into a set of representative points called support points. These points are obtained by minimizing the energy distance, a statistical potential measure initially proposed by Székely and Rizzo (2004) for testing goodness-of-fit. The energy distance has two appealing features. First, its distance-based structure all…

    Submitted 9 September, 2018; v1 submitted 6 September, 2016; originally announced September 2016.

    Comments: Accepted, Annals of Statistics

    MSC Class: 62E17

  23. arXiv:1602.03938  [pdf, other]

    stat.CO

    Minimax and minimax projection designs using clustering

    Authors: Simon Mak, V. Roshan Joseph

    Abstract: Minimax designs provide a uniform coverage of a design space $\mathcal{X} \subseteq \mathbb{R}^p$ by minimizing the maximum distance from any point in this space to its nearest design point. Although minimax designs have many useful applications, e.g., for optimal sensor allocation or as space-filling designs for computer experiments, there has been little work in developing algorithms for generat…

    Submitted 28 October, 2016; v1 submitted 11 February, 2016; originally announced February 2016.

    Comments: Under revision, Journal of Computational and Graphical Statistics (JCGS)

  24. Composite Gaussian process models for emulating expensive functions

    Authors: Shan Ba, V. Roshan Joseph

    Abstract: A new type of nonstationary Gaussian process model is developed for approximating computationally expensive functions. The new model is a composite of two Gaussian processes, where the first one captures the smooth global trend and the second one models local details. The new predictor also incorporates a flexible variance model, which makes it more capable of approximating surfaces with varying v…

    Submitted 11 January, 2013; originally announced January 2013.

    Comments: Published at http://dx.doi.org/10.1214/12-AOAS570 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS570

    Journal ref: Annals of Applied Statistics 2012, Vol. 6, No. 4, 1838-1860

  25. arXiv:1211.1592  [pdf, other]

    stat.ME stat.CO

    Analysis of Computer Experiments with Functional Response

    Authors: Ying Hung, V. Roshan Joseph, Shreyes N. Melkote

    Abstract: This paper is motivated by a computer experiment conducted for optimizing residual stresses in the machining of metals. Although kriging is widely used in the analysis of computer experiments, it cannot be easily applied to model the residual stresses because they are obtained as a profile. The high dimensionality caused by this functional response introduces severe computational challenges in kri…

    Submitted 7 November, 2012; originally announced November 2012.

  26. Structured variable selection and estimation

    Authors: Ming Yuan, V. Roshan Joseph, Hui Zou

    Abstract: In linear regression problems with related predictors, it is desirable to do variable selection and estimation by maintaining the hierarchical or structural relationships among predictors. In this paper we propose non-negative garrote methods that can naturally incorporate such relationships defined through effect heredity principles or marginality principles. We show that the methods are very eas…

    Submitted 2 November, 2010; originally announced November 2010.

    Comments: Published at http://dx.doi.org/10.1214/09-AOAS254 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS254

    Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 4, 1738-1757