
Showing 1–4 of 4 results for author: Joseph, V R

  1. arXiv:2309.16492

    stat.ME cs.AI stat.AP stat.ML

    Asset Bundling for Wind Power Forecasting

    Authors: Hanyu Zhang, Mathieu Tanneau, Chaofan Huang, V. Roshan Joseph, Shangkun Wang, Pascal Van Hentenryck

    Abstract: The growing penetration of intermittent, renewable generation in US power grids, especially wind and solar generation, results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (…

    Submitted 28 September, 2023; originally announced September 2023.

  2. arXiv:2202.03326

    stat.ML cs.LG

    Optimal Ratio for Data Splitting

    Authors: V. Roshan Joseph

    Abstract: It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article we show that the optimal splitting ratio is $\sqrt{p}:1$, where $p$ is the number of parameters in a linear regression model that explains the data well.

    Submitted 7 February, 2022; originally announced February 2022.

    Journal ref: Statistical Analysis and Data Mining: The ASA Data Science Journal, 2022

  3. arXiv:2110.02927

    stat.ML cs.LG

    Data Twinning

    Authors: Akhil Vakayil, V. Roshan Joseph

    Abstract: In this work, we develop a method named Twinning, for partitioning a dataset into statistically similar twin sets. Twinning is based on SPlit, a recently proposed model-independent method for optimally splitting a dataset into training and testing sets. Twinning is orders of magnitude faster than the SPlit algorithm, which makes it applicable to Big Data problems such as data compression. Twinning…

    Submitted 6 October, 2021; originally announced October 2021.

  4. SPlit: An Optimal Method for Data Splitting

    Authors: V. Roshan Joseph, Akhil Vakayil

    Abstract: In this article we propose an optimal method referred to as SPlit for splitting a dataset into training and testing sets. SPlit is based on the method of Support Points (SP), which was initially developed for finding the optimal representative points of a continuous distribution. We adapt SP for subsampling from a dataset using a sequential nearest neighbor algorithm. We also extend SP to deal wit…

    Submitted 19 March, 2021; v1 submitted 20 December, 2020; originally announced December 2020.
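
The abstract of result 2 (Optimal Ratio for Data Splitting) states a splitting ratio of $\sqrt{p}:1$. As a rough illustration only, the sketch below turns that ratio into training and testing set sizes. It assumes the ratio is read as training:testing, so the testing fraction is $1/(\sqrt{p}+1)$. The function name `split_sizes` and the rounding and clamping choices are hypothetical and do not come from the paper.

```python
import math


def split_sizes(n: int, p: int) -> tuple[int, int]:
    """Return (n_train, n_test) for a sqrt(p):1 split of n rows.

    Assumption: the ratio is training:testing, so the testing
    fraction is 1 / (sqrt(p) + 1). Rounding is an illustrative choice.
    """
    if n < 2:
        raise ValueError("need at least 2 rows to split")
    if p < 1:
        raise ValueError("p must be a positive number of parameters")

    test_fraction = 1.0 / (math.sqrt(p) + 1.0)

    # Round to the nearest row count, then keep both sets non-empty.
    n_test = round(n * test_fraction)
    n_test = min(max(n_test, 1), n - 1)
    return n - n_test, n_test


if __name__ == "__main__":
    # With p = 9 the ratio is 3:1, i.e. a quarter of the rows for testing.
    print(split_sizes(1000, 9))
```

Under this reading, a larger $p$ shifts more rows toward training: $p = 1$ gives an even split, while $p = 9$ gives 3:1.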