
Showing 1–7 of 7 results for author: Chi, J T

  1. arXiv:2211.05749  [pdf, other]

    stat.CO stat.ML

    Sketched Gaussian Model Linear Discriminant Analysis via the Randomized Kaczmarz Method

    Authors: Jocelyn T. Chi, Deanna Needell

    Abstract: We present sketched linear discriminant analysis, an iterative randomized approach to binary-class Gaussian model linear discriminant analysis (LDA) for very large data. We harness a least squares formulation and mobilize the stochastic gradient descent framework. Therefore, we obtain a randomized classifier with performance that is very comparable to that of full data LDA while requiring access t…

    Submitted 10 November, 2022; originally announced November 2022.

  2. arXiv:2209.04968  [pdf, other]

    stat.CO

    Population-Based Hierarchical Non-negative Matrix Factorization for Survey Data

    Authors: Xiaofu Ding, Xinyu Dong, Olivia McGough, Chenxin Shen, Annie Ulichney, Ruiyao Xu, William Swartworth, Jocelyn T. Chi, Deanna Needell

    Abstract: Motivated by the problem of identifying potential hierarchical population structure on modern survey data containing a wide range of complex data types, we introduce population-based hierarchical non-negative matrix factorization (PHNMF). PHNMF is a variant of hierarchical non-negative matrix factorization based on feature similarity. As such, it enables an automatic and interpretable approach for…

    Submitted 11 September, 2022; originally announced September 2022.

  3. arXiv:2105.03228  [pdf, other]

    stat.CO stat.ME

    SEAGLE: A Scalable Exact Algorithm for Large-Scale Set-Based GxE Tests in Biobank Data

    Authors: Jocelyn T. Chi, Ilse C. F. Ipsen, Tzu-Hung Hsiao, Ching-Heng Lin, Li-San Wang, Wan-Ping Lee, Tzu-Pin Lu, Jung-Ying Tzeng

    Abstract: The explosion of biobank data offers immediate opportunities for gene-environment (GxE) interaction studies of complex diseases because of the large sample sizes and the rich collection in genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in GxE assessment, especially for set-based GxE variance component (VC) tests, which are…

    Submitted 14 May, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  4. arXiv:2010.04133  [pdf, other]

    stat.CO

    A User-Friendly Computational Framework for Robust Structured Regression with the L$_2$ Criterion

    Authors: Jocelyn T. Chi, Eric C. Chi

    Abstract: We introduce a user-friendly computational framework for implementing robust versions of a wide variety of structured regression methods with the L$_{2}$ criterion. In addition to introducing an algorithm for performing L$_{2}$E regression, our framework enables robust regression with the L$_{2}$ criterion for additional structural constraints, works without requiring complex tuning procedures on…

    Submitted 13 September, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

  5. arXiv:2007.06099  [pdf, ps, other]

    math.NA

    Multiplicative Perturbation Bounds for Multivariate Multiple Linear Regression in Schatten $p$-Norms

    Authors: Jocelyn T. Chi, Ilse C. F. Ipsen

    Abstract: Multivariate multiple linear regression (MMLR), which occurs in a number of practical applications, generalizes traditional least squares (multiple linear regression, MLR) to multiple right-hand sides. We extend recent MLR analyses to sketched MMLR in general Schatten $p$-norms by interpreting the sketched problem as a multiplicative perturbation. Our work represents an extension of Maher's results…

    Submitted 12 July, 2020; originally announced July 2020.

  6. arXiv:1808.05924  [pdf, other]

    stat.ML cs.LG math.NA

    A Projector-Based Approach to Quantifying Total and Excess Uncertainties for Sketched Linear Regression

    Authors: Jocelyn T. Chi, Ilse C. F. Ipsen

    Abstract: Linear regression is a classic method of data analysis. In recent years, sketching -- a method of dimension reduction using random sampling, random projections, or both -- has gained popularity as an effective computational approximation when the number of observations greatly exceeds the number of variables. In this paper, we address the following question: How does sketching affect the statistic…

    Submitted 3 August, 2020; v1 submitted 17 August, 2018; originally announced August 2018.

  7. $k$-POD: A Method for $k$-Means Clustering of Missing Data

    Authors: Jocelyn T. Chi, Eric C. Chi, Richard G. Baraniuk

    Abstract: The $k$-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, is common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation but these solutions may incur significant costs. Our $k$-POD method presents a simp…

    Submitted 27 January, 2016; v1 submitted 25 November, 2014; originally announced November 2014.

    Comments: 26 pages, 7 tables

    Journal ref: The American Statistician 70(1):91-99, 2016