Showing 1–13 of 13 results for author: Park, B

  1. arXiv:2405.02715  [pdf, other]

    stat.ME math.ST

    Grouping predictors via network-wide metrics

    Authors: Brandon Woosuk Park, Anand N. Vidyashankar, Tucker S. McElroy

    Abstract: When multitudes of features can plausibly be associated with a response, both privacy considerations and model parsimony suggest grouping them to increase the predictive power of a regression model. Specifically, the identification of groups of predictors significantly associated with the response variable eases further downstream analysis and decision-making. This paper proposes a new data analys…

    Submitted 4 May, 2024; originally announced May 2024.

    MSC Class: 62G05 62J05 62L12 60F05

  2. arXiv:2307.04034  [pdf, other]

    stat.ME

    Robust Universal Inference

    Authors: Beomjo Park, Sivaraman Balakrishnan, Larry Wasserman

    Abstract: In statistical inference, it is rarely realistic that the hypothesized statistical model is well-specified, and consequently it is important to understand the effects of misspecification on inferential procedures. When the hypothesized statistical model is misspecified, the natural target of inference is a projection of the data generating distribution onto the model. We present a general method f…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: 37 pages, 11 figures

  3. arXiv:2204.11858  [pdf, other]

    cs.LG cs.AI stat.ML

    Data Uncertainty without Prediction Models

    Authors: Bongjoon Park, Eunkyung Koh

    Abstract: Data acquisition processes for machine learning are often costly. To construct a high-performance prediction model with fewer data, a degree of difficulty in prediction is often deployed as the acquisition function in adding a new data point. The degree of difficulty is referred to as uncertainty in prediction models. We propose an uncertainty estimation method named a Distance-weighted Class Impu…

    Submitted 25 April, 2022; originally announced April 2022.

  4. arXiv:2105.09707  [pdf, other]

    stat.AP physics.ao-ph

    Spatio-temporal Local Interpolation of Global Ocean Heat Transport using Argo Floats: A Debiased Latent Gaussian Process Approach

    Authors: Beomjo Park, Mikael Kuusela, Donata Giglio, Alison Gray

    Abstract: The world ocean plays a key role in redistributing heat in the climate system and hence in regulating Earth's climate. Yet statistical analysis of ocean heat transport suffers from partially incomplete large-scale data intertwined with complex spatio-temporal dynamics, as well as from potential model misspecification. We present a comprehensive spatio-temporal statistical framework tailored to int…

    Submitted 18 July, 2022; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: 30 pages, 10 figures with supplementary material 9 pages, 10 figures

  5. arXiv:2009.08789  [pdf, other]

    stat.ME math.ST

    Additive Models for Symmetric Positive-Definite Matrices, Riemannian Manifolds and Lie groups

    Authors: Zhenhua Lin, Hans-Georg Müller, Byeong U. Park

    Abstract: In this paper an additive regression model for a symmetric positive-definite matrix valued response and multiple scalar predictors is proposed. The model exploits the abelian group structure inherited from either the Log-Cholesky metric or the Log-Euclidean framework that turns the space of symmetric positive-definite matrices into a Riemannian manifold and further a bi-invariant Lie group. The ad…

    Submitted 18 September, 2020; originally announced September 2020.

    Comments: 21 pages, 1 figure

    MSC Class: 62G08; 62R30

  6. arXiv:2009.07453  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation

    Authors: Insoo Chung, Byeongwook Kim, Yoonjung Choi, Se Jung Kwon, Yongkweon Jeon, Baeseong Park, Sangha Kim, Dongsoo Lee

    Abstract: The deployment of widely used Transformer architecture is challenging because of heavy computation load and memory overhead during inference, especially when the target device is limited in computational resources such as mobile or edge devices. Quantization is an effective technique to address such challenges. Our analysis shows that for a given number of quantization bits, each block of Transfor…

    Submitted 13 October, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

    Comments: Findings of EMNLP 2020

  7. arXiv:2009.04126  [pdf, ps, other]

    cs.LG stat.ML

    FleXOR: Trainable Fractional Quantization

    Authors: Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun

    Abstract: Quantization based on the binary codes is gaining attention because each quantized bit can be directly utilized for computations without dequantization using look-up tables. Previous attempts, however, only allow for integer numbers of quantization bits, which ends up restricting the search space for compression ratio and accuracy. In this paper, we propose an encryption algorithm/architecture to…

    Submitted 22 October, 2020; v1 submitted 9 September, 2020; originally announced September 2020.

    Comments: Accepted at NeurIPS 2020

  8. arXiv:2008.10109  [pdf, other]

    stat.ME cs.LG stat.AP

    Stable discovery of interpretable subgroups via calibration in causal studies

    Authors: Raaz Dwivedi, Yan Shuo Tan, Briton Park, Mian Wei, Kevin Horgan, David Madigan, Bin Yu

    Abstract: Building on Yu and Kumbier's PCS framework and for randomized experiments, we introduce a novel methodology for Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), with large heterogeneous treatment effects. StaDISC was developed during our re-analysis of the 1999-2000 VIGOR study, an 8076 patient randomized controlled trial (RCT), that compared the risk of adverse events from a…

    Submitted 28 September, 2020; v1 submitted 23 August, 2020; originally announced August 2020.

    Comments: Raaz Dwivedi and Yan Shuo Tan are joint first authors and contributed equally to this work. 52 pages, 8 Figures, 9 Tables. To appear in International Statistical Review, 2020

  9. arXiv:2005.09904  [pdf, ps, other]

    cs.LG stat.ML

    BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs

    Authors: Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee

    Abstract: The number of parameters in deep neural networks (DNNs) is rapidly increasing to support complicated tasks and to improve model accuracy. Correspondingly, the amount of computations and required memory footprint increase as well. Quantization is an efficient method to address such concerns by compressing DNNs such that computations can be simplified while required storage footprint is significantl…

    Submitted 31 August, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: 13 pages, 12 figures

  10. Curating a COVID-19 data repository and forecasting county-level death counts in the United States

    Authors: Nick Altieri, Rebecca L. Barter, James Duncan, Raaz Dwivedi, Karl Kumbier, Xiao Li, Robert Netzorg, Briton Park, Chandan Singh, Yan Shuo Tan, Tiffany Tang, Yu Wang, Chao Zhang, Bin Yu

    Abstract: As the COVID-19 outbreak evolves, accurate forecasting continues to play an extremely important role in informing policy decisions. In this paper, we present our continuous curation of a large data repository containing COVID-19 information from a range of sources. We use this data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative de…

    Submitted 9 August, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: Authors ordered alphabetically. All authors contributed significantly to this work. All collected data, modeling code, forecasts, and visualizations are updated daily and available at https://github.com/Yu-Group/covid19-severity-prediction

    Journal ref: Published in Harvard Data Science Review, 2020

  11. arXiv:1905.10138  [pdf, ps, other]

    cs.LG stat.ML

    Structured Compression by Weight Encryption for Unstructured Pruning and Quantization

    Authors: Se Jung Kwon, Dongsoo Lee, Byeongwook Kim, Parichay Kapoor, Baeseong Park, Gu-Yeon Wei

    Abstract: Model compression techniques, such as pruning and quantization, are becoming increasingly important to reduce the memory footprints and the amount of computations. Despite model size reduction, achieving performance enhancement on devices is, however, still challenging mainly due to the irregular representations of sparse matrix formats. This paper proposes a new weight representation scheme for S…

    Submitted 5 March, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

  12. arXiv:1612.03554  [pdf]

    q-bio.PE stat.AP

    Modeling the spread of the Zika virus using topological data analysis

    Authors: Derek Lo, Briton Park

    Abstract: Zika virus (ZIKV), a disease spread primarily through the Aedes aegypti mosquito, was identified in Brazil in 2015 and was declared a global health emergency by the World Health Organization (WHO). Epidemiologists often use common state-level attributes such as population density and temperature to determine the spread of disease. By applying techniques from topological data analysis, we believe t…

    Submitted 25 January, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

  13. arXiv:0810.5276  [pdf, ps, other]

    math.ST stat.ML

    Choice of neighbor order in nearest-neighbor classification

    Authors: Peter Hall, Byeong U. Park, Richard J. Samworth

    Abstract: The $k$th-nearest neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of $k$; and by the absence of techniques for empirical choice of $k$. In the present paper we detail the wa…

    Submitted 29 October, 2008; originally announced October 2008.

    Comments: Published at http://dx.doi.org/10.1214/07-AOS537 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS537

    MSC Class: 62H30 (Primary); 62G20 (Secondary)

    Journal ref: Annals of Statistics 2008, Vol. 36, No. 5, 2135-2152