Showing 1–50 of 156 results for author: Lee, Y

Searching in archive stat.
  1. arXiv:2406.00493  [pdf, other]

    stat.ME

    Assessment of Case Influence in the Lasso with a Case-weight Adjusted Solution Path

    Authors: Zhenbang Jiao, Yoonkyung Lee

    Abstract: We study case influence in the Lasso regression using Cook's distance which measures overall change in the fitted values when one observation is deleted. Unlike in ordinary least squares regression, the estimated coefficients in the Lasso do not have a closed form due to the nondifferentiability of the $\ell_1$ penalty, and neither does Cook's distance. To find the case-deleted Lasso solution with…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 34 pages, 10 figures

  2. arXiv:2405.10527  [pdf, other]

    stat.ME math.PR stat.AP

    Hawkes Models And Their Applications

    Authors: Patrick J. Laub, Young Lee, Philip K. Pollett, Thomas Taimre

    Abstract: The Hawkes process is a model for counting the number of arrivals to a system which exhibits the self-exciting property - that one arrival creates a heightened chance of further arrivals in the near future. The model, and its generalizations, have been applied in a plethora of disparate domains, though two particularly developed applications are in seismology and in finance. As the original model…

    Submitted 17 May, 2024; originally announced May 2024.

  3. arXiv:2403.04613  [pdf, other]

    stat.ME

    Simultaneous Conformal Prediction of Missing Outcomes with Propensity Score $ε$-Discretization

    Authors: Yonghoon Lee, Edgar Dobriban, Eric Tchetgen Tchetgen

    Abstract: We study the problem of simultaneous predictive inference on multiple outcomes missing at random. We consider a suite of possible simultaneous coverage properties, conditionally on the missingness pattern and on the -- possibly discretized/binned -- feature values. For data with discrete feature distributions, we develop a procedure which attains feature- and missingness-conditional coverage; and…

    Submitted 7 March, 2024; originally announced March 2024.

  4. arXiv:2402.02589  [pdf, other]

    stat.AP

    Prospective Prediction of Body Mass Index Trajectories using Multi-task Gaussian Processes

    Authors: Arthur Leroy, Varsha Gupta, Mya Thway Tint, Delicia Ooi Shu Qin, Keith M. Godfrey, Fabian Yap, Leck Ngee, Yung Seng Lee, Johan G. Eriksson, Navin Michael, Mauricio A. Alvarez, Dennis Wang

    Abstract: Clinicians often investigate the body mass index (BMI) trajectories of children to assess their growth with respect to their peers, as well as to anticipate future growth and disease risk. While retrospective modelling of BMI trajectories has been an active area of research, prospective prediction of continuous BMI trajectories from historical growth data has not been well investigated. Using weig…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 17 pages, 9 figures, 5 tables

  5. arXiv:2401.14355  [pdf, ps, other]

    stat.ME

    Multiply Robust Difference-in-Differences Estimation of Causal Effect Curves for Continuous Exposures

    Authors: Gary Hettinger, Youjin Lee, Nandita Mitra

    Abstract: Researchers commonly use difference-in-differences (DiD) designs to evaluate public policy interventions. While methods exist for estimating effects in the context of binary interventions, policies often result in varied exposures across regions implementing the policy. Yet, existing approaches for incorporating continuous exposures face substantial limitations in addressing confounding variables…

    Submitted 15 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  6. arXiv:2401.01849  [pdf, other]

    stat.AP

    The expected value of sample information calculations for external validation of risk prediction models

    Authors: Mohsen Sadatsafavi, Andrew J Vickers, Tae Yoon Lee, Paul Gustafson, Laure Wynants

    Abstract: In designing external validation studies of clinical prediction models, contemporary sample size calculation methods are based on the frequentist inferential paradigm. One of the widely reported metrics of model performance is net benefit (NB), and the relevance of conventional inference around NB as a measure of clinical utility is doubtful. Value of Information methodology quantifies the consequ…

    Submitted 6 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: 14 pages, 4 figures, 0 tables

  7. arXiv:2312.13430  [pdf, other]

    stat.ME stat.AP

    Debiasing Sample Loadings and Scores in Exponential Family PCA for Sparse Count Data

    Authors: Ruochen Huang, Yoonkyung Lee

    Abstract: Multivariate count data with many zeros frequently occur in a variety of application areas such as text mining with a document-term matrix and cluster analysis with microbiome abundance data. Exponential family PCA (Collins et al., 2001) is a widely used dimension reduction tool to understand and capture the underlying low-rank structure of count data. It produces principal component scores by fit…

    Submitted 20 December, 2023; originally announced December 2023.

  8. arXiv:2311.16506  [pdf, other]

    stat.AP

    Using Bayesian Statistics in Confirmatory Clinical Trials in the Regulatory Setting

    Authors: Se Yoon Lee

    Abstract: Bayesian statistics plays a pivotal role in advancing medical science by enabling healthcare companies, regulators, and stakeholders to assess the safety and efficacy of new treatments, interventions, and medical procedures. The Bayesian framework offers a unique advantage over the classical framework, especially when incorporating prior information into a new trial with quality external data, suc…

    Submitted 30 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

  9. arXiv:2311.08677  [pdf, other]

    cs.LG cs.DC cs.IT stat.ML

    Federated Learning for Sparse Principal Component Analysis

    Authors: Sin Cheng Ciou, Pin Jui Chen, Elvin Y. Tseng, Yuh-Jye Lee

    Abstract: In the rapidly evolving realm of machine learning, algorithm effectiveness often faces limitations due to data quality and availability. Traditional approaches grapple with data sharing due to legal and privacy concerns. The federated learning framework addresses this challenge. Federated learning is a decentralized approach where model training occurs on client sides, preserving privacy by keepin…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 11 pages, 7 figures, 1 table. Accepted by IEEE BigData 2023, Sorrento, Italy

  10. arXiv:2310.11654  [pdf, other]

    cs.LG stat.ML

    Subject-specific Deep Neural Networks for Count Data with High-cardinality Categorical Features

    Authors: Hangbin Lee, Il Do Ha, Changha Hwang, Youngjo Lee

    Abstract: There is a growing interest in subject-specific predictions using deep neural networks (DNNs) because real-world data often exhibit correlations, which has been typically overlooked in traditional DNN frameworks. In this paper, we propose a novel hierarchical likelihood learning framework for introducing gamma random effects into the Poisson DNN, so as to improve the prediction performance by capt…

    Submitted 17 October, 2023; originally announced October 2023.

  11. arXiv:2310.10393  [pdf, other]

    stat.ME

    Statistical and Causal Robustness for Causal Null Hypothesis Tests

    Authors: Junhui Yang, Rohit Bhattacharya, Youjin Lee, Ted Westling

    Abstract: Prior work applying semiparametric theory to causal inference has primarily focused on deriving estimators that exhibit statistical robustness under a prespecified causal model that permits identification of a desired causal parameter. However, a fundamental challenge is correct specification of such a model, which usually involves making untestable assumptions. Evidence factors is an approach to…

    Submitted 29 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  12. arXiv:2310.09960  [pdf, other]

    stat.ME

    Point Mass in the Confidence Distribution: Is it a Drawback or an Advantage?

    Authors: Hangbin Lee, Youngjo Lee

    Abstract: Stein's (1959) problem highlights the phenomenon called the probability dilution in high dimensional cases, which is known as a fundamental deficiency in probabilistic inference. The satellite conjunction problem also suffers from probability dilution that poor-quality data can lead to a dilution of collision probability. Though various methods have been proposed, such as generalized fiducial dist…

    Submitted 15 October, 2023; originally announced October 2023.

  13. arXiv:2310.09955  [pdf, other]

    math.ST stat.ME

    On the Statistical Foundations of H-likelihood for Unobserved Random Variables

    Authors: Hangbin Lee, Youngjo Lee

    Abstract: The maximum likelihood estimation is widely used for statistical inferences. This paper aims to reformulate Lee and Nelder's (1996) h-likelihood, so that the maximum h-likelihood estimator resembles the maximum likelihood estimator of the classical likelihood. We establish the statistical foundations of the new h-likelihood. This extends classical likelihood theories to embrace broader class of st…

    Submitted 5 December, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

  14. arXiv:2310.09495  [pdf, other]

    cs.LG cs.CV stat.ML

    Learning In-between Imagery Dynamics via Physical Latent Spaces

    Authors: Jihun Han, Yoonsang Lee, Anne Gelb

    Abstract: We present a framework designed to learn the underlying dynamics between two images observed at consecutive time steps. The complex nature of image data and the lack of temporal information pose significant challenges in capturing the unique evolving patterns. Our proposed method focuses on estimating the intermediary stages of image evolution, allowing for interpretability through latent dynamics…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 26 pages, 13 figures

    MSC Class: 37M05; 62F99; 68T45

  15. arXiv:2309.16829  [pdf, other]

    math.NA cs.LG stat.ML

    An analysis of the derivative-free loss method for solving PDEs

    Authors: Jihun Han, Yoonsang Lee

    Abstract: This study analyzes the derivative-free loss method to solve a certain class of elliptic PDEs using neural networks. The derivative-free loss method uses the Feynman-Kac formulation, incorporating stochastic walkers and their corresponding average values. We investigate the effect of the time interval related to the Feynman-Kac formulation and the walker size in the context of computational effici…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 18 pages, 6 figures

    MSC Class: 65N15; 65N75; 65C05; 60G46

  16. arXiv:2308.13047  [pdf, other]

    cs.LG cs.AI stat.ME

    Federated Causal Inference from Observational Data

    Authors: Thanh Vinh Vo, Young Lee, Tze-Yun Leong

    Abstract: Decentralized data sources are prevalent in real-world applications, posing a formidable challenge for causal inference. These sources cannot be consolidated into a single entity owing to privacy constraints. The presence of dissimilar data distributions and missing values within them can potentially introduce bias to the causal estimands. In this article, we propose a framework to estimate causal…

    Submitted 30 May, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: Preprint. arXiv admin note: substantial text overlap with arXiv:2301.00346

  17. arXiv:2308.09009  [pdf, other]

    econ.EM stat.CO

    Closed-form approximations of moments and densities of continuous-time Markov models

    Authors: Dennis Kristensen, Young Jun Lee, Antonio Mele

    Abstract: This paper develops power series expansions of a general class of moment functions, including transition densities and option prices, of continuous-time Markov processes, including jump--diffusions. The proposed expansions extend the ones in Kristensen and Mele (2011) to cover general Markov processes. We demonstrate that the class of expansions nests the transition density and option price expans…

    Submitted 17 August, 2023; originally announced August 2023.

  18. arXiv:2307.06581  [pdf, other]

    stat.ML cs.LG stat.ME

    Deep Neural Networks for Semiparametric Frailty Models via H-likelihood

    Authors: Hangbin Lee, Il Do Ha, Youngjo Lee

    Abstract: For prediction of clustered time-to-event data, we propose a new deep neural network based gamma frailty model (DNN-FM). An advantage of the proposed model is that the joint maximization of the new h-likelihood provides maximum likelihood estimators for fixed parameters and best unbiased predictors for random frailties. Thus, the proposed DNN-FM is trained by using a negative profiled h-likelihood…

    Submitted 13 July, 2023; originally announced July 2023.

  19. arXiv:2306.15173  [pdf, other]

    stat.ME

    Robust propensity score weighting estimation under missing at random

    Authors: Hengfang Wang, Jae Kwang Kim, Jeongseop Han, Youngjo Lee

    Abstract: Missing data is frequently encountered in many areas of statistics. Propensity score weighting is a popular method for handling missing data. The propensity score method employs a response propensity model, but correct specification of the statistical model can be challenging in the presence of missing data. Doubly robust estimation is attractive, as the consistency of the estimator is guaranteed…

    Submitted 27 March, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

  20. arXiv:2306.06342  [pdf, other]

    math.ST stat.ME

    Distribution-free inference with hierarchical data

    Authors: Yonghoon Lee, Rina Foygel Barber, Rebecca Willett

    Abstract: This paper studies distribution-free inference in settings where the data set has a hierarchical structure -- for example, groups of observations, or repeated measurements. In such settings, standard notions of exchangeability may not hold. To address this challenge, a hierarchical form of exchangeability is derived, facilitating extensions of distribution-free methods, including conformal predict…

    Submitted 2 March, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

  21. arXiv:2306.01337  [pdf, other]

    cs.CL stat.ML

    MathChat: Converse to Tackle Challenging Math Problems with LLM Agents

    Authors: Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang

    Abstract: Employing Large Language Models (LLMs) to address mathematical problems is an intriguing research endeavor, considering the abundance of math problems expressed in natural language across numerous science and engineering fields. LLMs, with their generalized ability, are used as a foundation model to build AI agents for different tasks. In this paper, we study the effectiveness of utilizing LLM age…

    Submitted 28 June, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Update version

  22. arXiv:2305.05532  [pdf, other]

    eess.SP cs.AI cs.LG stat.AP stat.ML

    An ensemble of convolution-based methods for fault detection using vibration signals

    Authors: Xian Yeow Lee, Aman Kumar, Lasitha Vidyaratne, Aniruddha Rajendra Rao, Ahmed Farahat, Chetan Gupta

    Abstract: This paper focuses on solving a fault detection problem using multivariate time series of vibration signals collected from planetary gearboxes in a test rig. Various traditional machine learning and deep learning methods have been proposed for multivariate time-series classification, including distance-based, functional data-oriented, feature-driven, and convolution kernel-based methods. Recent st…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: 12 Pages, 9 Figures, 2 Tables. Accepted at ICPHM 2023

    Journal ref: 2023 IEEE International Conference on Prognostics and Health Management (ICPHM)

  23. arXiv:2303.06227  [pdf, other]

    stat.ME

    Policy effect evaluation under counterfactual neighborhood interventions in the presence of spillover

    Authors: Youjin Lee, Gary Hettinger, Nandita Mitra

    Abstract: Policy interventions can spill over to units of a population that are not directly exposed to the policy but are geographically close to the units receiving the intervention. In recent work, investigations of spillover effects on neighboring regions have focused on estimating the average treatment effect of a particular policy in an observed setting. Our research question broadens this scope by as…

    Submitted 10 March, 2023; originally announced March 2023.

  24. arXiv:2302.06085  [pdf, ps, other]

    cs.DS cs.CR cs.LG math.PR stat.CO

    Algorithmic Aspects of the Log-Laplace Transform and a Non-Euclidean Proximal Sampler

    Authors: Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

    Abstract: The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings. We develop a non-Euclidean analog of the recent proximal sampler of [LST21], which naturally induces regularization by an object known as the log-Laplace transfo…

    Submitted 22 February, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

    Comments: Comments welcome! v2 improves constant in duality result, adds citations

  25. arXiv:2301.10419  [pdf, other]

    stat.AP

    Deconstructing Pedestrian Crossing Decision-making in Interactions with Continuous Traffic: an Anthropomorphic Model

    Authors: Kai Tian, Gustav Markkula, Chongfeng Wei, Yee Mun Lee, Ruth Madigan, Toshiya Hirose, Natasha Merat, Richard Romano

    Abstract: As safe and comfortable interactions with pedestrians could contribute to automated vehicles' (AVs) social acceptance and scale, increasing attention has been drawn to computational pedestrian behavior models. However, very limited studies characterize pedestrian crossing behavior based on specific behavioral mechanisms, as those mechanisms underpinning pedestrian road behavior are not yet clear.…

    Submitted 25 January, 2023; originally announced January 2023.

  26. arXiv:2301.06697  [pdf, ps, other]

    stat.ME

    Estimation of Policy-Relevant Causal Effects in the Presence of Interference with an Application to the Philadelphia Beverage Tax

    Authors: Gary Hettinger, Christina Roberto, Youjin Lee, Nandita Mitra

    Abstract: To comprehensively evaluate a public policy intervention, researchers must consider the effects of the policy not just on the implementing region, but also nearby, indirectly-affected regions. For example, an excise tax on sweetened beverages in Philadelphia was shown to not only be associated with a decrease in volume sales of taxed beverages in Philadelphia, but also an increase in sales in bord…

    Submitted 1 February, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

  27. arXiv:2301.04412  [pdf, ps, other]

    stat.ME stat.CO

    RobustIV and controlfunctionIV: Causal Inference for Linear and Nonlinear Models with Invalid Instrumental Variables

    Authors: Taehyeon Koo, Youjin Lee, Dylan S. Small, Zijian Guo

    Abstract: We present R software packages RobustIV and controlfunctionIV for causal inference with possibly invalid instrumental variables. RobustIV focuses on the linear outcome model. It implements the two-stage hard thresholding method to select valid instrumental variables from a set of candidate instrumental variables and make inferences for the causal effect in both low- and high-dimensional settings.…

    Submitted 20 June, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  28. arXiv:2301.00457  [pdf, other]

    math.OC cs.CR cs.DS cs.LG stat.ML

    ReSQueing Parallel and Private Stochastic Convex Optimization

    Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

    Abstract: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO obj…

    Submitted 27 October, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

  29. arXiv:2301.00346  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects

    Authors: Thanh Vinh Vo, Arnab Bhattacharyya, Young Lee, Tze-Yun Leong

    Abstract: We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have…

    Submitted 31 December, 2022; originally announced January 2023.

    Comments: NeurIPS 2022

  30. arXiv:2212.01539  [pdf, other]

    cs.LG stat.ML

    Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

    Authors: Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian

    Abstract: Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility trade-off. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the…

    Submitted 3 December, 2022; originally announced December 2022.

    Comments: 25 pages

  31. arXiv:2210.10967  [pdf, other]

    stat.ME stat.CO

    Adaptive greedy forward variable selection for linear regression models with incomplete data using multiple imputation

    Authors: Yong-Shiuan Lee

    Abstract: Variable selection is crucial for sparse modeling in this age of big data. Missing values are common in data, and make variable selection more complicated. The approach of multiple imputation (MI) results in multiply imputed datasets for missing values, and has been widely applied in various variable selection procedures. However, directly performing variable selection on the whole MI data or boot…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: 34 pages, 9 figures

  32. arXiv:2210.07219  [pdf, ps, other]

    cs.DS cs.LG math.NA stat.ML

    Condition-number-independent convergence rate of Riemannian Hamiltonian Monte Carlo with numerical integrators

    Authors: Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala

    Abstract: We study the convergence rate of discretized Riemannian Hamiltonian Monte Carlo on sampling from distributions in the form of $e^{-f(x)}$ on a convex body $\mathcal{M}\subset\mathbb{R}^{n}$. We show that for distributions in the form of $e^{-α^{\top}x}$ on a polytope with $m$ constraints, the convergence rate of a family of commonly-used integrators is independent of…

    Submitted 10 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Improved writing & Theory for arXiv:2202.01908

  33. arXiv:2209.10105  [pdf, ps, other]

    cs.LG cs.DC stat.ML

    Distributed Online Non-convex Optimization with Composite Regret

    Authors: Zhanhong Jiang, Aditya Balu, Xian Yeow Lee, Young M. Lee, Chinmay Hegde, Soumik Sarkar

    Abstract: Regret has been widely adopted as the metric of choice for evaluating the performance of online optimization algorithms for distributed, multi-agent systems. However, data/model variations associated with agents can significantly impact decisions and requires consensus among agents. Moreover, most existing works have focused on developing approaches for (either strongly or non-strongly) convex los…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: 41 pages, presented in allerton conference 2022

  34. Value of Information Analysis for External Validation of Risk Prediction Models

    Authors: Mohsen Sadatsafavi, Tae Yoon Lee, Laure Wynants, Andrew Vickers, Paul Gustafson

    Abstract: Background: Before being used to inform patient care, a risk prediction model needs to be validated in a representative sample from the target population. The finite size of the validation sample entails that there is uncertainty with respect to estimates of model performance. We apply value-of-information methodology as a framework to quantify the consequence of such uncertainty in terms of NB. M…

    Submitted 5 August, 2022; originally announced August 2022.

    Comments: 24 pages, 4,484 words, 1 table, 2 boxes, 5 figures

  35. arXiv:2207.09891  [pdf, other]

    stat.ME

    Maximum Likelihood Imputation

    Authors: Jeongseop Han, Youngjo Lee, Jae Kwang Kim

    Abstract: Maximum likelihood (ML) estimation is widely used in statistics. The h-likelihood has been proposed as an extension of Fisher's likelihood to statistical models including unobserved latent variables of recent interest. Its advantage is that the joint maximization gives ML estimators (MLEs) of both fixed and random parameters with their standard error estimates. However, the current h-likelihood ap…

    Submitted 20 July, 2022; originally announced July 2022.

  36. arXiv:2207.09871  [pdf, ps, other]

    stat.ME

    Enhanced Laplace Approximation

    Authors: Jeongseop Han, Youngjo Lee

    Abstract: The Laplace approximation (LA) has been proposed as a method for approximating the marginal likelihood of statistical models with latent variables. However, the approximate maximum likelihood estimators (MLEs) based on the LA are often biased for binary or spatial data, and the corresponding Hessian matrix underestimates the standard errors of these approximate MLEs. A higher-order approximation h…

    Submitted 20 July, 2022; originally announced July 2022.

  37. arXiv:2207.08347  [pdf, ps, other]

    cs.LG cs.CR math.OC stat.ML

    Private Convex Optimization in General Norms

    Authors: Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

    Abstract: We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$. Our algorithms are based on a regularized exponential mechanism which samples from the density $\propto \exp(-k(F+μr))$ where $F$ is the empirical loss and $r$ is a regularizer which is strongly convex with respect to $\|\cdot\|$, generalizing a recent work o…

    Submitted 10 November, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

    Comments: SODA 2023

  38. arXiv:2207.00160  [pdf, other]

    cs.LG cs.CR stat.ML

    When Does Differentially Private Learning Not Suffer in High Dimensions?

    Authors: Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin Tat Lee, Abhradeep Guha Thakurta

    Abstract: Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following researc…

    Submitted 26 October, 2022; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: 26 pages; v3 includes additional experiments and clarification

  39. arXiv:2206.12663  [pdf, other]

    stat.ML cs.LG stat.CO

    Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert

    Authors: Yoonhyung Lee, Sungdong Lee, Joong-Ho Won

    Abstract: The implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature due to its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model paramet…

    Submitted 28 June, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

    Comments: Accepted to the 39th International Conference on Machine Learning. This version contains corrections to typos found after submitting the camera-ready version

  40. arXiv:2206.02032  [pdf, other]

    cs.LG math.NA stat.ML

    A Neural Network Approach for Homogenization of Multiscale Problems

    Authors: Jihun Han, Yoonsang Lee

    Abstract: We propose a neural network-based approach to the homogenization of multiscale problems. The proposed method uses a derivative-free formulation of a training loss, which incorporates Brownian walkers to find the macroscopic description of a multiscale PDE solution. Compared with other network-based approaches for multiscale problems, the proposed method is free from the design of hand-crafted neur…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: 20 pages, 6 figures

    MSC Class: 65N99; 65C05; 68T07

  41. Closed-Form Solution of the Unit Normal Loss Integral in Two-Dimensions

    Authors: Tae Yoon Lee, Paul Gustafson, Mohsen Sadatsafavi

    Abstract: In Value of Information (VoI) analysis, the unit normal loss integral (UNLI) frequently emerges as a solution for the computation of various VoI metrics. However, one limitation of the UNLI has been that its closed-form solution is available for only one dimension, and thus can be used for comparisons involving only two strategies (where it is applied to the scalar incremental net benefit). We der…

    Submitted 23 July, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: 1 table, 1 figure, will be submitted to MDM - technical note

  42. arXiv:2202.03418  [pdf, other]

    cs.LG stat.ML

    Diversify and Disambiguate: Learning From Underspecified Data

    Authors: Yoonho Lee, Huaxiu Yao, Chelsea Finn

    Abstract: Many datasets are underspecified: there exist multiple equally viable solutions to a given task. Underspecification can be problematic for methods that learn a single hypothesis because different functions that achieve low training loss can focus on different predictive features and thus produce widely varying predictions on out-of-distribution data. We propose DivDis, a simple two-stage framework…

    Submitted 21 February, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: ICLR 2023. Code is available at https://github.com/yoonholee/DivDis

  43. arXiv:2201.12430  [pdf, other]

    stat.ME stat.CO

    Bayesian Nonlinear Models for Repeated Measurement Data: An Overview, Implementation, and Applications

    Authors: Se Yoon Lee

    Abstract: Nonlinear mixed effects models have become a standard platform for analysis when data is in the form of continuous and repeated measurements of subjects from a population of interest, while temporal profiles of subjects commonly follow a nonlinear tendency. While frequentist analysis of nonlinear mixed effects models has a long history, Bayesian analysis of the models has received comparatively li…

    Submitted 2 March, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  44. arXiv:2110.15012  [pdf, ps, other]

    stat.OT math.ST

    On rereading Savage

    Authors: Yudi Pawitan, Youngjo Lee

    Abstract: If we accept Savage's set of axioms, then all uncertainties must be treated like ordinary probability. Savage espoused subjective probability, allowing, for example, the probability of Donald Trump's re-election. But Savage's probability also covers the objective version, such as the probability of heads in a fair toss of a coin. In other words, there is no distinction between objective and subjec…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: 23 pages

  45. arXiv:2110.13812  [pdf, other]

    stat.AP physics.ao-ph

    Day-ahead Forecasts of Air Temperature

    Authors: Hewei Wang, Muhammad Salman Pathan, Yee Hui Lee, Soumyabrata Dev

    Abstract: Air temperature is an essential factor that directly impacts the weather. Temperature can be counted as an important sign of climatic change, that profoundly impacts our health, development, and urban planning. Therefore, it is vital to design a framework that can accurately predict the temperature values for considerable lead times. In this paper, we propose a technique based on exponential smoot…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: Accepted in Proc. IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, 2021

  46. arXiv:2110.07531  [pdf]

    stat.ML cs.LG physics.bio-ph q-bio.BM

    Deep learning models for predicting RNA degradation via dual crowdsourcing

    Authors: Hannah K. Wayment-Steele, Wipapat Kladwang, Andrew M. Watkins, Do Soon Kim, Bojan Tunguz, Walter Reade, Maggie Demkin, Jonathan Romano, Roger Wellington-Oguri, John J. Nicol, Jiayang Gao, Kazuki Onodera, Kazuki Fujikawa, Hanfei Mao, Gilles Vandewiele, Michele Tinti, Bram Steenwinckel, Takuya Ito, Taiga Noumi, Shujun He, Keiichiro Ishi, Youhan Lee, Fatih Öztürk, Anthony Chiu, Emin Öztürk, et al. (4 additional authors not shown)

    Abstract: Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a ke…

    Submitted 22 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

  47. arXiv:2110.06500  [pdf, other]

    cs.LG cs.CL cs.CR stat.ML

    Differentially Private Fine-tuning of Language Models

    Authors: Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

    Abstract: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially…

    Submitted 14 July, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: ICLR 2022. Code available at https://github.com/huseyinatahaninan/Differentially-Private-Fine-tuning-of-Language-Models

  48. arXiv:2106.14415  [pdf, ps, other]

    stat.CO math.PR

    Exact simulation of extrinsic stress-release processes

    Authors: Young Lee, Patrick J. Laub, Thomas Taimre, Hongbiao Zhao, Jiancang Zhuang

    Abstract: We present a new and straightforward algorithm that simulates exact sample paths for a generalized stress-release process. The computation of the exact law of the joint interarrival times is detailed and used to derive this algorithm. Furthermore, the martingale generator of the process is derived and induces theoretical moments which generalize some results of Borovkov & Vere-Jones (2000) and are…

    Submitted 28 June, 2021; originally announced June 2021.

    MSC Class: 60G20 (Primary) 60G55; 65C05 (Secondary)

  49. arXiv:2106.14258  [pdf, other]

    stat.AP stat.CO

    Sparse Logistic Tensor Decomposition for Binary Data

    Authors: Jianhao Zhang, Yoonkyung Lee

    Abstract: Tensor data are increasingly available in many application domains. We develop several tensor decomposition methods for binary tensor data. Different from classical tensor decompositions for continuous-valued data with squared error loss, we formulate logistic tensor decompositions for binary data with a Bernoulli likelihood. To enhance the interpretability of estimated factors and improve their s…

    Submitted 27 June, 2021; originally announced June 2021.

  50. Uncertainty and Value of Information in Risk Prediction Modeling

    Authors: Mohsen Sadatsafavi, Tae Yoon Lee, Paul Gustafson

    Abstract: Background: Due to the finite size of the development sample, predicted probabilities from a risk prediction model are inevitably uncertain. We apply Value of Information methodology to evaluate the decision-theoretic implications of prediction uncertainty. Methods: Adopting a Bayesian perspective, we extend the definition of the Expected Value of Perfect Information (EVPI) from decision analysi…

    Submitted 3 November, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: 24 pages, 1 table, 3 figures