Search | arXiv e-print repository

Assessment of Case Influence in the Lasso with a Case-weight Adjusted Solution Path

Abstract: We study case influence in the Lasso regression using Cook's distance which measures overall change in the fitted values when one observation is deleted. Unlike in ordinary least squares regression, the estimated coefficients in the Lasso do not have a closed form due to the nondifferentiability of the $\ell_1$ penalty, and neither does Cook's distance. To find the case-deleted Lasso solution without refitting the model, we approach it from the full data solution by introducing a weight parameter ranging from 1 to 0 and generating a solution path indexed by this parameter. We show that the solution path is piecewise linear with respect to a simple function of the weight parameter under a fixed penalty. The resulting case influence is a function of the penalty and weight, and it becomes Cook's distance when the weight is 0. As the penalty parameter changes, selected variables change, and the magnitude of Cook's distance for the same data point may vary with the subset of variables selected. In addition, we introduce a case influence graph to visualize how the contribution of each data point changes with the penalty parameter. From the graph, we can identify influential points at different penalty levels and make modeling decisions accordingly. Moreover, we find that case influence graphs exhibit different patterns between underfitting and overfitting phases, which can provide additional information for model selection.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 34 pages, 10 figures

arXiv:2405.10527 [pdf, other]

Hawkes Models And Their Applications

Authors: Patrick J. Laub, Young Lee, Philip K. Pollett, Thomas Taimre

Abstract: The Hawkes process is a model for counting the number of arrivals to a system which exhibits the self-exciting property - that one arrival creates a heightened chance of further arrivals in the near future. The model, and its generalizations, have been applied in a plethora of disparate domains, though two particularly developed applications are in seismology and in finance. As the original model is elegantly simple, generalizations have been proposed which: track marks for each arrival, are multivariate, have a spatial component, are driven by renewal processes, treat time as discrete, and so on. This paper creates a cohesive review of the traditional Hawkes model and the modern generalizations, providing details on their construction, simulation algorithms, and giving key references to the appropriate literature for a detailed treatment.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2403.04613 [pdf, other]

Simultaneous Conformal Prediction of Missing Outcomes with Propensity Score $ε$-Discretization

Authors: Yonghoon Lee, Edgar Dobriban, Eric Tchetgen Tchetgen

Abstract: We study the problem of simultaneous predictive inference on multiple outcomes missing at random. We consider a suite of possible simultaneous coverage properties, conditionally on the missingness pattern and on the -- possibly discretized/binned -- feature values. For data with discrete feature distributions, we develop a procedure which attains feature- and missingness-conditional coverage; and further improve it via pooling its results after partitioning the unobserved outcomes. To handle general continuous feature distributions, we introduce methods based on discretized feature values. To mitigate the issue that feature-discretized data may fail to remain missing at random, we propose propensity score $ε$-discretization. This approach is inspired by the balancing property of the propensity score, namely that the missing data mechanism is independent of the outcome conditional on the propensity [Rosenbaum and Rubin (1983)]. We show that the resulting pro-CP method achieves propensity score discretized feature- and missingness-conditional coverage, when the propensity score is known exactly or is estimated sufficiently accurately. Furthermore, we consider a stronger inferential target, the squared-coverage guarantee, which penalizes the spread of the coverage proportion. We propose methods -- termed pro-CP2 -- to achieve it with similar conditional properties as we have shown for usual coverage.
A key novel technical contribution in our results is that propensity score discretization leads to a notion of approximate balancing, which we formalize and characterize precisely. In extensive empirical experiments on simulated data and on a job search intervention dataset, we illustrate that our procedures provide informative prediction sets with valid conditional coverage.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.02589 [pdf, other]

Prospective Prediction of Body Mass Index Trajectories using Multi-task Gaussian Processes

Authors: Arthur Leroy, Varsha Gupta, Mya Thway Tint, Delicia Ooi Shu Qin, Keith M. Godfrey, Fabian Yap, Leck Ngee, Yung Seng Lee, Johan G. Eriksson, Navin Michael, Mauricio A. Alvarez, Dennis Wang

Abstract: Clinicians often investigate the body mass index (BMI) trajectories of children to assess their growth with respect to their peers, as well as to anticipate future growth and disease risk. While retrospective modelling of BMI trajectories has been an active area of research, prospective prediction of continuous BMI trajectories from historical growth data has not been well investigated. Using weight and height measurements from birth to age 10 years from a longitudinal mother-offspring cohort, we leveraged a multi-task Gaussian processes model, called MagmaClust, to derive probabilistic predictions for BMI trajectories over various forecasting periods. Experiments were conducted to evaluate the accuracy, sensitivity to missing values, and number of clusters. The results were compared with cubic B-spline regression and a parametric Jenss-Bayley mixed effects model. A downstream tool computing individual overweight probabilities was also proposed and evaluated. In all experiments, MagmaClust outperformed conventional models in prediction accuracy while correctly calibrating uncertainty regardless of the missing data amount (up to 90\% missing) or the forecasting period (from 2 to 8 years in the future). Moreover, the overweight probabilities computed from MagmaClust's uncertainty quantification exhibited high specificity ($0.94$ to $0.96$) and accuracy ($0.86$ to $0.94$) in predicting the 10-year overweight status even from age 2 years.
MagmaClust provides a probabilistic non-parametric framework to prospectively predict BMI trajectories, which is robust to missing values and outperforms conventional BMI trajectory modelling approaches. It also clusters individuals to identify typical BMI patterns (early peak, adiposity rebounds) during childhood. Overall, we demonstrated its potential to anticipate BMI evolution throughout childhood, allowing clinicians to implement prevention strategies.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 17 pages, 9 figures, 5 tables

arXiv:2401.14355 [pdf, ps, other]

Multiply Robust Difference-in-Differences Estimation of Causal Effect Curves for Continuous Exposures

Authors: Gary Hettinger, Youjin Lee, Nandita Mitra

Abstract: Researchers commonly use difference-in-differences (DiD) designs to evaluate public policy interventions. While methods exist for estimating effects in the context of binary interventions, policies often result in varied exposures across regions implementing the policy. Yet, existing approaches for incorporating continuous exposures face substantial limitations in addressing confounding variables associated with intervention status, exposure levels, and outcome trends. These limitations significantly constrain policymakers' ability to fully comprehend policy impacts and design future interventions. In this work, we propose new estimators for causal effect curves within the DiD framework, accounting for multiple sources of confounding. Our approach accommodates misspecification of a subset of treatment, exposure, and outcome models while avoiding any parametric assumptions on the effect curve. We present the statistical properties of the proposed methods and illustrate their application through simulations and a study investigating the heterogeneous effects of a nutritional excise tax under different levels of accessibility to cross-border shopping.

Submitted 15 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.01849 [pdf, other]

The expected value of sample information calculations for external validation of risk prediction models

Authors: Mohsen Sadatsafavi, Andrew J Vickers, Tae Yoon Lee, Paul Gustafson, Laure Wynants

Abstract: In designing external validation studies of clinical prediction models, contemporary sample size calculation methods are based on the frequentist inferential paradigm. One of the widely reported metrics of model performance is net benefit (NB), and the relevance of conventional inference around NB as a measure of clinical utility is doubtful. Value of Information methodology quantifies the consequences of uncertainty in terms of its impact on clinical utility of decisions. We introduce the expected value of sample information (EVSI) for validation as the expected gain in NB from conducting an external validation study of a given size. We propose algorithms for EVSI computation, and in a case study demonstrate how EVSI changes as a function of the amount of current information and future study's sample size. Value of Information methodology provides a decision-theoretic lens to the process of planning a validation study of a risk prediction model and can complement conventional methods when designing such studies.

Submitted 6 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: 14 pages, 4 figures, 0 tables

arXiv:2312.13430 [pdf, other]

Debiasing Sample Loadings and Scores in Exponential Family PCA for Sparse Count Data

Authors: Ruochen Huang, Yoonkyung Lee

Abstract: Multivariate count data with many zeros frequently occur in a variety of application areas such as text mining with a document-term matrix and cluster analysis with microbiome abundance data. Exponential family PCA (Collins et al., 2001) is a widely used dimension reduction tool to understand and capture the underlying low-rank structure of count data. It produces principal component scores by fitting Poisson regression models with estimated loadings as covariates. This tends to result in extreme scores for sparse count data significantly deviating from true scores. We consider two major sources of bias in this estimation procedure and propose ways to reduce their effects. First, the discrepancy between true loadings and their estimates under a limited sample size largely degrades the quality of score estimates. By treating estimated loadings as covariates with bias and measurement errors, we debias score estimates, using the iterative bootstrap method for loadings and considering classical measurement error models. Second, the existence of MLE bias is often ignored in score estimation, but this bias could be removed through well-known MLE bias reduction methods. We demonstrate the effectiveness of the proposed bias correction procedure through experiments on both simulated data and real data.

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2311.16506 [pdf, other]

Using Bayesian Statistics in Confirmatory Clinical Trials in the Regulatory Setting

Authors: Se Yoon Lee

Abstract: Bayesian statistics plays a pivotal role in advancing medical science by enabling healthcare companies, regulators, and stakeholders to assess the safety and efficacy of new treatments, interventions, and medical procedures. The Bayesian framework offers a unique advantage over the classical framework, especially when incorporating prior information into a new trial with quality external data, such as historical data or another source of co-data. In recent years, there has been a significant increase in regulatory submissions using Bayesian statistics due to its flexibility and ability to provide valuable insights for decision-making, addressing the modern complexity of clinical trials where frequentist trials are inadequate. For regulatory submissions, companies often need to consider the frequentist operating characteristics of the Bayesian analysis strategy, regardless of the design complexity. In particular, the focus is on the frequentist type I error rate and power for all realistic alternatives. This tutorial review aims to provide a comprehensive overview of the use of Bayesian statistics in sample size determination in the regulatory environment of clinical trials. Fundamental concepts of Bayesian sample size determination and illustrative examples are provided to serve as a valuable resource for researchers, clinicians, and statisticians seeking to develop more complex and innovative designs.

Submitted 30 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.08677 [pdf, other]

Federated Learning for Sparse Principal Component Analysis

Authors: Sin Cheng Ciou, Pin Jui Chen, Elvin Y. Tseng, Yuh-Jye Lee

Abstract: In the rapidly evolving realm of machine learning, algorithm effectiveness often faces limitations due to data quality and availability. Traditional approaches grapple with data sharing due to legal and privacy concerns. The federated learning framework addresses this challenge. Federated learning is a decentralized approach where model training occurs on client sides, preserving privacy by keeping data localized. Instead of sending raw data to a central server, only model updates are exchanged, enhancing data security. We apply this framework to Sparse Principal Component Analysis (SPCA) in this work. SPCA aims to attain sparse component loadings while maximizing data variance for improved interpretability. Besides the L1 norm regularization term in conventional SPCA, we add a smoothing function to facilitate gradient-based optimization methods. Moreover, in order to improve computational efficiency, we introduce a least squares approximation to the original SPCA. This enables analytic solutions in the optimization processes, leading to substantial computational improvements. Within the federated framework, we formulate SPCA as a consensus optimization problem, which can be solved using the Alternating Direction Method of Multipliers (ADMM). Our extensive experiments involve both IID and non-IID random features across various data owners. Results on synthetic and public datasets affirm the efficacy of our federated SPCA approach.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 11 pages, 7 figures, 1 table. Accepted by IEEE BigData 2023, Sorrento, Italy

arXiv:2310.11654 [pdf, other]

Subject-specific Deep Neural Networks for Count Data with High-cardinality Categorical Features

Authors: Hangbin Lee, Il Do Ha, Changha Hwang, Youngjo Lee

Abstract: There is a growing interest in subject-specific predictions using deep neural networks (DNNs) because real-world data often exhibit correlations, which have typically been overlooked in traditional DNN frameworks. In this paper, we propose a novel hierarchical likelihood learning framework for introducing gamma random effects into the Poisson DNN, so as to improve the prediction performance by capturing both nonlinear effects of input variables and subject-specific cluster effects. The proposed method simultaneously yields maximum likelihood estimators for fixed parameters and best unbiased predictors for random effects by optimizing a single objective function. This approach enables a fast end-to-end algorithm for handling clustered count data, which often involve high-cardinality categorical features. Furthermore, state-of-the-art network architectures can be easily implemented into the proposed h-likelihood framework. As an example, we introduce a multi-head attention layer and a sparsemax function, which allow feature selection in high-dimensional settings. To enhance practical performance and learning efficiency, we present an adjustment procedure for prediction of random parameters and a method-of-moments estimator for pretraining of the variance component. Various experiential studies and real data analyses confirm the advantages of our proposed methods.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.10393 [pdf, other]

Statistical and Causal Robustness for Causal Null Hypothesis Tests

Authors: Junhui Yang, Rohit Bhattacharya, Youjin Lee, Ted Westling

Abstract: Prior work applying semiparametric theory to causal inference has primarily focused on deriving estimators that exhibit statistical robustness under a prespecified causal model that permits identification of a desired causal parameter. However, a fundamental challenge is correct specification of such a model, which usually involves making untestable assumptions. Evidence factors is an approach to combining hypothesis tests of a common causal null hypothesis under two or more candidate causal models. Under certain conditions, this yields a test that is valid if at least one of the underlying models is correct, which is a form of causal robustness. We propose a method of combining semiparametric theory with evidence factors. We develop a causal null hypothesis test based on joint asymptotic normality of K asymptotically linear semiparametric estimators, where each estimator is based on a distinct identifying functional derived from each of K candidate causal models. We show that this test provides both statistical and causal robustness in the sense that it is valid if at least one of the K proposed causal models is correct, while also allowing for slower than parametric rates of convergence in estimating nuisance functions. We demonstrate the effectiveness of our method via simulations and applications to the Framingham Heart Study and Wisconsin Longitudinal Study.

Submitted 29 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.09960 [pdf, other]

Point Mass in the Confidence Distribution: Is it a Drawback or an Advantage?

Authors: Hangbin Lee, Youngjo Lee

Abstract: Stein's (1959) problem highlights the phenomenon called probability dilution in high-dimensional cases, which is known as a fundamental deficiency in probabilistic inference. The satellite conjunction problem also suffers from probability dilution, in that poor-quality data can lead to a dilution of collision probability. Though various methods have been proposed, such as the generalized fiducial distribution and the reference posterior, they could not maintain the coverage probability of confidence intervals (CIs) in both problems. On the other hand, the confidence distribution (CD) has a point mass at zero, which has been interpreted as paradoxical. However, we show that this point mass is an advantage rather than a drawback, because it gives a way to maintain the coverage probability of CIs. More recently, the 'false confidence theorem' was presented as another deficiency in probabilistic inferences, called false confidence. It was further claimed that the use of consonant belief can mitigate this deficiency. However, we show that the false confidence theorem cannot be applied to the CD in both Stein's and satellite conjunction problems. It is crucial that a confidence feature, not a consonant one, is the key to overcoming the deficiencies in probabilistic inferences. Our findings reveal that the CD outperforms the other existing methods, including the consonant belief, in the context of Stein's and satellite conjunction problems.
Additionally, we demonstrate the ambiguity of coverage probability in an observed CI from the frequentist CI procedure, and show that the CD provides valuable information regarding this ambiguity.

Submitted 15 October, 2023; originally announced October 2023.

arXiv:2310.09955 [pdf, other]

On the Statistical Foundations of H-likelihood for Unobserved Random Variables

Authors: Hangbin Lee, Youngjo Lee

Abstract: Maximum likelihood estimation is widely used for statistical inference. This paper aims to reformulate Lee and Nelder's (1996) h-likelihood, so that the maximum h-likelihood estimator resembles the maximum likelihood estimator of the classical likelihood. We establish the statistical foundations of the new h-likelihood. This extends classical likelihood theories to embrace a broader class of statistical models with random parameters. Maximization of the h-likelihood yields asymptotically optimal estimators for both fixed and random parameters achieving the generalized Cramér-Rao lower bound, while providing computationally efficient fitting algorithms. Furthermore, we explore asymptotic theory when the consistency of either fixed parameter estimation or random parameter prediction is violated. We also study how to obtain maximum h-likelihood estimators when the h-likelihood is not explicitly available.

Submitted 5 December, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

arXiv:2310.09495 [pdf, other]

Learning In-between Imagery Dynamics via Physical Latent Spaces

Authors: Jihun Han, Yoonsang Lee, Anne Gelb

Abstract: We present a framework designed to learn the underlying dynamics between two images observed at consecutive time steps. The complex nature of image data and the lack of temporal information pose significant challenges in capturing the unique evolving patterns. Our proposed method focuses on estimating the intermediary stages of image evolution, allowing for interpretability through latent dynamics while preserving spatial correlations with the image. By incorporating a latent variable that follows a physical model expressed in partial differential equations (PDEs), our approach ensures the interpretability of the learned model and provides insight into corresponding image dynamics. We demonstrate the robustness and effectiveness of our learning framework through a series of numerical tests using geoscientific imagery data.

Submitted 14 October, 2023; originally announced October 2023.

Comments: 26 pages, 13 figures

MSC Class: 37M05; 62F99; 68T45

arXiv:2309.16829 [pdf, other]

An analysis of the derivative-free loss method for solving PDEs

Authors: Jihun Han, Yoonsang Lee

Abstract: This study analyzes the derivative-free loss method to solve a certain class of elliptic PDEs using neural networks. The derivative-free loss method uses the Feynman-Kac formulation, incorporating stochastic walkers and their corresponding average values. We investigate the effect of the time interval related to the Feynman-Kac formulation and the walker size in the context of computational efficiency, trainability, and sampling errors. Our analysis shows that the training loss bias is proportional to the time interval and the spatial gradient of the neural network while inversely proportional to the walker size. We also show that the time interval must be sufficiently long to train the network. These analytic results indicate that we can choose the walker size as small as possible based on the optimal lower bound of the time interval. We also provide numerical tests supporting our analysis.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 18 pages, 6 figures

MSC Class: 65N15; 65N75; 65C05; 60G46

arXiv:2308.13047 [pdf, other]

Federated Causal Inference from Observational Data

Authors: Thanh Vinh Vo, Young Lee, Tze-Yun Leong

Abstract: Decentralized data sources are prevalent in real-world applications, posing a formidable challenge for causal inference. These sources cannot be consolidated into a single entity owing to privacy constraints. The presence of dissimilar data distributions and missing values within them can potentially introduce bias to the causal estimands. In this article, we propose a framework to estimate causal effects from decentralized data sources. The proposed framework avoids exchanging raw data among the sources, thus contributing towards privacy-preserving causal learning. Three instances of the proposed framework are introduced to estimate causal effects across a wide range of diverse scenarios within a federated setting. (1) FedCI: a Bayesian framework based on Gaussian processes for estimating causal effects from federated observational data sources. It estimates the posterior distributions of the causal effects to compute the higher-order statistics that capture the uncertainty. (2) CausalRFF: an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. It estimates the similarities among the sources through transfer coefficients, hence requiring no prior information about the similarity measures. (3) CausalFI: a new approach for federated causal inference from incomplete data, enabling the estimation of causal effects from multiple decentralized and incomplete data sources.
It accounts for the missing data under the missing at random assumption, while also estimating higher-order statistics of the causal estimands. The proposed federated framework and its instances are an important step towards a privacy-preserving causal learning model.

Submitted 30 May, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: Preprint. arXiv admin note: substantial text overlap with arXiv:2301.00346

arXiv:2308.09009 [pdf, other]

Closed-form approximations of moments and densities of continuous-time Markov models

Authors: Dennis Kristensen, Young Jun Lee, Antonio Mele

Abstract: This paper develops power series expansions of a general class of moment functions, including transition densities and option prices, of continuous-time Markov processes, including jump-diffusions. The proposed expansions extend the ones in Kristensen and Mele (2011) to cover general Markov processes. We demonstrate that the class of expansions nests the transition density and option price expansions developed in Yang, Chen, and Wan (2019) and Wan and Yang (2021) as special cases, thereby connecting seemingly different ideas in a unified framework. We show how the general expansion can be implemented for fully general jump-diffusion models. We provide a new theory for the validity of the expansions which shows that series expansions are not guaranteed to converge as more terms are added in general. Thus, these methods should be used with caution. At the same time, the numerical studies in this paper demonstrate good performance of the proposed implementation in practice when a small number of terms are included.

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2307.06581 [pdf, other]

Deep Neural Networks for Semiparametric Frailty Models via H-likelihood

Authors: Hangbin Lee, IL DO HA, Youngjo Lee

Abstract: For prediction of clustered time-to-event data, we propose a new deep neural network based gamma frailty model (DNN-FM). An advantage of the proposed model is that the joint maximization of the new h-likelihood provides maximum likelihood estimators for fixed parameters and best unbiased predictors for random frailties. Thus, the proposed DNN-FM is trained by using a negative profiled h-likelihood as a loss function, constructed by profiling out the non-parametric baseline hazard. Experimental studies show that the proposed method enhances the prediction performance of the existing methods. A real data analysis shows that the inclusion of subject-specific frailties helps to improve prediction of the DNN based Cox model (DNN-Cox).

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2306.15173 [pdf, other]

Robust propensity score weighting estimation under missing at random

Authors: Hengfang Wang, Jae Kwang Kim, Jeongseop Han, Youngjo Lee

Abstract: Missing data is frequently encountered in many areas of statistics. Propensity score weighting is a popular method for handling missing data. The propensity score method employs a response propensity model, but correct specification of the statistical model can be challenging in the presence of missing data. Doubly robust estimation is attractive, as the consistency of the estimator is guaranteed when either the outcome regression model or the propensity score model is correctly specified. In this paper, we first employ information projection to develop an efficient and doubly robust estimator under indirect model calibration constraints. The resulting propensity score estimator can be equivalently expressed as a doubly robust regression imputation estimator by imposing the internal bias calibration condition in estimating the regression parameters. In addition, we generalize the information projection to allow for outlier-robust estimation. Some asymptotic properties are presented. The simulation study confirms that the proposed method allows robust inference against not only the violation of various model assumptions, but also outliers. A real-life application is presented using data from the Conservation Effects Assessment Project.

Submitted 27 March, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.06342 [pdf, other]

Distribution-free inference with hierarchical data

Authors: Yonghoon Lee, Rina Foygel Barber, Rebecca Willett

Abstract: This paper studies distribution-free inference in settings where the data set has a hierarchical structure -- for example, groups of observations, or repeated measurements. In such settings, standard notions of exchangeability may not hold. To address this challenge, a hierarchical form of exchangeability is derived, facilitating extensions of distribution-free methods, including conformal prediction and jackknife+. While the standard theoretical guarantee obtained by the conformal prediction framework is a marginal predictive coverage guarantee, in the special case of independent repeated measurements, it is possible to achieve a stronger form of coverage -- the "second-moment coverage" property -- to provide better control of conditional miscoverage rates, and distribution-free prediction sets that achieve this property are constructed. Simulations illustrate that this guarantee indeed leads to uniformly small conditional miscoverage rates. Empirically, this stronger guarantee comes at the cost of a larger width of the prediction set in scenarios where the fitted model is poorly calibrated, but this cost is very mild in cases where the fitted model is accurate.

Submitted 2 March, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

arXiv:2306.01337 [pdf, other]

MathChat: Converse to Tackle Challenging Math Problems with LLM Agents

Authors: Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang

Abstract: Employing Large Language Models (LLMs) to address mathematical problems is an intriguing research endeavor, considering the abundance of math problems expressed in natural language across numerous science and engineering fields. LLMs, with their generalized ability, are used as a foundation model to build AI agents for different tasks. In this paper, we study the effectiveness of utilizing LLM agents to solve math problems through conversations. We propose MathChat, a conversational problem-solving framework designed for math problems. MathChat consists of an LLM agent and a user proxy agent which is responsible for tool execution and additional guidance. This synergy facilitates a collaborative problem-solving process, where the agents engage in a dialogue to solve the problems. We perform evaluation on difficult high school competition problems from the MATH dataset. Utilizing Python, we show that MathChat can further improve previous tool-using prompting methods by 6%.

Submitted 28 June, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: Update version

arXiv:2305.05532 [pdf, other]

doi 10.1109/ICPHM57936.2023.10194112

An ensemble of convolution-based methods for fault detection using vibration signals

Authors: Xian Yeow Lee, Aman Kumar, Lasitha Vidyaratne, Aniruddha Rajendra Rao, Ahmed Farahat, Chetan Gupta

Abstract: This paper focuses on solving a fault detection problem using multivariate time series of vibration signals collected from planetary gearboxes in a test rig. Various traditional machine learning and deep learning methods have been proposed for multivariate time-series classification, including distance-based, functional data-oriented, feature-driven, and convolution kernel-based methods. Recent studies have shown that convolution kernel-based methods like ROCKET, and 1D convolutional neural networks with ResNet and FCN, have robust performance for multivariate time-series data classification. We propose an ensemble of three convolution kernel-based methods and show its efficacy on this fault detection problem by outperforming other approaches and achieving an accuracy of more than 98.8%.

Submitted 4 May, 2023; originally announced May 2023.

Comments: 12 Pages, 9 Figures, 2 Tables. Accepted at ICPHM 2023

Journal ref: 2023 IEEE International Conference on Prognostics and Health Management (ICPHM)

arXiv:2303.06227 [pdf, other]

Policy effect evaluation under counterfactual neighborhood interventions in the presence of spillover

Authors: Youjin Lee, Gary Hettinger, Nandita Mitra

Abstract: Policy interventions can spill over to units of a population that are not directly exposed to the policy but are geographically close to the units receiving the intervention. In recent work, investigations of spillover effects on neighboring regions have focused on estimating the average treatment effect of a particular policy in an observed setting. Our research question broadens this scope by asking what policy consequences the treated units would have experienced under hypothetical exposure settings. When we only observe treated unit(s) surrounded by controls -- as is common when a policy intervention is implemented in a single city or state -- this effect inquires about the policy effects under a counterfactual neighborhood policy status that we do not, in actuality, observe. In this work, we extend difference-in-differences (DiD) approaches to spillover settings and develop identification conditions required to evaluate policy effects in counterfactual treatment scenarios. These causal quantities are policy-relevant for designing effective policies for populations subject to various neighborhood statuses. We develop doubly robust estimators and use extensive numerical experiments to examine their performance under heterogeneous spillover effects. We apply our proposed method to investigate the effect of the Philadelphia beverage tax on unit sales.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2302.06085 [pdf, ps, other]

Algorithmic Aspects of the Log-Laplace Transform and a Non-Euclidean Proximal Sampler

Authors: Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

Abstract: The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do not readily carry over to more general settings. We develop a non-Euclidean analog of the recent proximal sampler of [LST21], which naturally induces regularization by an object known as the log-Laplace transform (LLT) of a density. We prove new mathematical properties (with an algorithmic flavor) of the LLT, such as strong convexity-smoothness duality and an isoperimetric inequality, which are used to prove a mixing time on our proximal sampler matching [LST21] under a warm start. As our main application, we show our warm-started sampler improves the value oracle complexity of differentially private convex optimization in $\ell_p$ and Schatten-$p$ norms for $p \in [1, 2]$ to match the Euclidean setting [GLL22], while retaining state-of-the-art excess risk bounds [GLLST23]. We find our investigation of the LLT to be a promising proof-of-concept of its utility as a tool for designing samplers, and outline directions for future exploration.

Submitted 22 February, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

Comments: Comments welcome! v2 improves constant in duality result, adds citations

arXiv:2301.10419 [pdf, other]

Deconstructing Pedestrian Crossing Decision-making in Interactions with Continuous Traffic: an Anthropomorphic Model

Authors: Kai Tian, Gustav Markkula, Chongfeng Wei, Yee Mun Lee, Ruth Madigan, Toshiya Hirose, Natasha Merat, Richard Romano

Abstract: As safe and comfortable interactions with pedestrians could contribute to automated vehicles' (AVs) social acceptance and scale, increasing attention has been drawn to computational pedestrian behavior models. However, very few studies characterize pedestrian crossing behavior based on specific behavioral mechanisms, as those mechanisms underpinning pedestrian road behavior are not yet clear. Here, we reinterpret pedestrian crossing behavior based on a deconstructed crossing decision process at uncontrolled intersections with continuous traffic. Notably, we explain and model pedestrian crossing behavior as they wait for crossing opportunities, optimizing crossing decisions by comparing the visual collision risk of approaching vehicles around them. A collision risk-based crossing initiation model is proposed to characterize the time-dynamic nature of pedestrian crossing decisions. A simulation tool is established to reproduce pedestrian behavior by employing the proposed model and a social force model. Two datasets collected in a CAVE-based immersive pedestrian simulator are applied to calibrate and validate the model. The model predicts pedestrian crossing decisions across all traffic scenarios well. In particular, by considering the decision strategy that pedestrians compare the collision risk of surrounding traffic gaps, model performance is significantly improved. Moreover, the collision risk-based crossing initiation model accurately captures the timing of pedestrian crossing initiations within each gap. This work concisely demonstrates how pedestrians dynamically adapt their crossings in continuous traffic based on perceived collision risk, potentially providing insights into modeling coupled human-AV interactions or serving as a tool to realize human-like pedestrian road behavior in virtual AV test platforms.

Submitted 25 January, 2023; originally announced January 2023.

arXiv:2301.06697 [pdf, ps, other]

Estimation of Policy-Relevant Causal Effects in the Presence of Interference with an Application to the Philadelphia Beverage Tax

Authors: Gary Hettinger, Christina Roberto, Youjin Lee, Nandita Mitra

Abstract: To comprehensively evaluate a public policy intervention, researchers must consider the effects of the policy not just on the implementing region, but also nearby, indirectly-affected regions. For example, an excise tax on sweetened beverages in Philadelphia was shown to not only be associated with a decrease in volume sales of taxed beverages in Philadelphia, but also an increase in sales in bordering counties not subject to the tax. The latter association may be explained by cross-border shopping behaviors of Philadelphia residents and indicate a causal effect of the tax on nearby regions, which may offset the total effect of the intervention. To estimate causal effects in this setting, we extend difference-in-differences methodology to account for such interference between regions and adjust for potential confounding present in quasi-experimental evaluations. Our doubly robust estimators for the average treatment effect on the treated and neighboring control relax standard assumptions on interference and model specification. We apply these methods to evaluate the change in volume sales of taxed beverages in 231 Philadelphia and bordering county stores due to the Philadelphia beverage tax. We also use our methods to explore the heterogeneity of effects across geographic features.

Submitted 1 February, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

arXiv:2301.04412 [pdf, ps, other]

RobustIV and controlfunctionIV: Causal Inference for Linear and Nonlinear Models with Invalid Instrumental Variables

Authors: Taehyeon Koo, Youjin Lee, Dylan S. Small, Zijian Guo

Abstract: We present R software packages RobustIV and controlfunctionIV for causal inference with possibly invalid instrumental variables. RobustIV focuses on the linear outcome model. It implements the two-stage hard thresholding method to select valid instrumental variables from a set of candidate instrumental variables and make inferences for the causal effect in both low- and high-dimensional settings. Furthermore, RobustIV implements the high-dimensional endogeneity test and the searching and sampling method, a uniformly valid inference method robust to errors in instrumental variable selection. controlfunctionIV considers the nonlinear outcome model and makes inferences about the causal effect based on the control function method. Our packages are demonstrated using two publicly available economic data sets together with applications to the Framingham Heart Study.

Submitted 20 June, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

arXiv:2301.00457 [pdf, other]

ReSQueing Parallel and Private Stochastic Convex Optimization

Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

Abstract: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $ε_{\text{opt}}$ with $d^{1/3}ε_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}ε_{\text{opt}}^{-2/3} + ε_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $ε_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. Given $n$ samples of Lipschitz loss functions, prior works [BFTT19, BFGT20, AFKT21, KLL21] established that if $n \gtrsim d ε_{\text{dp}}^{-2}$, $(ε_{\text{dp}}, δ)$-differential privacy is attained at no asymptotic cost to the SCO utility. However, these prior works all required a superlinear number of gradient queries. We close this gap for sufficiently large $n \gtrsim d^2 ε_{\text{dp}}^{-3}$, by using ReSQue to design an algorithm with near-linear gradient query complexity in this regime.

Submitted 27 October, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

arXiv:2301.00346 [pdf, other]

An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects

Authors: Thanh Vinh Vo, Arnab Bhattacharyya, Young Lee, Tze-Yun Leong

Abstract: We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have different distributions; the causal effects are independently and systematically incorporated. The proposed method estimates the similarities among the sources through transfer coefficients, and hence requires no prior information about the similarity measures. The heterogeneous causal effects can be estimated with no sharing of the raw training data among the sources, thus minimizing the risk of privacy leak. We also provide minimax lower bounds to assess the quality of the parameters learned from the disparate sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.

Submitted 31 December, 2022; originally announced January 2023.

Comments: NeurIPS 2022

arXiv:2212.01539 [pdf, other]

Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

Authors: Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian

Abstract: Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility trade-off. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance within less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass scaling challenges associated with clipping gradients that are distributed across multiple devices with \emph{per-device clipping} that clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves a task performance at $ε=1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.

Submitted 3 December, 2022; originally announced December 2022.

Comments: 25 pages

arXiv:2210.10967 [pdf, other]

Adaptive greedy forward variable selection for linear regression models with incomplete data using multiple imputation

Authors: Yong-Shiuan Lee

Abstract: Variable selection is crucial for sparse modeling in this age of big data. Missing values are common in data, and make variable selection more complicated. The approach of multiple imputation (MI) results in multiply imputed datasets for missing values, and has been widely applied in various variable selection procedures. However, directly performing variable selection on the whole MI data or bootstrapped MI data may not be worthwhile in terms of computational cost. To quickly identify the active variables in the linear regression model, we propose the adaptive grafting procedure with three pooling rules on MI data. The proposed methods proceed iteratively, starting from finding the active variables based on the complete case subset and then expanding the working data matrix with both the number of active variables and available observations. A comprehensive simulation study shows the selection accuracy in different aspects and computational efficiency of the proposed methods. Two real-life examples illustrate the strength of the proposed methods.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 34 pages, 9 figures

arXiv:2210.07219 [pdf, ps, other]

Condition-number-independent convergence rate of Riemannian Hamiltonian Monte Carlo with numerical integrators

Authors: Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala

Abstract: We study the convergence rate of discretized Riemannian Hamiltonian Monte Carlo on sampling from distributions in the form of $e^{-f(x)}$ on a convex body $\mathcal{M}\subset\mathbb{R}^{n}$. We show that for distributions in the form of $e^{-α^{\top}x}$ on a polytope with $m$ constraints, the convergence rate of a family of commonly-used integrators is independent of $\left\Vert α\right\Vert _{2}$ and the geometry of the polytope. In particular, the implicit midpoint method (IMM) and the generalized Leapfrog method (LM) have a mixing time of $\widetilde{O}\left(mn^{3}\right)$ to achieve $ε$ total variation distance to the target distribution. These guarantees are based on a general bound on the convergence rate for densities of the form $e^{-f(x)}$ in terms of parameters of the manifold and the integrator. Our theoretical guarantee complements the empirical results of [KLSV22], which shows that RHMC with IMM can sample ill-conditioned, non-smooth and constrained distributions in very high dimension efficiently in practice.

Submitted 10 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Improved writing & Theory for arXiv:2202.01908

arXiv:2209.10105 [pdf, ps, other]

Distributed Online Non-convex Optimization with Composite Regret

Authors: Zhanhong Jiang, Aditya Balu, Xian Yeow Lee, Young M. Lee, Chinmay Hegde, Soumik Sarkar

Abstract: Regret has been widely adopted as the metric of choice for evaluating the performance of online optimization algorithms for distributed, multi-agent systems. However, data/model variations associated with agents can significantly impact decisions and require consensus among agents. Moreover, most existing works have focused on developing approaches for (either strongly or non-strongly) convex losses, and very few results have been obtained regarding regret bounds in distributed online optimization for general non-convex losses. To address these two issues, we propose a novel composite regret with a new network regret-based metric to evaluate distributed online optimization algorithms. We concretely define static and dynamic forms of the composite regret. By leveraging the dynamic form of our composite regret, we develop a consensus-based online normalized gradient (CONGD) approach for pseudo-convex losses, and it provably shows a sublinear behavior relating to a regularity term for the path variation of the optimizer. For general non-convex losses, we first shed light on the regret for the setting of distributed online non-convex learning based on recent advances such that no deterministic algorithm can achieve the sublinear regret. We then develop the distributed online non-convex optimization with composite regret (DINOCO) without access to the gradients, depending on an offline optimization oracle. DINOCO is shown to achieve sublinear regret; to our knowledge, this is the first regret bound for general distributed online non-convex learning.

Submitted 21 September, 2022; originally announced September 2022.

Comments: 41 pages, presented at the Allerton Conference 2022

arXiv:2208.03343 [pdf]

doi 10.1177/0272989X231178317

Value of Information Analysis for External Validation of Risk Prediction Models

Authors: Mohsen Sadatsafavi, Tae Yoon Lee, Laure Wynants, Andrew Vickers, Paul Gustafson

Abstract: Background: Before being used to inform patient care, a risk prediction model needs to be validated in a representative sample from the target population. The finite size of the validation sample entails that there is uncertainty with respect to estimates of model performance. We apply value-of-information methodology as a framework to quantify the consequence of such uncertainty in terms of net benefit (NB). Methods: We define the Expected Value of Perfect Information (EVPI) for model validation as the expected loss in NB due to not confidently knowing which of the alternative decisions confers the highest NB at a given risk threshold. We propose methods for EVPI calculations based on Bayesian or ordinary bootstrapping of NBs, as well as an asymptotic approach supported by the central limit theorem. We conducted brief simulation studies to compare the performance of these methods, and used subsets of data from an international clinical trial for predicting mortality after myocardial infarction as a case study. Results: The three computation methods generated similar EVPI values in simulation studies. In the case study, at the pre-specified threshold of 0.02, the best decision with current information would be to use the model, with an expected incremental NB of 0.0020 over treating all. At this threshold, EVPI was 0.0005 (a relative EVPI of 25%). When scaled to the annual number of heart attacks in the US, this corresponds to a loss of 400 true positives, or extra 19,600 false positives (unnecessary treatments) per year, indicating the value of further model validation. As expected, the validation EVPI generally declined with larger samples. Conclusion: Value-of-information methods can be applied to the NB calculated during external validation of clinical prediction models to provide a decision-theoretic perspective to the consequences of uncertainty.
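The ordinary-bootstrap EVPI calculation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `net_benefits` and `validation_evpi` are hypothetical, the three decisions are assumed to be "treat none", "use the model", and "treat all", and NB is assumed to take the usual decision-curve form (true positives minus threshold-odds-weighted false positives, per person).

```python
import random


def net_benefits(risks, outcomes, threshold):
    """Return the NB of (treat none, use model, treat all) at a risk threshold.

    Assumes the usual decision-curve definition of net benefit.
    """
    n = len(outcomes)
    odds = threshold / (1.0 - threshold)  # weight on false positives
    tp = sum(1 for p, y in zip(risks, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(risks, outcomes) if p >= threshold and y == 0)
    prevalence = sum(outcomes) / n
    nb_model = tp / n - odds * fp / n
    nb_all = prevalence - odds * (1.0 - prevalence)
    return (0.0, nb_model, nb_all)


def validation_evpi(risks, outcomes, threshold, n_boot=500, seed=0):
    """Ordinary-bootstrap EVPI: E[max NB] - max E[NB] over the three decisions."""
    rng = random.Random(seed)
    n = len(outcomes)
    sums = [0.0, 0.0, 0.0]   # running sum of each decision's NB
    sum_of_max = 0.0         # running sum of the per-replicate best NB
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        nb = net_benefits([risks[i] for i in idx],
                          [outcomes[i] for i in idx], threshold)
        sums = [s + v for s, v in zip(sums, nb)]
        sum_of_max += max(nb)
    return sum_of_max / n_boot - max(s / n_boot for s in sums)
```

The returned value is non-negative by construction, since the average of per-replicate maxima cannot fall below the maximum of the averages. The Bayesian bootstrap and the asymptotic approach mentioned in the abstract are not covered by this sketch.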

Submitted 5 August, 2022; originally announced August 2022.

Comments: 24 pages, 4,484 words, 1 table, 2 boxes, 5 figures

arXiv:2207.09891 [pdf, other]

Maximum Likelihood Imputation

Authors: Jeongseop Han, Youngjo Lee, Jae Kwang Kim

Abstract: Maximum likelihood (ML) estimation is widely used in statistics. The h-likelihood has been proposed as an extension of Fisher's likelihood to statistical models including unobserved latent variables of recent interest. Its advantage is that the joint maximization gives ML estimators (MLEs) of both fixed and random parameters with their standard error estimates. However, the current h-likelihood approach does not allow MLEs of variance components as Henderson's joint likelihood does not in linear mixed models. In this paper, we show how to form the h-likelihood in order to facilitate joint maximization for MLEs of whole parameters. We also show the role of the Jacobian term which allows MLEs in the presence of unobserved latent variables. To obtain MLEs for fixed parameters, intractable integration is not necessary. As an illustration, we show one-shot ML imputation for missing data by treating them as realized but unobserved random parameters. We show that the h-likelihood bypasses the expectation step in the expectation-maximization (EM) algorithm and allows single ML imputation instead of multiple imputations. We also discuss the difference in predictions in random effects and missing data.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2207.09871 [pdf, ps, other]

Enhanced Laplace Approximation

Authors: Jeongseop Han, Youngjo Lee

Abstract: The Laplace approximation (LA) has been proposed as a method for approximating the marginal likelihood of statistical models with latent variables. However, the approximate maximum likelihood estimators (MLEs) based on the LA are often biased for binary or spatial data, and the corresponding Hessian matrix underestimates the standard errors of these approximate MLEs. A higher-order approximation has been proposed; however, it cannot be applied to complicated models such as correlated random effects models and does not provide consistent variance estimators. In this paper, we propose an enhanced LA (ELA) that provides the true MLE and its consistent variance estimator. We study its relationship to the variational Bayes method. We also introduce a new restricted maximum likelihood estimator (REMLE) for estimating dispersion parameters. The results of numerical studies show that the ELA provides a satisfactory MLE and REMLE, as well as their variance estimators for fixed parameters. The MLE and REMLE can be viewed as posterior mode and marginal posterior mode under flat priors, respectively. Some comparisons are also made with Bayesian procedures under different priors.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2207.08347 [pdf, ps, other]

Private Convex Optimization in General Norms

Authors: Sivakanth Gopi, Yin Tat Lee, Daogao Liu, Ruoqi Shen, Kevin Tian

Abstract: We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\|\cdot\|$. Our algorithms are based on a regularized exponential mechanism which samples from the density $\propto \exp(-k(F+μr))$ where $F$ is the empirical loss and $r$ is a regularizer which is strongly convex with respect to $\|\cdot\|$, generalizing a recent work of [Gopi, Lee, Liu '22] to non-Euclidean settings. We show that this mechanism satisfies Gaussian differential privacy and solves both DP-ERM (empirical risk minimization) and DP-SCO (stochastic convex optimization) by using localization tools from convex geometry. Our framework is the first to apply to private convex optimization in general normed spaces and directly recovers non-private SCO rates achieved by mirror descent as the privacy parameter $ε\to \infty$. As applications, for Lipschitz optimization in $\ell_p$ norms for all $p \in (1, 2)$, we obtain the first optimal privacy-utility tradeoffs; for $p = 1$, we improve tradeoffs obtained by the recent works [Asi, Feldman, Koren, Talwar '21, Bassily, Guzman, Nandi '21] by at least a logarithmic factor. Our $\ell_p$ norm and Schatten-$p$ norm optimization frameworks are complemented with polynomial-time samplers whose query complexity we explicitly bound.

Submitted 10 November, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: SODA 2023

arXiv:2207.00160 [pdf, other]

When Does Differentially Private Learning Not Suffer in High Dimensions?

Authors: Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin Tat Lee, Abhradeep Guha Thakurta

Abstract: Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following research question: When does the performance of differentially private learning not degrade with increasing model size? We identify that the magnitude of gradients projected onto subspaces is a key factor that determines performance. To precisely characterize this for private convex learning, we introduce a condition on the objective that we term \emph{restricted Lipschitz continuity} and derive improved bounds for the excess empirical and population risks that are dimension-independent under additional conditions. We empirically show that in private fine-tuning of large language models, gradients obtained during fine-tuning are mostly controlled by a few principal components. This behavior is similar to conditions under which we obtain dimension-independent bounds in convex settings. Our theoretical and empirical results together provide a possible explanation for recent successes in large-scale private fine-tuning. Code to reproduce our results can be found at \url{https://github.com/lxuechen/private-transformers/tree/main/examples/classification/spectral_analysis}.

Submitted 26 October, 2022; v1 submitted 30 June, 2022; originally announced July 2022.

Comments: 26 pages; v3 includes additional experiments and clarification

arXiv:2206.12663 [pdf, other]

Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert

Authors: Yoonhyung Lee, Sungdong Lee, Joong-Ho Won

Abstract: The implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature due to its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive non-asymptotic point estimation error bounds of both proxRM and proxPR iterates and their limiting distributions, and propose on-line estimators of their asymptotic covariance matrices that require only a single run of ISGD. The latter estimators are used to construct valid confidence intervals for the model parameters. Our analysis is free of the generalized linear model assumption that has limited the preceding analyses, and employs feasible procedures. Our on-line covariance matrix estimators appear to be the first of this kind in the ISGD literature.
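To illustrate what an implicit (proximal) SGD update looks like, the sketch below uses squared-error loss, an assumption chosen here because the implicit equation theta_new = theta + gamma * (y - x·theta_new) * x then has a closed-form solution. The function names, the step-size schedule gamma0 / n, and the running average standing in for Polyak-Ruppert-style averaging are all illustrative choices; the paper's general smooth convex setting and its covariance estimators are not reproduced.

```python
def implicit_sgd_step(theta, x, y, gamma):
    """One implicit SGD update for the loss 0.5 * (y - x . theta)^2.

    Solves theta_new = theta + gamma * (y - x . theta_new) * x exactly;
    the solution shrinks the explicit step by 1 / (1 + gamma * ||x||^2).
    """
    pred = sum(t * xi for t, xi in zip(theta, x))
    norm_sq = sum(xi * xi for xi in x)
    scale = gamma / (1.0 + gamma * norm_sq)
    return [t + scale * (y - pred) * xi for t, xi in zip(theta, x)]


def run_isgd(data, dim, gamma0=1.0):
    """Single pass over (x, y) pairs; returns (last iterate, averaged iterate)."""
    theta = [0.0] * dim
    avg = [0.0] * dim
    for n, (x, y) in enumerate(data, start=1):
        theta = implicit_sgd_step(theta, x, y, gamma0 / n)
        # Running mean of the iterates (Polyak-Ruppert-style averaging).
        avg = [a + (t - a) / n for a, t in zip(avg, theta)]
    return theta, avg
```

Because the step is damped by 1 / (1 + gamma * ||x||^2), a large step size cannot make the update overshoot, which is the stability property the abstract refers to.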

Submitted 28 June, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

Comments: Accepted to the 39th International Conference on Machine Learning. This version contains corrections to typos found after submitting the camera-ready version

arXiv:2206.02032 [pdf, other]

A Neural Network Approach for Homogenization of Multiscale Problems

Authors: Jihun Han, Yoonsang Lee

Abstract: We propose a neural network-based approach to the homogenization of multiscale problems. The proposed method uses a derivative-free formulation of a training loss, which incorporates Brownian walkers to find the macroscopic description of a multiscale PDE solution. Compared with other network-based approaches for multiscale problems, the proposed method is free from the design of hand-crafted neural network architecture and the cell problem to calculate the homogenization coefficient. The exploration neighborhood of the Brownian walkers affects the overall learning trajectory. We determine the bounds of micro- and macro-time steps that capture the local heterogeneous and global homogeneous solution behaviors, respectively, through a neural network. The bounds imply that the computational cost of the proposed method is independent of the microscale periodic structure for the standard periodic problems. We validate the efficiency and robustness of the proposed method through a suite of linear and nonlinear multiscale problems with periodic and random field coefficients.

Submitted 4 June, 2022; originally announced June 2022.

Comments: 20 pages, 6 figures

MSC Class: 65N99; 65C05; 68T07

arXiv:2205.06364 [pdf]

doi 10.1177/0272989X231171166

Closed-Form Solution of the Unit Normal Loss Integral in Two-Dimensions

Authors: Tae Yoon Lee, Paul Gustafson, Mohsen Sadatsafavi

Abstract: In Value of Information (VoI) analysis, the unit normal loss integral (UNLI) frequently emerges as a solution for the computation of various VoI metrics. However, one limitation of the UNLI has been that its closed-form solution is available for only one dimension, and thus can be used for comparisons involving only two strategies (where it is applied to the scalar incremental net benefit). We derived a closed-form solution for the two-dimensional UNLI, enabling closed-form VoI calculations for three strategies. We verified the accuracy of this method via simulation studies. A case study based on a three-arm clinical trial was used as an example. VoI methods based on the closed-form solutions for the UNLI can now be extended to three-decision comparisons, taking a fraction of a second to compute and not being subject to Monte Carlo error. An R implementation of this method is provided as part of the predtools package (https://github.com/resplab/predtools/).
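For orientation, the one-dimensional closed form that the abstract takes as its starting point can be sketched as below. It uses the standard identity E[max(0, X)] = mu * Phi(mu / sigma) + sigma * phi(mu / sigma) for X ~ N(mu, sigma^2); the function name `unit_normal_loss` is hypothetical, and the paper's two-dimensional solution is not reproduced here.

```python
import math


def unit_normal_loss(mu, sigma):
    """Closed-form E[max(0, X)] for X ~ N(mu, sigma^2), one-dimensional case."""
    if sigma <= 0.0:
        # Degenerate distribution: X equals mu with certainty.
        return max(0.0, mu)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return mu * cdf + sigma * pdf
```

With X taken as the incremental net benefit between two strategies, this expectation is the quantity that two-strategy VoI calculations evaluate without Monte Carlo sampling.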

Submitted 23 July, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: 1 table, 1 figure, will be submitted to MDM - technical note

arXiv:2202.03418 [pdf, other]

Diversify and Disambiguate: Learning From Underspecified Data

Authors: Yoonho Lee, Huaxiu Yao, Chelsea Finn

Abstract: Many datasets are underspecified: there exist multiple equally viable solutions to a given task. Underspecification can be problematic for methods that learn a single hypothesis because different functions that achieve low training loss can focus on different predictive features and thus produce widely varying predictions on out-of-distribution data. We propose DivDis, a simple two-stage framework that first learns a diverse collection of hypotheses for a task by leveraging unlabeled data from the test distribution. We then disambiguate by selecting one of the discovered hypotheses using minimal additional supervision, in the form of additional labels or inspection of function visualization. We demonstrate the ability of DivDis to find hypotheses that use robust features in image classification and natural language processing problems with underspecification.

Submitted 21 February, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: ICLR 2023. Code is available at https://github.com/yoonholee/DivDis

arXiv:2201.12430 [pdf, other]

Bayesian Nonlinear Models for Repeated Measurement Data: An Overview, Implementation, and Applications

Authors: Se Yoon Lee

Abstract: Nonlinear mixed effects models have become a standard platform for analysis when data is in the form of continuous and repeated measurements of subjects from a population of interest, while temporal profiles of subjects commonly follow a nonlinear tendency. While frequentist analysis of nonlinear mixed effects models has a long history, Bayesian analysis of the models received comparatively little attention until the late 1980s, due primarily to the time-consuming nature of Bayesian computation. Since the early 1990s, Bayesian approaches for the models have emerged, leveraging rapid developments in computing power, and have recently received significant attention due to (1) their superiority in quantifying the uncertainty of parameter estimation; (2) their utility in incorporating prior knowledge into the models; and (3) their flexibility in matching the increasing complexity of scientific research arising from diverse industrial and academic fields. This review article presents an overview of modeling strategies to implement Bayesian approaches for the nonlinear mixed effects models, ranging from designing a scientific question out of real-life problems to practical computations.

Submitted 2 March, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2110.15012 [pdf, ps, other]

On rereading Savage

Authors: Yudi Pawitan, Youngjo Lee

Abstract: If we accept Savage's set of axioms, then all uncertainties must be treated like ordinary probability. Savage espoused subjective probability, allowing, for example, the probability of Donald Trump's re-election. But Savage's probability also covers the objective version, such as the probability of heads in a fair toss of a coin. In other words, there is no distinction between objective and subjective probability. Savage's system has great theoretical implications; for example, prior probabilities can be elicited from subjective preferences, and then get updated by objective evidence, a learning step that forms the basis of Bayesian computations. Non-Bayesians have generally refused to accept the subjective aspect of probability or to allow priors in formal statistical modelling. As demanded, for example, by the late Dennis Lindley, since Bayesian probability is axiomatic, it is the non-Bayesians' duty to point out which axioms are not acceptable to them. This is not a simple request, since the Bayesian axioms are not commonly covered in our professional training, even in the Bayesian statistics courses. So our aim is to provide a readable exposition of the Bayesian axioms from a close rereading of Savage's classic book.

Submitted 28 October, 2021; originally announced October 2021.

Comments: 23 pages

arXiv:2110.13812 [pdf, other]

Day-ahead Forecasts of Air Temperature

Authors: Hewei Wang, Muhammad Salman Pathan, Yee Hui Lee, Soumyabrata Dev

Abstract: Air temperature is an essential factor that directly impacts the weather. Temperature can be counted as an important sign of climatic change, which profoundly impacts our health, development, and urban planning. Therefore, it is vital to design a framework that can accurately predict the temperature values for considerable lead times. In this paper, we propose a technique based on the exponential smoothing method to accurately predict temperature using historical values. Our proposed method shows good performance in capturing the seasonal variability of temperature. We report a root mean square error of $4.62$ K for a lead time of $3$ days, using daily averages of air temperature data. Our case study is based on weather stations located in the city of Alpena, Michigan, United States.
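The abstract does not say which exponential smoothing variant is used, so the sketch below shows only the simplest one (single exponential smoothing with a flat forecast) to make the idea concrete. The function name and the smoothing weight `alpha` are illustrative assumptions; capturing seasonal variability, as the abstract reports, would likely need a seasonal extension that is not shown here.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after the last observation.

    The level is updated as level = alpha * x + (1 - alpha) * level, and
    the final level serves as the forecast for every future lead time.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        raise ValueError("series must contain at least one observation")
    level = series[0]  # initialise the level at the first observation
    for x in series[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level
```

A larger `alpha` weights recent daily averages more heavily, while a smaller one gives a smoother, slower-moving forecast.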

Submitted 21 October, 2021; originally announced October 2021.

Comments: Accepted in Proc. IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, 2021

arXiv:2110.07531 [pdf]

Deep learning models for predicting RNA degradation via dual crowdsourcing

Authors: Hannah K. Wayment-Steele, Wipapat Kladwang, Andrew M. Watkins, Do Soon Kim, Bojan Tunguz, Walter Reade, Maggie Demkin, Jonathan Romano, Roger Wellington-Oguri, John J. Nicol, Jiayang Gao, Kazuki Onodera, Kazuki Fujikawa, Hanfei Mao, Gilles Vandewiele, Michele Tinti, Bram Steenwinckel, Takuya Ito, Taiga Noumi, Shujun He, Keiichiro Ishi, Youhan Lee, Fatih Öztürk, Anthony Chiu, Emin Öztürk , et al. (4 additional authors not shown)

Abstract: Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on 6043 102-130-nucleotide diverse RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1588 nucleotides) with improved accuracy compared to previously published models. Top teams integrated natural language processing architectures and data augmentation techniques with predictions from previous dynamic programming models for RNA secondary structure. These results indicate that such models are capable of representing in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for data set creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.

Submitted 22 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2110.06500 [pdf, other]

Differentially Private Fine-tuning of Language Models

Authors: Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

Abstract: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $ε= 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. Privately fine-tuned on DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of $ε= 6.8,δ=$ 1e-5) whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.

Submitted 14 July, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: ICLR 2022. Code available at https://github.com/huseyinatahaninan/Differentially-Private-Fine-tuning-of-Language-Models

arXiv:2106.14415 [pdf, ps, other]

Exact simulation of extrinsic stress-release processes

Authors: Young Lee, Patrick J. Laub, Thomas Taimre, Hongbiao Zhao, Jiancang Zhuang

Abstract: We present a new and straightforward algorithm that simulates exact sample paths for a generalized stress-release process. The computation of the exact law of the joint interarrival times is detailed and used to derive this algorithm. Furthermore, the martingale generator of the process is derived and induces theoretical moments which generalize some results of Borovkov & Vere-Jones (2000) and are used to demonstrate the validity of our simulation algorithm.

Submitted 28 June, 2021; originally announced June 2021.

MSC Class: 60G20 (Primary) 60G55; 65C05 (Secondary)

arXiv:2106.14258 [pdf, other]

Sparse Logistic Tensor Decomposition for Binary Data

Authors: Jianhao Zhang, Yoonkyung Lee

Abstract: Tensor data are increasingly available in many application domains. We develop several tensor decomposition methods for binary tensor data. Different from classical tensor decompositions for continuous-valued data with squared error loss, we formulate logistic tensor decompositions for binary data with a Bernoulli likelihood. To enhance the interpretability of estimated factors and improve their stability further, we propose sparse formulations of logistic tensor decomposition by considering $\ell_{1}$-norm and $\ell_{0}$-norm regularized likelihood. To handle the resulting optimization problems, we develop computational algorithms which combine the strengths of tensor power method and majorization-minimization (MM) algorithm. Through simulation studies, we demonstrate the utility of our methods in analysis of binary tensor data. To illustrate the effectiveness of the proposed methods, we analyze a dataset concerning nations and their political relations and perform co-clustering of estimated factors to find associations between the nations and political relations.

Submitted 27 June, 2021; originally announced June 2021.

arXiv:2106.10721 [pdf]

doi 10.1177/0272989X221078789

Uncertainty and Value of Information in Risk Prediction Modeling

Authors: Mohsen Sadatsafavi, Tae Yoon Lee, Paul Gustafson

Abstract: Background: Due to the finite size of the development sample, predicted probabilities from a risk prediction model are inevitably uncertain. We apply Value of Information methodology to evaluate the decision-theoretic implications of prediction uncertainty. Methods: Adopting a Bayesian perspective, we extend the definition of the Expected Value of Perfect Information (EVPI) from decision analysis to net benefit calculations in risk prediction. In the context of model development, EVPI is the expected gain in net benefit by using the correct predictions as opposed to predictions from a proposed model. We suggest bootstrap methods for sampling from the posterior distribution of predictions for EVPI calculation using Monte Carlo simulations. In a case study, we used subsets of data of various sizes from a clinical trial for predicting mortality after myocardial infarction to show how EVPI changes with sample size. Results: With a sample size of 1,000 and at the pre-specified threshold of 2% on predicted risks, the gains in net benefit by using the proposed and the correct models were 0.0006 and 0.0011, respectively, resulting in an EVPI of 0.0005 and a relative EVPI of 87%. EVPI was zero only at unrealistically high thresholds (>85%). As expected, EVPI declined with larger samples. We summarize an algorithm for incorporating EVPI calculations into the commonly used bootstrap method for optimism correction. Conclusion: Value of Information methods can be applied to explore decision-theoretic consequences of uncertainty in risk prediction and can complement inferential methods when developing risk prediction models. R code for implementing this method is provided.

Submitted 3 November, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

Comments: 24 pages, 1 table, 3 figures

Showing 1–50 of 156 results for author: Lee, Y