-
Sparse Fréchet Sufficient Dimension Reduction with Graphical Structure Among Predictors
Authors:
Jiaying Weng,
Kai Tan,
Cheng Wang,
Zhou Yu
Abstract:
Fréchet regression has received considerable attention for modeling metric-space-valued responses, which are complex, non-Euclidean data such as probability distributions and vectors on the unit sphere. However, the existing Fréchet regression literature focuses on the classical setting where the predictor dimension is fixed and the sample size goes to infinity. This paper proposes sparse Fréchet sufficient dimension reduction with graphical structure among high-dimensional Euclidean predictors. In particular, we propose a convex optimization problem that leverages the graphical information among predictors and avoids inverting the high-dimensional covariance matrix. We also provide an alternating direction method of multipliers (ADMM) algorithm to solve the optimization problem. Theoretically, the proposed method achieves subspace estimation and variable selection consistency under suitable conditions. Extensive simulations and a real data analysis are carried out to illustrate the finite-sample performance of the proposed method.
Submitted 29 October, 2023;
originally announced October 2023.
-
Prediction Model For Wordle Game Results With High Robustness
Authors:
Jiaqi Weng,
Chunlin Feng
Abstract:
In this study, we delve into the dynamics of Wordle using data analysis and machine learning. Our analysis initially focused on the correlation between the date and the number of submitted results. Due to initial popularity bias, we modeled stable data using an ARIMAX model with coefficient values of 9, 0, 2, and weekdays/weekends as the exogenous variable. We found no significant relationship between word attributes and hard mode results.
To predict word difficulty, we employed a Backpropagation Neural Network, overcoming overfitting via feature engineering. We also used K-means clustering, optimized at five clusters, to categorize word difficulty numerically. Our findings indicate that on March 1st, 2023, around 12,884 results will be submitted and the word "eerie" averages 4.8 attempts, falling into the hardest difficulty cluster.
We further examined the percentage of loyal players and their propensity to undertake daily challenges. Our models underwent rigorous sensitivity analyses, including ADF, ACF, PACF tests, and cross-validation, confirming their robustness. Overall, our study provides a predictive framework for Wordle gameplay based on date or a given five-letter word. Results have been summarized and submitted to the Puzzle Editor of the New York Times.
Submitted 25 September, 2023;
originally announced September 2023.
-
itdr: An R package of Integral Transformation Methods to Estimate the SDR Subspaces in Regression
Authors:
Tharindu P. De Alwis,
S. Yaser Samadi,
Jiaying Weng
Abstract:
Sufficient dimension reduction (SDR) is an effective tool for regression models, offering a viable approach to address and analyze the nonlinear nature of regression problems. This paper introduces the itdr R package, a comprehensive and user-friendly tool that provides several functions based on integral transformation methods for estimating SDR subspaces. In particular, the itdr package incorporates two key methods, namely the Fourier method (FM) and the convolution method (CM). These methods estimate the SDR subspaces, namely the central mean subspace (CMS) and the central subspace (CS), in cases where the response is univariate. The itdr package also facilitates the recovery of the CMS through the iterative Hessian transformation (IHT) method for univariate responses. Additionally, it enables the recovery of the CS by employing various Fourier transformation strategies, such as the inverse dimension reduction method, the minimum discrepancy approach using Fourier transformation, and the Fourier transform sparse inverse regression approach, specifically designed for cases with multivariate responses. To demonstrate its capabilities, the itdr package is applied to five different datasets. Furthermore, this package is the pioneering implementation of integral transformation methods for estimating SDR subspaces, thus promising significant advancements in SDR research.
Submitted 16 July, 2023; v1 submitted 13 April, 2022;
originally announced April 2022.
-
RDP-GAN: A Rényi-Differential Privacy based Generative Adversarial Network
Authors:
Chuan Ma,
Jun Li,
Ming Ding,
Bo Liu,
Kang Wei,
Jian Weng,
H. Vincent Poor
Abstract:
Generative adversarial networks (GANs) have attracted increasing attention recently owing to their impressive ability to generate realistic samples with high privacy protection. Without directly interacting with training examples, the generative model can be fully used to estimate the underlying distribution of an original dataset, while the discriminative model can examine the quality of the generated samples by comparing the label values with the training examples. However, when GANs are applied to sensitive or private training examples, such as medical or financial records, they may still divulge individuals' sensitive and private information. To mitigate this information leakage and construct a private GAN, in this work we propose a Rényi-differentially private GAN (RDP-GAN), which achieves differential privacy (DP) in a GAN by carefully adding random noise to the value of the loss function during training. Moreover, we derive analytical results for the total privacy loss under the subsampling method and accumulated iterations, which show its effectiveness in privacy budget allocation. In addition, to mitigate the negative impact of the injected noise, we enhance the proposed algorithm by adding an adaptive noise tuning step, which changes the volume of added noise according to the testing accuracy. Through extensive experimental results, we verify that the proposed algorithm can achieve a better privacy level while producing high-quality samples compared with a benchmark DP-GAN scheme based on noise perturbation of training gradients.
Submitted 4 July, 2020;
originally announced July 2020.