-
Differentially Private Boxplots
Authors:
Kelly Ramsay,
Jairo Diaz-Rodriguez
Abstract:
Despite the potential of differentially private data visualization to harmonize data analysis and privacy, research in this area remains relatively underdeveloped. Boxplots are a widely popular visualization used for summarizing a dataset and for comparing multiple datasets. Consequently, we introduce a differentially private boxplot. We evaluate its effectiveness for displaying the location, scale, skewness and tails of a given empirical distribution. In our theoretical exposition, we show that the location and scale of the boxplot are estimated with optimal sample complexity, and the skewness and tails are estimated consistently. In simulations, we show that this boxplot performs similarly to a non-private boxplot, and it outperforms a boxplot naively constructed from existing differentially private quantile algorithms. Additionally, we conduct a real data analysis of Airbnb listings, which shows that comparable analysis can be achieved through differentially private boxplot visualization.
Submitted 30 May, 2024;
originally announced May 2024.
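For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of what a boxplot "naively constructed" from a private quantile routine might look like. It is not the authors' method. It assumes the standard exponential-mechanism quantile estimator on data clipped to known bounds, and it splits the privacy budget equally across the three quartiles by basic composition. The names `dp_quantile` and `naive_dp_boxplot`, and the bounds `lower` and `upper`, are hypothetical.

```python
import numpy as np


def dp_quantile(x, q, epsilon, lower, upper, rng):
    """Exponential-mechanism quantile on data clipped to [lower, upper].

    Assumes lower < upper. Illustrative only, not the paper's algorithm.
    """
    x = np.sort(np.clip(np.asarray(x, dtype=float), lower, upper))
    n = x.size
    # n + 2 edges delimit n + 1 candidate intervals.
    edges = np.concatenate(([lower], x, [upper]))
    widths = np.diff(edges)
    # Interval i has exactly i data points below it.
    ranks = np.arange(n + 1)
    utility = -np.abs(ranks - q * n)  # rank distance, sensitivity 1
    with np.errstate(divide="ignore"):
        log_weights = np.log(widths) + 0.5 * epsilon * utility
    log_weights -= np.max(log_weights)  # stabilise before exponentiating
    probs = np.exp(log_weights)
    probs /= probs.sum()
    i = rng.choice(n + 1, p=probs)
    # Release a uniform draw from the selected interval.
    return float(rng.uniform(edges[i], edges[i + 1]))


def naive_dp_boxplot(x, epsilon, lower, upper, rng):
    """Three private quartiles, each using epsilon / 3 of the budget."""
    quartiles = [
        dp_quantile(x, q, epsilon / 3.0, lower, upper, rng)
        for q in (0.25, 0.5, 0.75)
    ]
    # Independent releases may cross; sorting is post-processing only.
    q1, median, q3 = sorted(quartiles)
    return {"q1": q1, "median": median, "q3": q3}
```

Because each quartile is released independently, the estimates can be mutually inconsistent before sorting, which is one reason such a construction is described as naive.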
-
Tomographic reconstruction of a disease transmission landscape via GPS recorded random paths
Authors:
Jairo Diaz-Rodriguez,
Juan Pablo Gomez,
Jeremy P. Orange,
Nathan D. Burkett-Cadena,
Samantha M. Wisely,
Jason K. Blackburn,
Sylvain Sardy
Abstract:
Identifying areas in a landscape where individuals have higher probability of becoming infected with a pathogen is a crucial step towards disease management. We perform a novel epidemiological tomography for the estimation of landscape propensity to disease infection, using GPS animal tracks in a manner analogous to tomographic techniques in Positron Emission Tomography. Our study data consists of individual tracks of white-tailed deer (Odocoileus virginianus) and three exotic Cervidae species moving freely in a high-fenced game preserve over given time periods. A serological test was performed on each individual to measure antibody concentration of epizootic hemorrhagic disease viruses (EHDV) at the beginning and at the end of each tracking period. EHDV is a vector-borne viral disease indirectly transmitted between ruminant hosts by biting midges. We model the data as a binomial linear inverse problem, where spatial coherence is enforced with a total variation regularization. The smoothness of the reconstructed propensity map is selected by the quantile universal threshold, which can also test the null hypothesis that the propensity map is spatially constant. We apply our method to simulated and real data, showing good statistical properties during simulations and consistent results and interpretations compared to intensive field estimations.
Submitted 30 May, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
Adjoint-Free 4D-Var Methods Via Line Search Optimization For Non-Linear Data Assimilation
Authors:
Elias Nin-Ruiz,
Jairo Diaz-Rodriguez
Abstract:
This paper proposes two practical implementations of Four-Dimensional Variational (4D-Var) Ensemble Kalman Filter (4D-EnKF) methods for non-linear data assimilation. The main idea of our formulations is to avoid the intrinsic need for adjoint models in 4D-Var optimization and, furthermore, to handle non-linear observation operators during the assimilation of observations. The proposed methods work as follows: snapshots of an ensemble of model realizations are taken at observation times, and these snapshots are used to build control spaces onto which analysis increments can be estimated. Via the linearization of observation operators at observation times, a line-search based optimization method is proposed to estimate optimal analysis increments. The convergence of this method is theoretically proven as long as the dimension of the control spaces equals that of the model. In the first formulation, control spaces are given by full-rank square root approximations of background error covariance matrices via the Bickel and Levina precision matrix estimator. In this context, we propose an iterative Woodbury matrix formula to perform the optimization steps efficiently. The second formulation can be considered an extension of the Maximum Likelihood Ensemble Filter to the 4D-Var context. It employs pseudo-square root approximations of prior error covariance matrices to build control spaces. Experimental tests are performed using the Lorenz 96 model. The results reveal that, in terms of Root-Mean-Square-Error values, both methods can obtain reasonable estimates of posterior error modes in the 4D-Var optimization problem. Moreover, the accuracy of the proposed filter implementations can be improved by increasing the ensemble size.
Submitted 4 May, 2023;
originally announced May 2023.
-
Thresholding tests
Authors:
Sylvain Sardy,
Caroline Giacobino,
Jairo Diaz-Rodriguez
Abstract:
We derive a new class of statistical tests for generalized linear models based on thresholding point estimators. These tests can be employed whether or not the model includes more parameters than observations. For linear models, our tests rely on pivotal statistics derived from model selection techniques. Affine lasso, a new extension of lasso, allows us to unveil new tests and to develop parametric and nonparametric tests within the same framework. Our tests for generalized linear models are based on new asymptotically pivotal statistics. A composite thresholding test attempts, with success, to be uniformly most powerful under both sparse and dense alternatives. In a simulation, we compare the level and power of these tests under sparse and dense alternative hypotheses. The thresholding tests have better control of the nominal level and higher power than existing tests.
Submitted 13 March, 2018; v1 submitted 9 August, 2017;
originally announced August 2017.
-
Nonparametric estimation of galaxy cluster's emissivity and point source detection in astrophysics with two lasso penalties
Authors:
Jairo Diaz-Rodriguez,
Dominique Eckert,
Hatef Monajemi,
Stéphane Paltani,
Sylvain Sardy
Abstract:
Astrophysicists are interested in recovering the 3D gas emissivity of a galaxy cluster from a 2D image taken by a telescope. A blurring phenomenon and the presence of point sources make this inverse problem even harder to solve. The current state-of-the-art technique has two steps: first identify the locations of potential point sources, then mask these locations and deproject the data.
We instead model the data as a Poisson generalized linear model (involving blurring, Abel and wavelet operators) regularized by two lasso penalties to induce a sparse wavelet representation and sparse point sources. The amount of sparsity is controlled by two quantile universal thresholds. As a result, our method outperforms the existing one.
Submitted 2 March, 2017;
originally announced March 2017.
-
Quantile universal threshold for model selection
Authors:
Caroline Giacobino,
Sylvain Sardy,
Jairo Diaz-Rodriguez,
Nick Hengartner
Abstract:
Efficient recovery of a low-dimensional structure from high-dimensional data has been pursued in various settings including wavelet denoising, generalized linear models and low-rank matrix estimation. By thresholding some parameters to zero, estimators such as lasso, elastic net and subset selection perform not only parameter estimation but also variable selection, leading to sparsity. Yet one crucial step challenges all these estimators: the choice of the threshold parameter $λ$. If it is too large, important features are missed; if it is too small, incorrect features are included.
Within a unified framework, we propose a new selection of $λ$ at the detection edge under the null model. To that aim, we introduce the concepts of a zero-thresholding function and a null-thresholding statistic, which we explicitly derive for a large class of estimators. The new approach has the great advantage of transforming the selection of $λ$ from an unknown scale to a probabilistic scale with the simple selection of a probability level. Numerical results show the effectiveness of our approach in terms of model selection and prediction.
Submitted 20 March, 2017; v1 submitted 17 November, 2015;
originally announced November 2015.
-
Quantile universal threshold: model selection at the detection edge for high-dimensional linear regression
Authors:
Jairo Diaz-Rodriguez,
Sylvain Sardy
Abstract:
To estimate a sparse linear model from data with Gaussian noise, the consilience from the lasso and compressed sensing literatures is that thresholding estimators like lasso and the Dantzig selector are able, in some situations, to asymptotically identify part of the significant covariates with high probability, and are numerically tractable thanks to convexity.
Yet, the selection of a threshold parameter $λ$ remains crucial in practice. To that aim we propose Quantile Universal Thresholding, a selection of $λ$ at the detection edge. We show with extensive simulations and real data that an excellent compromise between high true positive rate and low false discovery rate is achieved, leading also to good predictive risk.
Submitted 5 December, 2014;
originally announced December 2014.
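As a rough illustration of selecting $λ$ at the detection edge, the sketch below estimates a quantile-universal-threshold-style value for the lasso by Monte Carlo. It is written under stated assumptions rather than taken from the paper: the lasso objective is $\frac{1}{2}\|y - Xβ\|_2^2 + λ\|β\|_1$, for which the smallest $λ$ giving an all-zero solution is $\|X^\top y\|_\infty$, and the noise level `sigma` is known. The function name `qut_lambda`, the default level `alpha=0.05`, and the number of simulations are illustrative choices.

```python
import numpy as np


def qut_lambda(X, sigma=1.0, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo upper-alpha quantile of ||X^T eps||_inf under the null.

    Under the null model y = eps with eps ~ N(0, sigma^2 I), the lasso
    returns the zero vector exactly when lambda >= ||X^T y||_inf, so this
    quantile gives a lambda that recovers the null model with probability
    approximately 1 - alpha. Illustrative sketch with known sigma.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each row is one simulated pure-noise response vector.
    noise = rng.normal(scale=sigma, size=(n_sim, n))
    # Row k of (noise @ X) equals X^T eps_k; take its sup-norm.
    null_stats = np.max(np.abs(noise @ X), axis=1)
    return float(np.quantile(null_stats, 1.0 - alpha))
```

The only tuning input is the probability level `alpha`, which is what moves the choice of $λ$ from an unknown scale to a probabilistic one; in practice `sigma` would have to be estimated, which this sketch does not attempt.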