The 2010 Census Confidentiality Protections Failed, Here's How and Why
Authors:
John M. Abowd,
Tamara Adams,
Robert Ashmead,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Nathan Goldschlag,
Daniel Kifer,
Philip Leclerc,
Ethan Lew,
Scott Moore,
Rolando A. Rodríguez,
Ramy N. Tadros,
Lars Vilhuber
Abstract:
Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
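To make the idea of reconstruction from published tables concrete, here is a toy sketch in Python. It is not the method used in the paper: the three-person block, the category labels, and the helper names (`one_way_tables`, `two_way_tables`, `reconstruct`) are all hypothetical, and brute-force enumeration stands in for whatever solver a real attack on Summary File 1 would need. It only illustrates why an attacker can verify a reconstruction: if exactly one set of records is consistent with the published counts, that set must be the confidential data.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Hypothetical, heavily coarsened attribute categories for a toy block.
SEXES = ("F", "M")
AGE_BINS = ("0-17", "18-64", "65+")
RACES = ("A", "B")
ALL_RECORDS = list(product(SEXES, AGE_BINS, RACES))


def one_way_tables(records):
    """Counts by sex, by age bin, and by race (three separate tables)."""
    return tuple(Counter(r[i] for r in records) for i in range(3))


def two_way_tables(records):
    """Counts by sex x age bin and by age bin x race (two cross-tabulations)."""
    sex_by_age = Counter((s, a) for s, a, _ in records)
    age_by_race = Counter((a, r) for _, a, r in records)
    return sex_by_age, age_by_race


def reconstruct(published, tabulate, block_size):
    """Enumerate every multiset of records whose tables match the published ones."""
    return [
        candidate
        for candidate in combinations_with_replacement(ALL_RECORDS, block_size)
        if tabulate(candidate) == published
    ]


# Toy "confidential" block of three people (made-up data).
confidential = [("F", "0-17", "A"), ("M", "18-64", "B"), ("M", "65+", "A")]
n = len(confidential)

coarse = reconstruct(one_way_tables(confidential), one_way_tables, n)
fine = reconstruct(two_way_tables(confidential), two_way_tables, n)

print(len(coarse))  # 9 candidate blocks match the one-way tables
print(len(fine))    # 1 candidate block matches the cross-tabulations
```

In this toy block, the one-way counts leave nine candidate reconstructions, while the two cross-tabulations leave exactly one, which equals the confidential records. The attacker needs only the published tables to see that the solution is unique.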
Submitted 18 December, 2023;
originally announced December 2023.
Learning High-Dimensional Nonparametric Differential Equations via Multivariate Occupation Kernel Functions
Authors:
Victor Rielly,
Kamel Lahouel,
Ethan Lew,
Michael Wells,
Vicky Haney,
Bruno Jedynak
Abstract:
Learning a nonparametric system of ordinary differential equations (ODEs) from $n$ trajectory snapshots in a $d$-dimensional state space requires learning $d$ functions of $d$ variables. Explicit formulations scale quadratically in $d$ unless additional knowledge about system properties, such as sparsity and symmetries, is available. In this work, we propose a linear approach to learning using the implicit formulation provided by vector-valued Reproducing Kernel Hilbert Spaces. By rewriting the ODEs in a weaker integral form, which we subsequently minimize, we derive our learning algorithm. The minimization problem's solution for the vector field relies on multivariate occupation kernel functions associated with the solution trajectories. We validate our approach through experiments on highly nonlinear simulated and real data, where $d$ may exceed 100. We further demonstrate the versatility of the proposed method by learning a nonparametric first order quasilinear partial differential equation.
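One way to read the "weaker integral form" mentioned above is sketched below. The notation is assumed rather than taken from the abstract: snapshot times $t_i$, an autonomous vector field $f$, a vector-valued RKHS $\mathcal{H}$, and a regularization weight $\lambda$ are all introduced here for illustration. Integrating $\dot{x}(t) = f(x(t))$ between consecutive snapshots removes the derivative, and the resulting integrals are linear functionals of $f$, which is the role occupation kernels would play in such a formulation.

```latex
\begin{align*}
x(t_{i+1}) - x(t_i) &= \int_{t_i}^{t_{i+1}} f\bigl(x(t)\bigr)\,dt,
  \qquad i = 1,\dots,n-1, \\
\hat{f} &\in \arg\min_{f \in \mathcal{H}}
  \frac{1}{n-1} \sum_{i=1}^{n-1}
  \Bigl\| \int_{t_i}^{t_{i+1}} f\bigl(x(t)\bigr)\,dt
        - \bigl(x(t_{i+1}) - x(t_i)\bigr) \Bigr\|^2
  + \lambda \,\|f\|_{\mathcal{H}}^2 .
\end{align*}
```

The exact loss, normalization, and treatment of the unobserved path between snapshots in the paper may differ from this sketch.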
Submitted 16 June, 2023;
originally announced June 2023.
Learning nonparametric ordinary differential equations from noisy data
Authors:
Kamel Lahouel,
Michael Wells,
Victor Rielly,
Ethan Lew,
David Lovitz,
Bruno M. Jedynak
Abstract:
Learning nonparametric systems of Ordinary Differential Equations (ODEs) $\dot{x} = f(t,x)$ from noisy data is an emerging machine learning topic. We use the well-developed theory of Reproducing Kernel Hilbert Spaces (RKHS) to define candidates for $f$ for which the solution of the ODE exists and is unique. Learning $f$ consists of solving a constrained optimization problem in an RKHS. We propose a penalty method that iteratively uses the Representer theorem and Euler approximations to provide a numerical solution. We prove a generalization bound for the $L^2$ distance between $x$ and its estimator and provide experimental comparisons with the state-of-the-art.
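The two ingredients named above, Euler approximations and the Representer theorem, can be illustrated with a deliberately simplified sketch. This is not the authors' iterative penalty method: it is a single kernel ridge regression on forward-difference slopes, for a scalar autonomous ODE, with a Gaussian kernel. The function names, bandwidth, ridge weight, and test system $\dot{x} = -x$ are all assumptions made for the example.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.5):
    """Gaussian kernel matrix between two 1-D arrays of states."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2 * bandwidth**2))


def fit_vector_field(x, h, ridge=1e-3):
    """Estimate f in dx/dt = f(x) from samples x[0], x[1], ... spaced h apart.

    Euler step: (x[i+1] - x[i]) / h is used as a noisy observation of f(x[i]).
    Representer theorem: the ridge-regularized estimate is a finite sum
    f_hat(q) = sum_j alpha[j] * k(q, x[j]) over the observed states.
    """
    states = x[:-1]
    slopes = np.diff(x) / h
    gram = gaussian_kernel(states, states)
    alpha = np.linalg.solve(gram + ridge * np.eye(len(states)), slopes)

    def f_hat(query):
        q = np.atleast_1d(np.asarray(query, dtype=float))
        return gaussian_kernel(q, states) @ alpha

    return f_hat


# Synthetic data: the solution of dx/dt = -x with x(0) = 1, plus small noise.
rng = np.random.default_rng(0)
h = 0.05
t = np.arange(0.0, 3.0 + h / 2, h)
x_noisy = np.exp(-t) + rng.normal(scale=1e-3, size=t.size)

f_hat = fit_vector_field(x_noisy, h)
print(f_hat(0.5))  # should be close to the true value f(0.5) = -0.5
```

Dividing by `h` amplifies the measurement noise, which is one reason a single regression like this is only a starting point. The abstract's constrained formulation treats the trajectory and the vector field jointly rather than differencing the noisy samples directly.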
Submitted 12 November, 2023; v1 submitted 30 June, 2022;
originally announced June 2022.