Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains
Authors:
Steven Wilkins-Reeves,
Xu Chen,
Qi Ma,
Christine Agarwal,
Aude Hofleitner
Abstract:
Distribution shifts are ubiquitous in real-world machine learning applications, posing a challenge to the generalization of models trained on one data distribution to another. We focus on scenarios where data distributions vary across multiple segments of the entire population and only make local assumptions about the differences between training and test (deployment) distributions within each segment. We propose a two-stage multiply robust estimation method to improve model performance on each individual segment for tabular data analysis. The method involves fitting a linear combination of the base models, learned using clusters of training data from multiple segments, followed by a refinement step for each segment. Our method is designed to be implemented with commonly used off-the-shelf machine learning models. We establish theoretical guarantees on the generalization bound of the method on the test risk. With extensive experiments on synthetic and real datasets, we demonstrate that the proposed method substantially improves over existing alternatives in prediction accuracy and robustness on both regression and classification tasks. We also assess its effectiveness on a user city prediction dataset from Meta.
Submitted 3 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
Asymptotically Normal Estimation of Local Latent Network Curvature
Authors:
Steven Wilkins-Reeves,
Tyler McCormick
Abstract:
Network data, commonly used throughout the physical, social, and biological sciences, consists of nodes (individuals) and the edges (interactions) between them. One way to represent network data's complex, high-dimensional structure is to embed the graph into a low-dimensional geometric space. The curvature of this space, in particular, provides insights about the structure in the graph, such as the propensity to form triangles or exhibit tree-like structures. We derive an estimating function for curvature based on triangle side lengths and the distance from the midpoint of a side to the opposing corner. We construct an estimator whose only input is a distance matrix and establish its asymptotic normality. We next introduce a novel latent distance matrix estimator for networks and an efficient algorithm to compute the estimate by solving iterative quadratic programs. We apply this method to the Los Alamos National Laboratory Unified Network and Host dataset and show how curvature estimates can be used to detect a red-team attack faster than naive methods, and to discover non-constant latent curvature in co-authorship networks in physics. The code for this paper is available at https://github.com/SteveJWR/netcurve, and the methods are implemented in the R package https://github.com/SteveJWR/lolaR.
Submitted 8 June, 2023; v1 submitted 21 November, 2022;
originally announced November 2022.
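Illustrative sketch (not the authors' code): the curvature abstract above describes an estimating function built from triangle side lengths and the distance from a side's midpoint to the opposing corner. One way to see why such quantities identify curvature is the classical midpoint identity in a space of constant curvature. The Python below assumes noise-free distances and a single triangle, and solves that identity for the curvature by bisection. The function names (`midpoint_gap`, `estimate_curvature`, `sphere_triangle`) and the bracket [-4, 4] are hypothetical choices made for this example; the paper's actual estimating function, its handling of noisy latent distances, and its asymptotic theory may differ.

```python
import math


def midpoint_gap(kappa, d_xy, d_xz, d_yz, d_xm):
    """Scaled residual of the midpoint identity at curvature kappa.

    x, y, z are triangle corners and m is the midpoint of side yz.
    The value is zero when the four distances are consistent with a space
    of constant curvature kappa. The -2/kappa scaling makes the function
    continuous at kappa = 0, where it reduces to the Euclidean
    median-length formula.
    """
    if abs(kappa) < 1e-9:
        # Euclidean case: d_xm^2 = d_xy^2/2 + d_xz^2/2 - d_yz^2/4
        return d_xm**2 - (0.5 * d_xy**2 + 0.5 * d_xz**2 - 0.25 * d_yz**2)
    s = math.sqrt(abs(kappa))
    c = math.cos if kappa > 0 else math.cosh  # spherical vs. hyperbolic
    residual = c(s * d_xm) - (c(s * d_xy) + c(s * d_xz)) / (2.0 * c(s * d_yz / 2.0))
    return -2.0 * residual / kappa


def estimate_curvature(d_xy, d_xz, d_yz, d_xm, lo=-4.0, hi=4.0, iters=200):
    """Bisection for a root of midpoint_gap in [lo, hi] (assumed bracket)."""
    g_lo = midpoint_gap(lo, d_xy, d_xz, d_yz, d_xm)
    g_hi = midpoint_gap(hi, d_xy, d_xz, d_yz, d_xm)
    if g_lo * g_hi > 0:
        raise ValueError("no sign change on the bracket; widen [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = midpoint_gap(mid, d_xy, d_xz, d_yz, d_xm)
        if g_mid == 0.0:
            return mid
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


def sphere_triangle():
    """Exact distances for a small triangle on the unit sphere (curvature 1)."""
    x = (0.0, 0.0, 1.0)
    y = (math.sin(0.5), 0.0, math.cos(0.5))
    z = (0.0, math.sin(0.7), math.cos(0.7))
    total = [a + b for a, b in zip(y, z)]
    norm = math.sqrt(sum(v * v for v in total))
    m = tuple(v / norm for v in total)  # geodesic midpoint of y and z

    def dist(p, q):
        return math.acos(sum(a * b for a, b in zip(p, q)))

    return dist(x, y), dist(x, z), dist(y, z), dist(x, m)


if __name__ == "__main__":
    print(estimate_curvature(*sphere_triangle()))
```

On the unit-sphere triangle the recovered value should be close to 1, the sphere's curvature. With distances estimated from network data, the inputs would be noisy, which is the setting the abstract's asymptotic normality result addresses.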