-
A Graphical Comparison of Screening Designs using Support Recovery Probabilities
Authors:
Kade Young,
Maria L. Weese,
Jonathan W. Stallrich,
Byran J. Smucker,
David J. Edwards
Abstract:
A screening experiment attempts to identify a subset of important effects using a relatively small number of experimental runs. Given the limited run size and a large number of possible effects, penalized regression is a popular tool used to analyze screening designs. In particular, an automated implementation of the Gauss-Dantzig selector has been widely recommended to compare screening design co…
▽ More
A screening experiment attempts to identify a subset of important effects using a relatively small number of experimental runs. Given the limited run size and a large number of possible effects, penalized regression is a popular tool used to analyze screening designs. In particular, an automated implementation of the Gauss-Dantzig selector has been widely recommended to compare screening design construction methods. Here, we illustrate potential reproducibility issues that arise when comparing screening designs via simulation, and recommend a graphical method, based on screening probabilities, which compares designs by evaluating them along the penalized regression solution path. This method can be implemented using simulation, or, in the case of lasso, by using exact local lasso sign recovery probabilities. Our approach circumvents the need to specify tuning parameters associated with regularization methods, leading to more reliable design comparisons. This article contains supplementary materials including code to implement the proposed methods.
△ Less
Submitted 21 November, 2023;
originally announced November 2023.
-
Boundary Peeling: Outlier Detection Method Using One-Class Peeling
Authors:
Sheikh Arafat,
Na Sun,
Maria L. Weese,
Waldyn G. Martinez
Abstract:
Unsupervised outlier detection constitutes a crucial phase within data analysis and remains a dynamic realm of research. A good outlier detection algorithm should be computationally efficient, robust to tuning parameter selection, and perform consistently well across diverse underlying data distributions. We introduce One-Class Boundary Peeling, an unsupervised outlier detection algorithm. One-cla…
▽ More
Unsupervised outlier detection constitutes a crucial phase within data analysis and remains a dynamic realm of research. A good outlier detection algorithm should be computationally efficient, robust to tuning parameter selection, and perform consistently well across diverse underlying data distributions. We introduce One-Class Boundary Peeling, an unsupervised outlier detection algorithm. One-class Boundary Peeling uses the average signed distance from iteratively-peeled, flexible boundaries generated by one-class support vector machines. One-class Boundary Peeling has robust hyperparameter settings and, for increased flexibility, can be cast as an ensemble method. In synthetic data simulations One-Class Boundary Peeling outperforms all state of the art methods when no outliers are present while maintaining comparable or superior performance in the presence of outliers, as compared to benchmark methods. One-Class Boundary Peeling performs competitively in terms of correct classification, AUC, and processing time using common benchmark data sets.
△ Less
Submitted 11 September, 2023;
originally announced September 2023.
-
An Optimal Design Framework for Lasso Sign Recovery
Authors:
Jonathan W. Stallrich,
Kade Young,
Maria L. Weese,
Byran J. Smucker,
David J. Edwards
Abstract:
Supersaturated designs investigate more factors than there are runs, and are often constructed under a criterion measuring a design's proximity to an unattainable orthogonal design. The most popular analysis identifies active factors by inspecting the solution path of a penalized estimator, such as the lasso. Recent criteria encouraging positive correlations between factors have been shown to prod…
▽ More
Supersaturated designs investigate more factors than there are runs, and are often constructed under a criterion measuring a design's proximity to an unattainable orthogonal design. The most popular analysis identifies active factors by inspecting the solution path of a penalized estimator, such as the lasso. Recent criteria encouraging positive correlations between factors have been shown to produce designs with more definitive solution paths so long as the active factors have positive effects. Two open problems affecting the understanding and practicality of supersaturated designs are: (1) do optimal designs under existing criteria maximize support recovery probability across an estimator's solution path, and (2) why do designs with positively correlated columns produce more definitive solution paths when the active factors have positive sign effects? To answer these questions, we develop criteria maximizing the lasso's sign recovery probability. We prove that an orthogonal design is an ideal structure when the signs of the active factors are unknown, and a design constant small, positive correlations is ideal when the signs are assumed known. A computationally-efficient design search algorithm is proposed that first filters through optimal designs under new heuristic criteria to select the one that maximizes the lasso sign recovery probability.
△ Less
Submitted 15 March, 2024; v1 submitted 29 March, 2023;
originally announced March 2023.