
Closure operators: Complexity and applications to classification and decision-making

Authors: Hamed Hamze Bajgiran, Federico Echenique

Abstract: We study the complexity of closure operators, with applications to machine learning and decision theory. In machine learning, closure operators emerge naturally in data classification and clustering. In decision theory, they can model equivalence of choice menus, and therefore situations with a preference for flexibility. Our contribution is to formulate a notion of complexity of closure operators, which translates into the complexity of a classifier in ML, or of a utility function in decision theory.

Submitted 23 May, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

arXiv:2112.04161 [pdf, ps, other]

Aggregation of Pareto optimal models

Authors: Hamed Hamze Bajgiran, Houman Owhadi

Abstract: In statistical decision theory, a model is said to be Pareto optimal (or admissible) if no other model carries less risk for at least one state of nature while presenting no more risk for others. How can you rationally aggregate/combine a finite set of Pareto optimal models while preserving Pareto efficiency? This question is nontrivial because weighted model averaging does not, in general, preserve Pareto efficiency. This paper presents an answer in four logical steps: (1) A rational aggregation rule should preserve Pareto efficiency. (2) Due to the complete class theorem, Pareto optimal models must be Bayesian, i.e., they minimize a risk where the true state of nature is averaged with respect to some prior. Therefore each Pareto optimal model can be associated with a prior, and Pareto efficiency can be maintained by aggregating Pareto optimal models through their priors. (3) A prior can be interpreted as a preference ranking over models: prior $π$ prefers model A over model B if the average risk of A is lower than the average risk of B. (4) A rational/consistent aggregation rule should preserve this preference ranking: if both priors $π$ and $π'$ prefer model A over model B, then the prior obtained by aggregating $π$ and $π'$ must also prefer A over B.
Under these four steps, we show that all rational/consistent aggregation rules are as follows: give each individual Pareto optimal model a weight, introduce a weak order/ranking over the set of Pareto optimal models, and aggregate a finite set of models S as the model associated with the prior obtained as the weighted average of the priors of the highest-ranked models in S. This result shows that all rational/consistent aggregation rules must follow a generalization of hierarchical Bayesian modeling. Following our main result, we present applications to kernel smoothing, time-depreciating models, and voting mechanisms.

Submitted 8 December, 2021; originally announced December 2021.

arXiv:2111.11630 [pdf, ps, other]

Aggregation of Models, Choices, Beliefs, and Preferences

Authors: Hamed Hamze Bajgiran, Houman Owhadi

Abstract: A natural notion of rationality/consistency for aggregating models is that, for all (possibly aggregated) models $A$ and $B$, if the output of model $A$ is $f(A)$ and if the output of model $B$ is $f(B)$, then the output of the model obtained by aggregating $A$ and $B$ must be a weighted average of $f(A)$ and $f(B)$. Similarly, a natural notion of rationality for aggregating preferences of ensembles of experts is that, for all (possibly aggregated) experts $A$ and $B$, and all possible choices $x$ and $y$, if both $A$ and $B$ prefer $x$ over $y$, then the expert obtained by aggregating $A$ and $B$ must also prefer $x$ over $y$. Rational aggregation is an important element of uncertainty quantification, and it lies behind many seemingly different results in economic theory, spanning social choice, belief formation, and individual decision making. Three examples of rational aggregation rules are as follows. (1) Give each individual model (expert) a weight (a score) and use weighted averaging to aggregate individual or finite ensembles of models (experts). (2) Order/rank the individual models (experts) and let the aggregation of a finite ensemble of individual models (experts) be the highest-ranked individual model (expert) in that ensemble. (3) Give each individual model (expert) a weight, introduce a weak order/ranking over the set of models/experts, and aggregate $A$ and $B$ as the weighted average of the highest-ranked models (experts) in $A$ or $B$. Note that (1) and (2) are particular cases of (3).
In this paper, we show that all rational aggregation rules are of the form (3). This result unifies aggregation procedures across different economic environments. Following the main representation, we show applications and extensions of our representation in various separate economics topics such as belief formation, choice theory, and social welfare economics.
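
Rule (3) in the abstract above can be illustrated with a minimal sketch. The `Model` class, its field names, and the numeric values are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch assumes scalar model outputs, strictly positive weights, and integer ranks where a larger rank means more preferred.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    output: float  # f(model): the model's scalar output (assumed scalar here)
    weight: float  # strictly positive weight attached to the model
    rank: int      # position in the weak order; larger means more preferred

def aggregate(ensemble):
    """Weighted average of the outputs of the highest-ranked models only."""
    top_rank = max(m.rank for m in ensemble)
    top = [m for m in ensemble if m.rank == top_rank]
    total_weight = sum(m.weight for m in top)
    return sum(m.weight * m.output for m in top) / total_weight

a = Model("a", output=1.0, weight=1.0, rank=0)
b = Model("b", output=3.0, weight=3.0, rank=0)
c = Model("c", output=10.0, weight=1.0, rank=1)

# Equal ranks: rule (3) reduces to plain weighted averaging, i.e. rule (1).
same_rank = aggregate([a, b])        # (1*1 + 3*3) / 4 = 2.5

# A strictly higher-ranked model dominates the ensemble, as in rule (2).
with_leader = aggregate([a, b, c])   # 10.0
```

With all ranks equal the rule is ordinary weighted averaging, and with all ranks distinct it selects a single model, which is how (1) and (2) arise as particular cases of (3).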

Submitted 22 November, 2021; originally announced November 2021.

arXiv:2108.10517 [pdf, other]

Uncertainty Quantification of the 4th kind; optimal posterior accuracy-uncertainty tradeoff with the minimum enclosing ball

Authors: Hamed Hamze Bajgiran, Pau Batlle Franch, Houman Owhadi, Mostafa Samir, Clint Scovel, Mahdy Shirdel, Michael Stanley, Peyman Tavallali

Abstract: There are essentially three kinds of approaches to Uncertainty Quantification (UQ): (A) robust optimization, (B) Bayesian, (C) decision theory. Although (A) is robust, it is unfavorable with respect to accuracy and data assimilation. (B) requires a prior, is generally brittle, and posterior estimations can be slow. Although (C) leads to the identification of an optimal prior, its approximation suffers from the curse of dimensionality and the notion of risk is one that is averaged with respect to the distribution of the data. We introduce a 4th kind which is a hybrid between (A), (B), (C), and hypothesis testing. It can be summarized as, after observing a sample $x$, (1) defining a likelihood region through the relative likelihood and (2) playing a minmax game in that region to define optimal estimators and their risk. The resulting method has several desirable properties: (a) an optimal prior is identified after measuring the data, and the notion of risk is a posterior one, (b) the determination of the optimal estimate and its risk can be reduced to computing the minimum enclosing ball of the image of the likelihood region under the quantity of interest map (which is fast and not subject to the curse of dimensionality). The method is characterized by a parameter in $[0,1]$ acting as an assumed lower bound on the rarity of the observed data (the relative likelihood). When that parameter is near $1$, the method produces a posterior distribution concentrated around a maximum likelihood estimate with tight but low confidence UQ estimates. When that parameter is near $0$, the method produces a maximal risk posterior distribution with high confidence UQ estimates. In addition to navigating the accuracy-uncertainty tradeoff, the proposed method addresses the brittleness of Bayesian inference by navigating the robustness-accuracy tradeoff associated with data assimilation.
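
The two steps described in this abstract (a relative-likelihood region, then a minimum enclosing ball of its image) can be sketched in one dimension. Everything specific here is an assumption made for illustration and is not from the paper: a unit-variance Gaussian location model, a parameter grid on [-5, 5], a scalar quantity of interest, and the hypothetical function names. In one dimension the minimum enclosing ball is simply the interval between the smallest and largest image values; the paper's exact risk definition is not reproduced.

```python
import math

def likelihood_region(x, thetas, alpha, sigma=1.0):
    """Grid points whose relative likelihood p(x|theta) / max p(x|.) is >= alpha."""
    lik = [math.exp(-0.5 * ((x - t) / sigma) ** 2) for t in thetas]
    best = max(lik)
    return [t for t, value in zip(thetas, lik) if value / best >= alpha]

def enclosing_ball_1d(values):
    """Center and radius of the smallest interval containing all values."""
    lo, hi = min(values), max(values)
    return (lo + hi) / 2.0, (hi - lo) / 2.0

def estimate(x, alpha, phi=lambda t: t):
    """Center (point estimate) and radius (worst-case distance) for sample x."""
    thetas = [-5.0 + 0.01 * k for k in range(1001)]  # assumed parameter grid
    region = likelihood_region(x, thetas, alpha)
    return enclosing_ball_1d([phi(t) for t in region])

# alpha near 1: small region around the maximum likelihood estimate.
center_tight, radius_tight = estimate(1.0, alpha=0.9)
# alpha near 0: large region, hence a wider but more conservative estimate.
center_wide, radius_wide = estimate(1.0, alpha=0.05)
```

Raising `alpha` toward 1 shrinks the region toward the maximum likelihood estimate, while lowering it toward 0 widens the ball, which mirrors the accuracy-uncertainty tradeoff the abstract describes.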

Submitted 13 September, 2022; v1 submitted 24 August, 2021; originally announced August 2021.

Comments: 49 pages. To appear in the Journal of Computational Physics

MSC Class: 62C20; 62F03; 62F35; 62F25; 68T37

arXiv:2103.09982 [pdf, other]

Decision Theoretic Bootstrapping

Authors: Peyman Tavallali, Hamed Hamze Bajgiran, Danial J. Esaid, Houman Owhadi

Abstract: The design and testing of supervised machine learning models combine two fundamental distributions: (1) the training data distribution and (2) the testing data distribution. Although these two distributions are identical and identifiable when the data set is infinite, they are imperfectly known (and possibly distinct) when the data is finite (and possibly corrupted), and this uncertainty must be taken into account for robust Uncertainty Quantification (UQ). We present a general decision-theoretic bootstrapping solution to this problem: (1) partition the available data into a training subset and a UQ subset, (2) take $m$ subsampled subsets of the training set and train $m$ models, (3) partition the UQ set into $n$ sorted subsets and take a random fraction of them to define $n$ corresponding empirical distributions $μ_{j}$, (4) consider the adversarial game where Player I selects a model $i\in\left\{ 1,\ldots,m\right\} $, Player II selects the UQ distribution $μ_{j}$, and Player I receives a loss defined by evaluating the model $i$ against data points sampled from $μ_{j}$, (5) identify optimal mixed strategies (probability distributions over models and UQ distributions) for both players. These randomized optimal mixed strategies provide optimal model mixtures and UQ estimates given the adversarial uncertainty of the training and testing distributions represented by the game.
The proposed approach provides (1) some degree of robustness to distributional shift in both the distribution of training data and that of the testing data and (2) conditional probability distributions on the output space forming aleatory representations of the uncertainty on the output as a function of the input variable.
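
Step (5) of this abstract, finding mixed strategies for the zero-sum game between models and UQ distributions, can be sketched as follows. The abstract does not say how the game is solved; this sketch swaps in fictitious play, a classical iterative scheme for zero-sum matrix games, purely as an illustration. The loss matrix values and the function name are hypothetical, and the result is an approximation that improves with the number of rounds.

```python
def fictitious_play(loss, rounds=20000):
    """Approximate mixed strategies for a zero-sum game.

    loss[i][j] is the loss Player I suffers when choosing model i while
    Player II chooses UQ distribution j. Player I minimizes, Player II maximizes.
    """
    m, n = len(loss), len(loss[0])
    row_counts = [0] * m   # how often each model was played
    col_counts = [0] * n   # how often each UQ distribution was played
    row_loss = [0.0] * m   # cumulative loss of each model against II's past plays
    col_loss = [0.0] * n   # cumulative loss each distribution caused on I's past plays
    i, j = 0, 0
    for _ in range(rounds):
        row_counts[i] += 1
        col_counts[j] += 1
        for r in range(m):
            row_loss[r] += loss[r][j]
        for c in range(n):
            col_loss[c] += loss[i][c]
        # Each player best-responds to the opponent's empirical play so far.
        i = min(range(m), key=lambda r: row_loss[r])
        j = max(range(n), key=lambda c: col_loss[c])
    p = [k / rounds for k in row_counts]  # mixture over models
    q = [k / rounds for k in col_counts]  # mixture over UQ distributions
    return p, q

# Hypothetical losses: 2 models (rows) against 2 UQ distributions (columns).
loss = [[0.2, 0.8],
        [0.6, 0.3]]
p, q = fictitious_play(loss)

# Guaranteed bracket on the game value for any mixed strategies p and q.
upper = max(sum(p[i] * loss[i][j] for i in range(2)) for j in range(2))
lower = min(sum(q[j] * loss[i][j] for j in range(2)) for i in range(2))
```

The vector `p` plays the role of the model mixture, and the gap between `upper` and `lower` indicates how close the approximation is to an equilibrium. An exact solution would typically use linear programming instead.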

Submitted 17 March, 2021; originally announced March 2021.

Showing 1–5 of 5 results for author: Bajgiran, H H