Fundamental Laws of Binary Classification

Abstract: Finding discriminant functions of minimum risk binary classification systems is a novel geometric locus problem -- which requires solving a system of fundamental locus equations of binary classification -- subject to deep-seated statistical laws. We show that a discriminant function of a minimum risk binary classification system is the solution of a locus equation that represents the geometric locus of the decision boundary of the system, wherein the discriminant function is connected to the decision boundary by an exclusive principal eigen-coordinate system -- at which point the discriminant function is represented by a geometric locus of a novel principal eigenaxis -- structured as a dual locus of likelihood components and principal eigenaxis components. We demonstrate that a minimum risk binary classification system acts to jointly minimize its eigenenergy and risk by locating a point of equilibrium, at which point critical minimum eigenenergies exhibited by the system are symmetrically concentrated in such a manner that the novel principal eigenaxis of the system exhibits symmetrical dimensions and densities, so that counteracting and opposing forces and influences of the system are symmetrically balanced with each other -- about the geometric center of the locus of the novel principal eigenaxis -- whereon the statistical fulcrum of the system is located. Thereby, a minimum risk binary classification system satisfies a state of statistical equilibrium -- so that the total allowed eigenenergy and the expected risk exhibited by the system are jointly minimized within the decision space of the system -- at which point the system exhibits the minimum probability of classification error.

Submitted 2 January, 2023; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: 265 pages, 21 figures: We present a comprehensive treatise on the binary classification of random vectors. We formulate the direct problem by generalizing a well-posed variant of Bayes' decision rule. We formulate the inverse problem by generalizing a well-posed variant of the constrained optimization algorithm used by support vector machines to learn nonlinear decision boundaries.

arXiv:1612.03902 [pdf, other]

Design of Data-Driven Mathematical Laws for Optimal Statistical Classification Systems

Authors: Denise M. Reeves

Abstract: This article will devise data-driven, mathematical laws that generate optimal, statistical classification systems which achieve minimum error rates for data distributions with unchanging statistics. Thereby, I will design learning machines that minimize the expected risk or probability of misclassification. I will devise a system of fundamental equations of binary classification for a classification system in statistical equilibrium. I will use this system of equations to formulate the problem of learning unknown, linear and quadratic discriminant functions from data as a locus problem, thereby formulating geometric locus methods within a statistical framework. Solving locus problems involves finding equations of curves or surfaces defined by given properties and finding graphs or loci of given equations. I will devise three systems of data-driven, locus equations that generate optimal, statistical classification systems. Each class of learning machines satisfies fundamental statistical laws for a classification system in statistical equilibrium. Thereby, I will formulate three classes of learning machines that are scalable modules for optimal, statistical pattern recognition systems, all of which are capable of performing a wide variety of statistical pattern recognition tasks, where any given M-class statistical pattern recognition system exhibits optimal generalization performance for an M-class feature space.

Submitted 19 May, 2018; v1 submitted 12 December, 2016; originally announced December 2016.

Comments: 339 pages, 52 figures. arXiv admin note: text overlap with arXiv:1511.05102

arXiv:1511.05102 [pdf, other]

Resolving the Geometric Locus Dilemma for Support Vector Learning Machines

Authors: Denise M. Reeves

Abstract: Capacity control, the bias/variance dilemma, and learning unknown functions from data, are all concerned with identifying effective and consistent fits of unknown geometric loci to random data points. A geometric locus is a curve or surface formed by points, all of which possess some uniform property. A geometric locus of an algebraic equation is the set of points whose coordinates are solutions of the equation. Any given curve or surface must pass through each point on a specified locus. This paper argues that it is impossible to fit random data points to algebraic equations of partially configured geometric loci that reference arbitrary Cartesian coordinate systems. It also argues that the fundamental curve of a linear decision boundary is actually a principal eigenaxis. It is shown that learning principal eigenaxes of linear decision boundaries involves finding a point of statistical equilibrium for which eigenenergies of principal eigenaxis components are symmetrically balanced with each other. It is demonstrated that learning linear decision boundaries involves strong duality relationships between a statistical eigenlocus of principal eigenaxis components and its algebraic forms, in primal and dual, correlated Hilbert spaces. Locus equations are introduced and developed that describe principal eigen-coordinate systems for lines, planes, and hyperplanes. These equations are used to introduce and develop primal and dual statistical eigenlocus equations of principal eigenaxes of linear decision boundaries. Important generalizations for linear decision boundaries are shown to be encoded within a dual statistical eigenlocus of principal eigenaxis components. Principal eigenaxes of linear decision boundaries are shown to encode Bayes' likelihood ratio for common covariance data and a robust likelihood ratio for all other data.

Submitted 16 November, 2015; originally announced November 2015.

Comments: 170 pages, 33 figures

arXiv:1107.0034 [pdf, ps]

doi 10.1613/jair.1333

Price Prediction in a Trading Agent Competition

Authors: K. M. Lochner, D. M. Reeves, Y. Vorobeychik, M. P. Wellman

Abstract: The 2002 Trading Agent Competition (TAC) presented a challenging market game in the domain of travel shopping. One of the pivotal issues in this domain is uncertainty about hotel prices, which have a significant influence on the relative cost of alternative trip schedules. Thus, virtually all participants employ some method for predicting hotel prices. We survey approaches employed in the tournament, finding that agents apply an interesting diversity of techniques, taking into account differing sources of evidence bearing on prices. Based on data provided by entrants on their agents' actual predictions in the TAC-02 finals and semifinals, we analyze the relative efficacy of these approaches. The results show that taking into account game-specific information about flight prices is a major distinguishing factor. Machine learning methods effectively induce the relationship between flight and hotel prices from game data, and a purely analytical approach based on competitive equilibrium analysis achieves equal accuracy with no historical data. Employing a new measure of prediction quality, we relate absolute accuracy to bottom-line performance in the game.

Submitted 30 June, 2011; originally announced July 2011.

Journal ref: Journal of Artificial Intelligence Research, Volume 21, pages 19-36, 2004

Showing 1–4 of 4 results for author: Reeves, D M