-
Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning
Authors:
Gustavo A. Valencia-Zapata,
Carolina Gonzalez-Canas,
Michael G. Zentner,
Okan Ersoy,
Gerhard Klimeck
Abstract:
Several studies point out different causes of performance degradation in supervised machine learning. Problems such as class imbalance, overlapping, small disjuncts, noisy labels, and sparseness limit accuracy in classification algorithms. Although a number of approaches, in the form of either a methodology or an algorithm, try to minimize performance degradation, they have been isolated efforts with limited scope. Most of these approaches focus on remediating one among many problems, with experimental results drawn from few datasets and classification algorithms, insufficient measures of prediction power, and no statistical validation of the real benefit of the proposed approach. This paper consists of two main parts. In the first part, a novel probabilistic diagnostic model based on identifying signs and symptoms of each problem is presented. Early and correct diagnosis of these problems makes it possible to select not only the most suitable remediation treatment but also unbiased performance metrics. In the second part, the behavior and performance of several supervised algorithms are studied when training sets have such problems, so that the success of treatments can be predicted across classifiers.
Submitted 15 April, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
A Statistical Approach to Increase Classification Accuracy in Supervised Learning Algorithms
Authors:
Gustavo A Valencia-Zapata,
Daniel Mejia,
Gerhard Klimeck,
Michael Zentner,
Okan Ersoy
Abstract:
Probabilistic mixture models have been widely used for different machine learning and pattern recognition tasks such as clustering, dimensionality reduction, and classification. In this paper, we focus on addressing the most common challenges in supervised learning algorithms by using mixture probability distribution functions. With this modeling strategy, we identify sub-labels and generate synthetic data in order to reach better classification accuracy. In other words, we enlarge the training data synthetically to increase classification accuracy.
Submitted 5 September, 2017;
originally announced September 2017.
-
Grain Boundary Resistance in Copper Interconnects from an Atomistic Model to a Neural Network
Authors:
Daniel Valencia,
Evan Wilson,
Zhengping Jiang,
Gustavo A. Valencia-Zapata,
Gerhard Klimeck,
Michael Povolotskyi
Abstract:
Orientation effects on the resistivity of copper grain boundaries are studied systematically with two different atomistic tight-binding methods. A methodology is developed to model the resistivity of grain boundaries using the Embedded Atom Model, tight-binding methods, and non-equilibrium Green's functions (NEGF). The methodology is validated against first-principles calculations for small, ultra-thin body grain boundaries (< 5 nm) with 6.4% deviation in the resistivity. A statistical ensemble of 600 large, random structures with grains is studied. For structures with three grains, it is found that the distribution of resistivities is close to normal. Finally, a compact model for grain boundary resistivity is constructed based on a neural network.
Submitted 8 October, 2017; v1 submitted 17 January, 2017;
originally announced January 2017.