-
Emerging Statistical Machine Learning Techniques for Extreme Temperature Forecasting in U.S. Cities
Authors:
Kameron B. Kinast,
Ernest Fokoué
Abstract:
In this paper, we present a comprehensive analysis of extreme temperature patterns using emerging statistical machine learning techniques. Our research focuses on exploring and comparing the effectiveness of various statistical models for climate time series forecasting. The models considered include Auto-Regressive Integrated Moving Average, Exponential Smoothing, Multilayer Perceptrons, and Gaussian Processes. We apply these methods to climate time series data from the five most populated U.S. cities, utilizing Python and Julia to demonstrate the role of statistical computing in understanding climate change and its impacts. Our findings highlight the differences between the statistical methods and identify Multilayer Perceptrons as the most effective approach. Additionally, we project extreme temperatures using this best-performing method, up to 2030, and examine whether the temperature changes are greater than zero, thereby testing a hypothesis.
Submitted 26 July, 2023;
originally announced July 2023.
-
Improving the Predictive Performances of $k$ Nearest Neighbors Learning by Efficient Variable Selection
Authors:
Eddie Pei,
Ernest Fokoue
Abstract:
This paper computationally demonstrates a sharp improvement in predictive performance for $k$ nearest neighbors thanks to an efficient forward selection of the predictor variables. We show on both simulated and real-world data that this novel approach repeatedly outperforms regression models under stepwise selection.
Submitted 4 November, 2022;
originally announced November 2022.
-
Efficient Novelty Detection Methods for Early Warning of Potential Fatal Diseases
Authors:
Sèdjro Salomon Hotegni,
Ernest Fokoué
Abstract:
Fatal diseases, as Critical Health Episodes (CHEs), represent real dangers for patients hospitalized in Intensive Care Units. These episodes can lead to irreversible organ damage and death. Nevertheless, diagnosing them in time would greatly reduce their impact. This study therefore focused on building a highly effective early warning system for CHEs such as Acute Hypotensive Episodes and Tachycardia Episodes. To allow predictions to be made early, a gap of one hour was considered between the observation periods (Observation Windows) and the periods during which a critical event can occur (Target Windows). The MIMIC II dataset was used to evaluate the performance of the proposed system. The system first extracts additional features using three different modes. Then, feature selection was performed using the Mutual Information Gain feature importance to retain the most relevant features. Finally, the high-performance predictive model LightGBM was used to perform episode classification. This approach, called MIG-LightGBM, was evaluated using five different metrics: Event Recall (ER), Reduced Precision (RP), average Anticipation Time (aveAT), average False Alarms (aveFA), and Event F1-score (EF1-score). A method is considered highly efficient for the early prediction of CHEs if it exhibits not only a large aveAT but also a large EF1-score and a low aveFA. Compared to systems using Extreme Gradient Boosting, Support Vector Classification or Naive Bayes as a predictive model, the proposed system was found to be highly dominant. It also confirmed its superiority over the Layered Learning approach.
Submitted 6 August, 2022;
originally announced August 2022.
-
A Computational Exploration of Emerging Methods of Variable Importance Estimation
Authors:
Louis Mozart Kamdem,
Ernest Fokoue
Abstract:
Estimating the importance of variables is an essential task in modern machine learning. It helps to evaluate the goodness of a feature in a given model. Several techniques for estimating the importance of variables have been developed during the last decade. In this paper, we propose a computational and theoretical exploration of the emerging methods of variable importance estimation, namely: Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine (SVM), the Predictive Error Function (PERF), Random Forest (RF), and Extreme Gradient Boosting (XGBOOST), which were tested on different kinds of real-life and simulated data. All these methods can handle both regression and classification tasks seamlessly, but all fail when it comes to dealing with data containing missing values. The implementation has shown that PERF has the best performance in the case of highly correlated data, closely followed by RF. PERF and XGBOOST are "data-hungry" methods: they had the worst performance on small data sizes, but they are the fastest when it comes to execution time. SVM is the most appropriate when many redundant features are in the dataset. An advantage of PERF is its natural cut-off at zero, which helps to separate positive and negative scores, with positive scores indicating essential and significant features while negative scores indicate useless features. RF and LASSO are very versatile in that they can be used in almost all situations, even though they do not give the best results.
Submitted 5 August, 2022;
originally announced August 2022.
-
A Text Mining Discovery of Similarities and Dissimilarities Among Sacred Scriptures
Authors:
Younous Mofenjou Peuriekeu,
Victoire Djimna Noyum,
Cyrille Feudjio,
Alkan Goktug,
Ernest Fokoue
Abstract:
The careful examination of sacred texts gives valuable insights into human psychology, different ideas regarding the organization of societies as well as into terms like truth and God. To improve and deepen our understanding of sacred texts, their comparison and their separation are crucial. For this purpose, we use a data set of nine sacred scriptures. This work deals with the separation of the Quran, the Asian scriptures Tao-Te-Ching, the Buddhism, the Yogasutras, and the Upanishads as well as the four books from the Bible, namely the Book of Proverbs, the Book of Ecclesiastes, the Book of Ecclesiasticus, and the Book of Wisdom. These scriptures are analyzed using natural language processing (NLP), creating a mathematical representation of the corpus in terms of frequencies called the document term matrix (DTM). After this analysis, machine learning methods like supervised and unsupervised learning are applied to perform classification. Here we use the Multinomial Naive Bayes (MNB), the Support Vector Machine (SVM), the Random Forest (RF), and the K-nearest Neighbors (KNN). We find that among these methods MNB is able to predict the class of a sacred text with an accuracy of about 85.84%.
Submitted 8 February, 2021;
originally announced February 2021.
-
A Novel Use of Discrete Wavelet Transform Features in the Prediction of Epileptic Seizures from EEG Data
Authors:
Cyrille Feudjio,
Victoire Djimna Noyum,
Younous Perieukeu Mofendjou,
Rockefeller,
Ernest Fokoué
Abstract:
This paper demonstrates the predictive superiority of discrete wavelet transform (DWT) over previously used methods of feature extraction in the diagnosis of epileptic seizures from EEG data. Classification accuracy, specificity, and sensitivity are used as evaluation metrics. We specifically show the immense potential of 2 combinations (DWT-db4 combined with SVM and DWT-db2 combined with RF) as compared to others when it comes to diagnosing epileptic seizures in either the balanced or the imbalanced dataset. The results also highlight that MFCC performs worse than all the DWT variants used in this study and that the mean differences are statistically significant in both the imbalanced and the balanced dataset. Finally, in either the balanced or the imbalanced dataset, the feature extraction techniques, the models, and the interaction between them have a statistically significant effect on the classification accuracy.
Submitted 31 January, 2021;
originally announced February 2021.
-
Boosting the Predictive Accuracy of Singer Identification Using Discrete Wavelet Transform For Feature Extraction
Authors:
Victoire Djimna Noyum,
Younous Perieukeu Mofenjou,
Cyrille Feudjio,
Alkan Göktug,
Ernest Fokoué
Abstract:
Given the diversity and growth of the musical field nowadays, the search for precise songs becomes more and more complex. The identity of the singer facilitates this search. In this project, we focus on the problem of identifying the singer by using different methods for feature extraction. In particular, we introduce the Discrete Wavelet Transform (DWT) for this purpose. To the best of our knowledge, DWT has never been used this way before in the context of singer identification. This process consists of three crucial parts. First, the vocal signal is separated from the background music by using the Robust Principal Component Analysis (RPCA). Second, features from the obtained vocal signal are extracted. Here, the goal is to study the performance of the Discrete Wavelet Transform (DWT) in comparison to the Mel Frequency Cepstral Coefficient (MFCC), which is the most used technique in audio signals. Finally, we proceed with the identification of the singer, where two methods are tested: the Support Vector Machine (SVM) and the Gaussian Mixture Model (GMM). We conclude that, for a dataset of 4 singers and 200 songs, the best identification system consists of the DWT (db4) feature extraction introduced in this work combined with a linear support vector machine for identification, resulting in a mean accuracy of 83.96%.
Submitted 31 January, 2021;
originally announced February 2021.
-
Nonnegative Matrix Factorization with Zellner Penalty
Authors:
Matthew Corsetti,
Ernest Fokoué
Abstract:
Nonnegative matrix factorization (NMF) is a relatively new unsupervised learning algorithm that decomposes a nonnegative data matrix into a parts-based, lower dimensional, linear representation of the data. NMF has applications in image processing, text mining, recommendation systems and a variety of other fields. Since its inception, the NMF algorithm has been modified and explored by numerous authors. One such modification involves the addition of auxiliary constraints to the objective function of the factorization. The purpose of these auxiliary constraints is to impose task-specific penalties or restrictions on the objective function. Though many auxiliary constraints have been studied, none have made use of data-dependent penalties. In this paper, we propose Zellner nonnegative matrix factorization (ZNMF), which uses data-dependent auxiliary constraints. We assess the facial recognition performance of the ZNMF algorithm and several other well-known constrained NMF algorithms using the Cambridge ORL database.
Submitted 7 December, 2020;
originally announced December 2020.
-
Nonnegative Matrix Factorization with Toeplitz Penalty
Authors:
Matthew Corsetti,
Ernest Fokoué
Abstract:
Nonnegative Matrix Factorization (NMF) is an unsupervised learning algorithm that produces a linear, parts-based approximation of a data matrix. NMF constructs a nonnegative low rank basis matrix and a nonnegative low rank matrix of weights which, when multiplied together, approximate the data matrix of interest using some cost function. The NMF algorithm can be modified to include auxiliary constraints which impose task-specific penalties or restrictions on the cost function of the matrix factorization. In this paper we propose a new NMF algorithm that makes use of non-data dependent auxiliary constraints which incorporate a Toeplitz matrix into the multiplicative updating of the basis and weight matrices. We compare the facial recognition performance of our new Toeplitz Nonnegative Matrix Factorization (TNMF) algorithm to the performance of the Zellner Nonnegative Matrix Factorization (ZNMF) algorithm which makes use of data-dependent auxiliary constraints. We also compare the facial recognition performance of the two aforementioned algorithms with the performance of several preexisting constrained NMF algorithms that have non-data-dependent penalties. The facial recognition performances are evaluated using the Cambridge ORL Database of Faces and the Yale Database of Faces.
Submitted 7 December, 2020;
originally announced December 2020.
-
Graph Enhanced High Dimensional Kernel Regression
Authors:
E. Pei,
E. Fokoué
Abstract:
In this paper, the flexibility, versatility and predictive power of kernel regression are combined with now lavishly available network data to create regression models with even greater predictive performances. Building from previous work featuring generalized linear models built in the presence of network cohesion data, we construct a kernelized extension that captures subtler nonlinearities in extremely high dimensional spaces and also produces far better predictive performances. Applications of seamless yet substantial adaptation to simulated and real-life data demonstrate the appeal and strength of our work.
Submitted 3 November, 2020;
originally announced November 2020.
-
What do Asian Religions Have in Common? An Unsupervised Text Analytics Exploration
Authors:
Preeti Sah,
Ernest Fokoué
Abstract:
The main source of various religious teachings is their sacred texts, which vary from religion to religion based on different factors like the geographical location or time of the birth of a particular religion. Despite these differences, there could be similarities between the sacred texts based on what lessons they teach their followers. This paper attempts to find the similarity using text mining techniques. The corpus consisting of Asian (Tao Te Ching, Buddhism, Yogasutra, Upanishad) and non-Asian (four Bible texts) scriptures is used to explore findings of similarity measures like Euclidean, Manhattan, Jaccard and Cosine on the raw Document Term Matrix [DTM] and the normalized DTM, which reveal similarity based on word usage. The performance of supervised learning algorithms like K-Nearest Neighbor [KNN], Support Vector Machine [SVM] and Random Forest is measured based on their accuracy in predicting the correct sacred text for any given chapter in the corpus. The K-means clustering visualizations on Euclidean distances of the raw DTM reveal that there exists a pattern of similarity among these sacred texts, with the Upanishads and the Tao Te Ching being the most similar texts in the corpus.
Submitted 20 December, 2019;
originally announced December 2019.
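As a rough illustration of the four similarity measures named in the abstract above, the sketch below computes them on raw term counts for two toy sentences. It is not the authors' pipeline: the whitespace tokenizer, the helper names, and the choice to compute Jaccard on term sets rather than counts are all assumptions made for this example.

```python
import math
from collections import Counter


def term_counts(text):
    """Raw term frequencies from a naive whitespace tokenizer (assumption)."""
    return Counter(text.lower().split())


def to_vectors(a, b):
    """Align two term-count maps on their shared, sorted vocabulary."""
    vocab = sorted(set(a) | set(b))
    return [a[t] for t in vocab], [b[t] for t in vocab]


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def manhattan(u, v):
    return sum(abs(x - y) for x, y in zip(u, v))


def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def jaccard_similarity(a, b):
    """Jaccard index on the sets of terms (presence/absence only)."""
    terms_a, terms_b = set(a), set(b)
    return len(terms_a & terms_b) / len(terms_a | terms_b)


# Two toy "chapters" (hypothetical text, not taken from the corpus).
doc_a = term_counts("the way is the way")
doc_b = term_counts("the way is empty")
u, v = to_vectors(doc_a, doc_b)

print(manhattan(u, v))                 # 3
print(jaccard_similarity(doc_a, doc_b))  # 0.75
```

Euclidean and Manhattan are distances (smaller means more similar), while cosine and Jaccard are similarities (larger means more similar), so the two groups rank document pairs in opposite directions.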
-
Multi-Stage Fault Warning for Large Electric Grids Using Anomaly Detection and Machine Learning
Authors:
Sanjeev Raja,
Ernest Fokoué
Abstract:
In the monitoring of a complex electric grid, it is of paramount importance to provide operators with early warnings of anomalies detected on the network, along with a precise classification and diagnosis of the specific fault type. In this paper, we propose a novel multi-stage early warning system prototype for electric grid fault detection, classification, subgroup discovery, and visualization. In the first stage, a computationally efficient anomaly detection method based on quartiles detects the presence of a fault in real time. In the second stage, the fault is classified into one of nine pre-defined disaster scenarios. The time series data are first mapped to highly discriminative features by applying dimensionality reduction based on temporal autocorrelation. The features are then mapped through one of three classification techniques: support vector machine, random forest, and artificial neural network. Finally, in the third stage, intra-class clustering based on dynamic time warping is used to characterize the fault with further granularity. Results on the Bonneville Power Administration electric grid data show that i) the proposed anomaly detector is both fast and accurate; ii) dimensionality reduction leads to dramatic improvement in classification accuracy and speed; iii) the random forest method offers the most accurate, consistent, and robust fault classification; and iv) time series within a given class naturally separate into five distinct clusters which correspond closely to the geographical distribution of electric grid buses.
Submitted 15 March, 2019;
originally announced March 2019.
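The abstract above says only that the first-stage detector is "based on quartiles". One common way to build such a detector is Tukey's interquartile-range fences, sketched below. The fence rule, the multiplier `k = 1.5`, and the function name `quartile_anomalies` are assumptions for illustration and may differ from what the paper actually does.

```python
from statistics import quantiles


def quartile_anomalies(values, k=1.5):
    """Return indices of points outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: Tukey-style fences with k = 1.5; the paper's exact rule
    is not given in the abstract.
    """
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical sensor readings with one obvious spike at index 6.
readings = [10, 11, 10, 12, 11, 10, 50, 11]
print(quartile_anomalies(readings))  # [6]
```

Because quartiles are insensitive to a few extreme values, the fences stay close to the bulk of the data even when a spike is present, which is one reason quartile rules are attractive for cheap real-time screening.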
-
Naive Dictionary On Musical Corpora: From Knowledge Representation To Pattern Recognition
Authors:
Qiuyi Wu,
Ernest Fokoue
Abstract:
In this paper, we propose and develop the novel idea of treating musical sheets as literary documents in the traditional text analytics parlance, to fully benefit from the vast amount of research already existing in statistical text mining and topic modelling. We specifically introduce the idea of representing any given piece of music as a collection of "musical words" that we codenamed "muselets", which are essentially musical words of various lengths. Given the novelty and therefore the extreme difficulty of properly forming a complete version of a dictionary of muselets, the present paper focuses on a simpler albeit naive version of the ultimate dictionary, which we refer to as a Naive Dictionary because all the words are of the same length. We specifically herein construct a naive dictionary featuring a corpus made up of African American, Chinese, Japanese and Arabic music, on which we perform both topic modelling and pattern recognition. Although some of the results based on the Naive Dictionary are reasonably good, we anticipate phenomenal predictive performances once we get around to actually building a full scale complete version of our intended dictionary of muselets.
Submitted 28 November, 2018;
originally announced November 2018.
-
To Bayes or Not To Bayes? That's no longer the question!
Authors:
Ernest Fokoue
Abstract:
This paper seeks to provide a thorough account of the ubiquitous nature of the Bayesian paradigm in modern statistics, data science and artificial intelligence. Once maligned, on the one hand by those who philosophically hated the very idea of subjective probability used in prior specification, and on the other hand because of the intractability of the computations needed for Bayesian estimation and inference, the Bayesian school of thought now permeates and pervades virtually all areas of science, applied science, engineering, social science and even liberal arts, often in unsuspected ways. Thanks in part to the availability of powerful computing resources, but also to the literally unavoidable inherent presence of the quintessential building blocks of the Bayesian paradigm in all walks of life, the Bayesian way of handling statistical learning, estimation and inference is not only mainstream but also becoming the most central approach to learning from the data. This paper explores some of the most relevant elements to help the reader appreciate the pervading power and presence of the Bayesian paradigm in statistics, artificial intelligence and data science, with an emphasis on how the Gospel according to Reverend Thomas Bayes has turned out to be the truly good news, and in some cases the amazing saving grace, for all who seek to learn statistically from the data. To further help the reader gain deeper and tangible practical insights into the Bayesian machinery, we point to some computational tools designed for the R Statistical Software Environment to help explore Bayesian statistical learning.
Submitted 28 May, 2018;
originally announced May 2018.
-
Meta-Learning with Hessian-Free Approach in Deep Neural Nets Training
Authors:
Boyu Chen,
Wenlian Lu,
Ernest Fokoue
Abstract:
Meta-learning is a promising approach to efficient training of deep neural nets and has been attracting increasing interest in recent years. But most of the current methods are still not capable of training complex neural net models over a long training process. In this paper, a novel second-order meta-optimizer, named Meta-learning with Hessian-Free (MLHF) approach, is proposed based on the Hessian-Free approach. Two recurrent neural networks are established to generate the damping and the precondition matrix of this Hessian-Free framework. A series of techniques is introduced to stabilize and reinforce the meta-training of this optimizer, including the gradient calculation of $H$. Numerical experiments on deep convolution neural nets, including CUDA-convnet and ResNet18(v2), with datasets of CIFAR10 and ILSVRC2012, indicate that the MLHF shows good and continuous training performance during the whole long training process, i.e., both the rapidly-decreasing early stage and the steadily-decreasing later stage, and so is a promising meta-learning framework towards elevating the training efficiency in real-world deep neural nets.
Submitted 7 September, 2018; v1 submitted 22 May, 2018;
originally announced May 2018.
-
On the Statistical Challenges of Echo State Networks and Some Potential Remedies
Authors:
Qiuyi Wu,
Ernest Fokoue,
Dhireesha Kudithipudi
Abstract:
Echo state networks are powerful recurrent neural networks. However, they are often unstable and shaky, making the process of finding a good ESN for a specific dataset quite hard. Obtaining superb accuracy by using the Echo State Network is a challenging task. We create, develop and implement a family of predictably optimal, robust and stable ensembles of Echo State Networks via regularizing the training and perturbing the input. Furthermore, several distributions of weights have been tried to see if the shape of the distribution has an impact on reducing the error. We found that ESN can track in the short term for most datasets, but it collapses in the long run. Short-term tracking with a large reservoir enables ESN to perform strikingly well, with superior prediction. Based on this scenario, we go a step further and aggregate many ESNs into an ensemble to lower the variance and stabilize the system by stochastic replications and bootstrapping of input data.
Submitted 20 February, 2018;
originally announced February 2018.
-
A Mathematical Formalization of Hierarchical Temporal Memory's Spatial Pooler
Authors:
James Mnatzaganian,
Ernest Fokoué,
Dhireesha Kudithipudi
Abstract:
Hierarchical temporal memory (HTM) is an emerging machine learning algorithm, with the potential to provide a means to perform predictions on spatiotemporal data. The algorithm, inspired by the neocortex, currently does not have a comprehensive mathematical framework. This work brings together all aspects of the spatial pooler (SP), a critical learning component in HTM, under a single unifying framework. The primary learning mechanism is explored, where a maximum likelihood estimator for determining the degree of permanence update is proposed. The boosting mechanisms are studied and found to be only relevant during the initial few iterations of the network. Observations are made relating HTM to well-known algorithms such as competitive learning and attribute bagging. Methods are provided for using the SP for classification as well as dimensionality reduction. Empirical evidence verifies that given the proper parameterizations, the SP may be used for feature learning.
Submitted 8 September, 2016; v1 submitted 22 January, 2016;
originally announced January 2016.
-
Bayesian Variable Selection for Linear Regression with the $κ$-$G$ Priors
Authors:
Zichen Ma,
Ernest Fokoué
Abstract:
In this paper, we introduce a new methodology for Bayesian variable selection in linear regression that is independent of the traditional indicator method. A diagonal matrix $\mathbf{G}$ is introduced to the prior of the coefficient vector $\boldsymbol{\beta}$, where each $g_j$ on the diagonal, bounded between $0$ and $1$, serves as a stabilizer of the corresponding $\beta_j$. Mathematically, a promising variable has a $g_j$ value that is close to $0$, whereas the value of $g_j$ corresponding to an unpromising variable is close to $1$. This property is proven in this paper under orthogonality together with other asymptotic properties. Computationally, the sample path of each $g_j$ is obtained through the Metropolis-within-Gibbs sampling method. Also, in this paper we give two simulations to verify the capability of this methodology in variable selection.
Submitted 18 October, 2016; v1 submitted 21 March, 2015;
originally announced March 2015.
-
On the Predictive Properties of Binary Link Functions
Authors:
Necla Gunduz,
Ernest Fokoue
Abstract:
This paper provides a theoretical and computational justification of the long-held claim of the similarity of the probit and logit link functions often used in binary classification. Despite the widespread recognition of the strong similarities between these two link functions, very few (if any) researchers have dedicated time to carrying out a formal study aimed at firmly establishing and characterizing all aspects of their similarities and differences. This paper proposes a definition of both structural and predictive equivalence of link-function-based binary regression models, and explores the various ways in which they are either similar or dissimilar. From a predictive analytics perspective, it turns out that not only are probit and logit perfectly predictively concordant, but other link functions like cauchit and complementary log-log also enjoy a very high percentage of predictive equivalence. Throughout this paper, simulated and real-life examples demonstrate all the equivalence results that we prove theoretically.
Submitted 16 February, 2015;
originally announced February 2015.
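The closeness of the probit and logit links described in the abstract above can be illustrated numerically. The sketch below is not the paper's definition of predictive equivalence; it only compares the two inverse link functions on a grid, assuming the commonly quoted rescaling constant 1.702 for the logistic argument (the names `SCALE`, `grid`, `max_gap`, and `same_label` are illustrative).

```python
import math

def logistic_cdf(z):
    """Inverse logit link: P(Y = 1) for linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Inverse probit link: standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Assumption: 1.702 is the usual constant used to align the logistic
# curve with the standard normal CDF; it is not taken from the paper.
SCALE = 1.702

# Evaluate both links on a grid of linear-predictor values in [-8, 8].
grid = [i / 100.0 for i in range(-800, 801)]

# Largest pointwise gap between the two probability curves.
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)

# With a 0.5 threshold, both links assign the same class label
# whenever they are fed the same linear predictor.
same_label = all(
    (normal_cdf(z) > 0.5) == (logistic_cdf(SCALE * z) > 0.5) for z in grid
)
```

Both links cross 0.5 exactly where the linear predictor is zero, which is why the thresholded labels agree here; fitted probit and logit models estimate different coefficients, so this is only a rough intuition for the concordance the paper studies.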
-
Random Subspace Learning Approach to High-Dimensional Outliers Detection
Authors:
Bohan Liu,
Ernest Fokoue
Abstract:
We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimension low-sample size and traditional low-dimensional high-sample size datasets. Essentially, we avoid the computational bottleneck of techniques like minimum covariance determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both the theoretical and the computational development of our approach reveal that it is computationally more efficient than the regularized methods in the high-dimensional low-sample size setting, and that it often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.
Submitted 3 May, 2015; v1 submitted 15 February, 2015;
originally announced February 2015.
-
Adaptive Random SubSpace Learning (RSSL) Algorithm for Prediction
Authors:
Mohamed Elshrif,
Ernest Fokoue
Abstract:
We present a novel adaptive random subspace learning algorithm (RSSL) for prediction. This new framework is flexible in that it can be combined with any learning technique. In this paper, we tested the algorithm on regression and classification problems. In addition, we provide a variety of weighting schemes to increase the robustness of the developed algorithm. These different weighting flavors were evaluated on simulated as well as real-world data sets, considering the cases where the ratio between features (attributes) and instances (samples) is large and vice versa. The framework of the new algorithm consists of several stages. First, calculate the weights of all features on the data set using the correlation coefficient and the F-statistic. Second, randomly draw n samples with replacement from the data set. Third, perform regular bootstrap sampling (bagging). Fourth, draw without replacement the indices of the chosen variables, with the decision based on the heuristic subspacing scheme. Fifth, call the base learners and build the model. Sixth, use the model for prediction on the test set. The results show that the adaptive RSSL algorithm improves on its conventional machine learning counterparts in most cases.
Submitted 9 February, 2015;
originally announced February 2015.
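The stages listed in the abstract above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the feature weights use only the absolute correlation with the response (the F-statistic variant is omitted), the two resampling stages are merged into a single bootstrap draw, the subspace size defaults to the square root of the number of features, and ordinary least squares stands in for the base learner. The names `fit_rssl` and `predict_rssl` are hypothetical.

```python
import numpy as np

def fit_rssl(X, y, n_learners=50, subspace_size=None, seed=0):
    """Fit an ensemble of least-squares learners on weighted random subspaces."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumption: sqrt(p) features per learner as the subspacing heuristic.
    d = subspace_size or max(1, int(np.sqrt(p)))

    # Stage 1: feature weights from absolute correlation with the response.
    # (A constant column would give NaN here; inputs are assumed non-constant.)
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
    weights = corr / corr.sum()

    models = []
    for _ in range(n_learners):
        # Stages 2-3: bootstrap the rows (n draws with replacement).
        rows = rng.integers(0, n, size=n)
        # Stage 4: draw feature indices without replacement, favoring
        # features with larger weights.
        cols = rng.choice(p, size=d, replace=False, p=weights)
        # Stage 5: fit the base learner (least squares with an intercept).
        A = np.column_stack([np.ones(n), X[rows][:, cols]])
        coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        models.append((cols, coef))
    return models

def predict_rssl(models, X):
    """Stage 6: average the base learners' predictions on new data."""
    preds = [coef[0] + X[:, cols] @ coef[1:] for cols, coef in models]
    return np.mean(preds, axis=0)
```

For classification, the averaging step would be replaced by a vote over the base learners' predicted labels; any learner with fit and predict steps could take the place of the least-squares fit.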
-
A Comparison of Classifiers in Performing Speaker Accent Recognition Using MFCCs
Authors:
Zichen Ma,
Ernest Fokoue
Abstract:
An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Different classifiers are then compared based on the MFCC features. For each signal, the mean vector of the MFCC matrix is used as an input vector for pattern recognition. A sample of 330 signals, containing 165 US voices and 165 non-US voices, is analyzed. In the comparison, k-nearest neighbors yields the highest average test accuracy, using a cross-validation of size 500, and requires the least computation time.
Submitted 28 January, 2015;
originally announced January 2015.
-
Prediction Error Reduction Function as a Variable Importance Score
Authors:
Ernest Fokoué
Abstract:
This paper introduces and develops a novel variable importance score function in the context of ensemble learning and demonstrates its appeal both theoretically and empirically. Our proposed score function is simpler and more straightforward than its counterpart proposed in the context of random forest, and by avoiding permutations, it is by design computationally more efficient than the random forest variable importance function. Just like the random forest variable importance function, our score handles both regression and classification seamlessly. One distinct advantage of our proposed score is that it offers a natural cutoff at zero, with all the positive scores indicating importance and significance, while the negative scores are deemed indications of insignificance. An extra advantage of our proposed score is that it works very well beyond ensembles of trees and can seamlessly be used with any base learner in the random subspace learning context. Our examples, both simulated and real, demonstrate that our proposed score mostly competes favorably with the random forest score.
Submitted 25 January, 2015;
originally announced January 2015.
-
An Information-Theoretic Alternative to the Cronbach's Alpha Coefficient of Item Reliability
Authors:
Ernest Fokoue,
Necla Gunduz
Abstract:
We propose an information-theoretic alternative to the popular Cronbach alpha coefficient of reliability. Particularly suitable for contexts in which instruments are scored on a strictly nonnumeric scale, our proposed index is based on functions of the entropy of the distributions defined on the sample space of responses. Our reliability index tracks the Cronbach alpha coefficient uniformly while offering several other advantages discussed in great detail in this paper.
Submitted 16 January, 2015;
originally announced January 2015.
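For orientation, the two ingredients named in the abstract above can be computed as follows. This sketch shows the standard Cronbach alpha formula and the Shannon entropy of a response distribution; it does not reproduce the paper's reliability index, whose exact form is not given in the abstract. The function names and the row-per-respondent layout are assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import variance

def cronbach_alpha(scores):
    """Standard Cronbach alpha; `scores` has one row per respondent
    and one numeric column per item (assumed layout)."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

def shannon_entropy(responses):
    """Shannon entropy (in bits) of the empirical distribution of
    responses; works for nonnumeric labels such as 'agree'."""
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Cronbach alpha requires numeric scores because it is built from variances, whereas the entropy only needs response frequencies, which is consistent with the abstract's point about strictly nonnumeric scales.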
-
Pattern Discovery in Students' Evaluations of Professors: A Statistical Data Mining Approach
Authors:
Necla Gunduz,
Ernest Fokoue
Abstract:
The evaluation of instructors by their students has been practiced at most universities for many decades, and there has always been great interest in a variety of aspects of the evaluations. Are students mature and knowledgeable enough to provide useful and dependable feedback for the improvement of their instructors' teaching skills/abilities? Does the level of difficulty of the course have a strong relationship with the rating the student gives an instructor? In this paper, we attempt to answer questions such as these using some state-of-the-art statistical data mining techniques such as support vector machines, classification and regression trees, boosting, random forest, factor analysis, k-means clustering, and hierarchical clustering. We explore various aspects of the data from both the supervised and unsupervised learning perspectives. The data set analyzed in this paper was collected from a university in Turkey. The application of our techniques to this data reveals some very interesting patterns in the evaluations, like the strong association between the student's seriousness and dedication (measured by attendance) and the kind of scores they tend to assign to their instructors.
Submitted 9 January, 2015;
originally announced January 2015.
-
A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining
Authors:
Ernest Fokoue
Abstract:
Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls into within the bigness taxonomy. Large p small n data sets, for instance, require a different set of tools from the large n small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, and Sequentialization. Indeed, it is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham's razor non-plurality principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
Submitted 3 January, 2015;
originally announced January 2015.
-
Robust Classification of High Dimension Low Sample Size Data
Authors:
Necla Gunduz,
Ernest Fokoue
Abstract:
The robustification of pattern recognition techniques has been the subject of intense research in recent years. Despite the multiplicity of papers on the subject, very few articles have deeply explored the topic of robust classification in the high dimension low sample size context. In this work, we explore and compare the predictive performances of robust classification techniques, with a special concentration on robust discriminant analysis and robust PCA applied to a wide variety of large $p$ small $n$ data sets. We also explore the performance of random forest by way of comparing and contrasting single-model methods and ensemble methods in this context. Our work reveals that random forest, although not inherently designed to be robust to outliers, substantially outperforms the existing techniques specifically designed to achieve robustness. Indeed, random forest emerges as the best predictively on both real-life and simulated data.
Submitted 3 January, 2015;
originally announced January 2015.
-
Probit Normal Correlated Topic Models
Authors:
Xingchen Yu,
Ernest Fokoue
Abstract:
The logistic normal distribution has recently been adapted via the transformation of multivariate Gaussian variables to model the topical distribution of documents in the presence of correlations among topics. In this paper, we propose a probit normal alternative approach to modelling correlated topical structures. Our use of the probit model in the context of topic discovery is novel, as many authors have so far concentrated solely on the logistic model, partly due to the formidable inefficiency of the multinomial probit model even in the case of very small topical spaces. We herein circumvent the inefficiency of multinomial probit estimation by using an adaptation of the diagonal orthant multinomial probit in the topic models context, resulting in the ability of our topic modelling scheme to handle corpora with a large number of latent topics. An additional and very important benefit of our method lies in the fact that, unlike the logistic normal model whose non-conjugacy leads to the need for sophisticated sampling schemes, our approach exploits the natural conjugacy inherent in the auxiliary formulation of the probit model to achieve greater simplicity. The application of our proposed scheme to a well-known Associated Press corpus not only helps discover a large number of meaningful topics but also captures compellingly intuitive correlations among certain topics. Besides, our proposed approach lends itself to even further scalability thanks to various existing high performance algorithms and architectures capable of handling millions of documents.
Submitted 3 October, 2014;
originally announced October 2014.