-
Comparing Male Nyala and Male Kudu Classification using Transfer Learning with ResNet-50 and VGG-16
Authors:
T. T Lemani,
T. L. van Zyl
Abstract:
Reliable and efficient monitoring of wild animals is crucial to inform management and conservation decisions. The process of manually identifying species of animals is time-consuming, monotonous, and expensive. Leveraging advances in deep learning and computer vision, we investigate in this paper the efficiency of pre-trained models, specifically the VGG-16 and ResNet-50 models, in identifying a male Kudu and a male Nyala in their natural habitats. These pre-trained models have proven to be efficient in animal identification in general. Still, there is little research on animals like the Kudu and Nyala, which are usually well camouflaged and have similar features. The method of transfer learning used in this paper is fine-tuning. The models are evaluated before and after fine-tuning. The experiments achieved an accuracy of 93.2% and 97.7% for the VGG-16 and ResNet-50 models, respectively, before fine-tuning and 97.7% for both models after fine-tuning. Although these results are impressive, it should be noted that they were obtained on a small sample of 550 images split evenly between the two classes; the dataset might therefore not cover enough scenarios to support a full conclusion about the efficiency of the models. There is thus room for further work in collecting a more extensive dataset and in extending the tests to the female counterparts of these species and to antelope species as a whole.
Submitted 10 November, 2023;
originally announced November 2023.
-
Machine Learning for Socially Responsible Portfolio Optimisation
Authors:
Taeisha Nundlall,
Terence L Van Zyl
Abstract:
Socially responsible investors build investment portfolios intending to incite social and environmental advancement alongside a financial return. Although Mean-Variance (MV) models successfully generate the highest possible return based on an investor's risk tolerance, MV models do not make provisions for additional constraints relevant to socially responsible (SR) investors. In response to this problem, the MV model must consider Environmental, Social, and Governance (ESG) scores in optimisation. Based on the prominent MV model, this study implements portfolio optimisation for socially responsible investors. The amended MV model allows SR investors to enter markets with competitive SR portfolios despite facing a trade-off between their investment Sharpe Ratio and the average ESG score of the portfolio.
Submitted 21 May, 2023;
originally announced May 2023.
-
A Learnheuristic Approach to A Constrained Multi-Objective Portfolio Optimisation Problem
Authors:
Sonia Bullah,
Terence L. van Zyl
Abstract:
Multi-objective portfolio optimisation is a critical problem researched across various fields of study, as it seeks to maximise the expected return of a given portfolio while simultaneously minimising its risk. However, many studies fail to include realistic constraints in the model, which limits their use in practical trading strategies. This study introduces realistic constraints, such as transaction and holding costs, into an optimisation model. Due to the non-convex nature of this problem, metaheuristic algorithms, such as NSGA-II, R-NSGA-II, NSGA-III and U-NSGA-III, play a vital role in solving it. Furthermore, a learnheuristic approach is taken, in which surrogate models enhance the metaheuristics employed. These algorithms are then compared to the baseline metaheuristic algorithms, which solve the constrained, multi-objective optimisation problem without using learnheuristics. The results of this study show that, despite taking significantly longer to run to completion, the learnheuristic algorithms outperform the baseline algorithms in terms of hypervolume and rate of convergence. Furthermore, the backtesting results indicate that utilising learnheuristics to generate weights for asset allocation leads to a lower risk percentage, higher expected return and higher Sharpe ratio than backtesting without using learnheuristics. This leads us to conclude that using learnheuristics to solve a constrained, multi-objective portfolio optimisation problem produces better results than solving the problem without them.
Submitted 13 April, 2023;
originally announced April 2023.
-
Late Meta-learning Fusion Using Representation Learning for Time Series Forecasting
Authors:
Terence L. van Zyl
Abstract:
Meta-learning, decision fusion, hybrid models, and representation learning are topics of investigation with significant traction in time-series forecasting research. Of these, two specific areas have shown state-of-the-art results in forecasting: hybrid meta-learning models such as Exponential Smoothing - Recurrent Neural Network (ES-RNN) and Neural Basis Expansion Analysis (N-BEATS), and feature-based stacking ensembles such as Feature-based FORecast Model Averaging (FFORMA). However, a unified taxonomy for model fusion and an empirical comparison of these hybrid and feature-based stacking ensemble approaches are still missing. This study presents a unified taxonomy encompassing these topic areas. Furthermore, the study empirically evaluates several model fusion approaches and a novel combination of hybrid and feature stacking algorithms called Deep-learning FORecast Model Averaging (DeFORMA). The taxonomy contextualises the considered methods. Furthermore, the empirical analysis of the results shows that the proposed model, DeFORMA, can achieve state-of-the-art results on the M4 data set. DeFORMA increases the mean Overall Weighted Average (OWA) in the daily, weekly and yearly subsets, with competitive results in the hourly, monthly and quarterly subsets. The taxonomy and empirical results lead us to argue that significant progress is still to be made by continuing to explore the intersection of these research areas.
Submitted 20 March, 2023;
originally announced March 2023.
-
Towards a methodology for addressing missingness in datasets, with an application to demographic health datasets
Authors:
Gift Khangamwa,
Terence L. van Zyl,
Clint J. van Alten
Abstract:
Missing data is a common concern in health datasets, and its impact on good decision-making processes is well documented. Our study's contribution is a methodology for tackling missing data problems using a combination of synthetic dataset generation, missing data imputation and deep learning methods. Specifically, we conducted a series of experiments with these objectives: (a) generating a realistic synthetic dataset, (b) simulating data missingness, (c) recovering the missing data, and (d) analyzing imputation performance. Our methodology used a Gaussian mixture model, whose parameters were learned from a cleaned subset of a real demographic and health dataset, to generate the synthetic data. We simulated missingness degrees of 10%, 20%, 30%, and 40% under the missing completely at random (MCAR) scheme. We used an integrated performance analysis framework involving clustering, classification and direct imputation analysis. Our results show that models trained on synthetic and imputed datasets could make predictions with an accuracy of 83% and 80% on (a) an unseen real dataset and (b) an unseen reserved synthetic test dataset, respectively. Moreover, the models that used the DAE method for imputation yielded the lowest log loss, an indication of good performance, even though the accuracy measures were slightly lower. In conclusion, our work demonstrates that, using our methodology, one can reverse engineer a solution to resolve missingness on an unseen dataset with missingness. Moreover, though we used a health dataset, our methodology can be utilized in other contexts.
Submitted 5 November, 2022;
originally announced November 2022.
-
Improving Cause-of-Death Classification from Verbal Autopsy Reports
Authors:
Thokozile Manaka,
Terence van Zyl,
Deepak Kar
Abstract:
In many low- and middle-income countries, including South Africa, data access in health facilities is restricted due to patient privacy and confidentiality policies. Further, since clinical data is unique to individual institutions and laboratories, there are insufficient data annotation standards and conventions. As a result of the scarcity of textual data, natural language processing (NLP) techniques have fared poorly in the health sector. A cause of death (COD) is often determined by a verbal autopsy (VA) report in places without reliable death registration systems. A non-clinician field worker compiles a VA report using a set of standardized questions as a guide to uncover symptoms of a COD. This analysis focuses on the textual part of the VA report as a case study to address the challenge of adapting NLP techniques in the health domain. We present a system that relies on two transfer learning paradigms, monolingual learning and multi-source domain adaptation, to improve the use of VA narratives for the target task of COD classification. We use the Bidirectional Encoder Representations from Transformers (BERT) and Embeddings from Language Models (ELMo) models pre-trained on the general English and health domains to extract features from the VA narratives. Our findings suggest that this transfer learning system improves the COD classification tasks and that the narrative text contains valuable information for determining a COD. Our results further show that combining binary VA features and narrative text features learned via this framework boosts the COD classification task.
Submitted 31 October, 2022;
originally announced October 2022.
-
Exploring the effectiveness of surrogate-assisted evolutionary algorithms on the batch processing problem
Authors:
Mohamed Z. Variawa,
Terence L. Van Zyl,
Matthew Woolway
Abstract:
Real-world optimisation problems typically have objective functions which cannot be expressed analytically. These optimisation problems are evaluated through expensive physical experiments or simulations. Cheap approximations of the objective function can reduce the computational requirements for solving these expensive optimisation problems. These cheap approximations may be machine learning or statistical models and are known as surrogate models. This paper introduces a simulation of a batch processing problem that is well known in the literature. Evolutionary algorithms such as the Genetic Algorithm (GA) and Differential Evolution (DE) are used to find the optimal schedule for the simulation. We then compare the quality of solutions obtained by the surrogate-assisted versions of the algorithms against the baseline algorithms. Surrogate assistance is achieved through the Probabilistic Surrogate-Assisted Framework (PSAF). The results highlight the potential for improving baseline evolutionary algorithms through surrogates. For different time horizons, the solutions are evaluated with respect to several quality indicators. It is shown that the PSAF-assisted GA (PSAF-GA) and PSAF-assisted DE (PSAF-DE) provided improvement in some time horizons. In others, they either maintained the solutions or showed some deterioration. The results also highlight the need to tune the hyper-parameters used by the surrogate-assisted framework, as the surrogate, in some instances, shows some deterioration over the baseline algorithm.
Submitted 31 October, 2022;
originally announced October 2022.
-
Volatility forecasting using Deep Learning and sentiment analysis
Authors:
V Ncume,
T. L van Zyl,
A Paskaramoorthy
Abstract:
Several studies have shown that deep learning models can provide more accurate volatility forecasts than the traditional methods used within this domain. This paper presents a composite model that merges a deep learning approach with sentiment analysis for predicting market volatility. To classify public sentiment, we use a Convolutional Neural Network with data obtained from Reddit global news headlines. We then describe a composite forecasting model, a Long Short-Term Memory neural network method, that uses historical sentiment and the previous day's volatility to make forecasts. We applied this method to the past volatility of the S&P500 and the major BRICS indices to corroborate its effectiveness. Our results demonstrate that including sentiment can improve deep learning volatility forecasting models. However, in contrast to return forecasting, the performance benefits of including sentiment for volatility forecasting appear to be market specific.
Submitted 17 November, 2022; v1 submitted 22 October, 2022;
originally announced October 2022.
-
Multi-Modal Recommendation System with Auxiliary Information
Authors:
Mufhumudzi Muthivhi,
Terence L. van Zyl,
Hairong Wang
Abstract:
Context-aware recommendation systems improve upon classical recommender systems by including, in the modelling, a user's behaviour. Research into context-aware recommendation systems has previously only considered the sequential ordering of items as contextual information. However, there is a wealth of unexploited additional multi-modal information available in auxiliary knowledge related to items. This study extends the existing research by evaluating a multi-modal recommendation system that exploits the inclusion of comprehensive auxiliary knowledge related to an item. The empirical results explore extracting vector representations (embeddings) from unstructured and structured data using data2vec. The fused embeddings are then used to train several state-of-the-art transformer architectures for sequential user-item representations. The analysis of the experimental results shows a statistically significant improvement in prediction accuracy, which confirms the effectiveness of including auxiliary information in a context-aware recommendation system. We report a 4% and 11% increase in the NDCG score for long and short user sequence datasets, respectively.
Submitted 13 October, 2022;
originally announced October 2022.
-
Pareto Driven Surrogate (ParDen-Sur) Assisted Optimisation of Multi-period Portfolio Backtest Simulations
Authors:
Terence L. van Zyl,
Matthew Woolway,
Andrew Paskaramoorthy
Abstract:
Portfolio management is a multi-period multi-objective optimisation problem subject to a wide range of constraints. However, in practice, portfolio management is treated as a single-period problem partly due to the computationally burdensome hyper-parameter search procedure needed to construct a multi-period Pareto frontier. This study presents the Pareto Driven Surrogate (ParDen-Sur) modelling framework to efficiently perform the required hyper-parameter search. ParDen-Sur extends previous surrogate frameworks by including a reservoir sampling-based look-ahead mechanism for offspring generation in evolutionary algorithms (EAs) alongside the traditional acceptance sampling scheme. We evaluate this framework against, and in conjunction with, several seminal multi-objective (MO) EAs on two datasets for both the single- and multi-period use cases. Our results show that ParDen-Sur can speed up the exploration for optimal hyper-parameters by almost $2\times$ with a statistically significant improvement of the Pareto frontiers, across multiple EAs, for both datasets and use cases.
Submitted 13 September, 2022;
originally announced September 2022.
-
Knowledge Graph Fusion for Language Model Fine-tuning
Authors:
Nimesh Bhana,
Terence L. van Zyl
Abstract:
Language Models such as BERT have grown in popularity due to their ability to be pre-trained and perform robustly on a wide range of Natural Language Processing tasks. Often seen as an evolution over traditional word embedding techniques, they can produce semantic representations of text, useful for tasks such as semantic similarity. However, state-of-the-art models often have high computational requirements and lack global context or domain knowledge which is required for complete language understanding. To address these limitations, we investigate the benefits of knowledge incorporation into the fine-tuning stages of BERT. An existing K-BERT model, which enriches sentences with triplets from a Knowledge Graph, is adapted for the English language and extended to inject contextually relevant information into sentences. As a side-effect, changes made to K-BERT for accommodating the English language also extend to other word-based languages. Experiments conducted indicate that injected knowledge introduces noise. We see statistically significant improvements for knowledge-driven tasks when this noise is minimised. We show evidence that, given the appropriate task, modest injection with relevant, high-quality knowledge is most performant.
Submitted 21 June, 2022;
originally announced June 2022.
-
Surrogate Assisted Evolutionary Multi-objective Optimisation applied to a Pressure Swing Adsorption system
Authors:
Liezl Stander,
Matthew Woolway,
Terence L. Van Zyl
Abstract:
Chemical plant design and optimisation have proven challenging due to the complexity of these real-world systems. The resulting complexity translates into high computational costs for these systems' mathematical formulations and simulation models. Research has illustrated the benefits of using machine learning surrogate models as substitutes for computationally expensive models during optimisation. This paper extends recent research into optimising chemical plant design and operation. The study further explores Surrogate Assisted Genetic Algorithms (SA-GA) in more complex variants of the original plant design and optimisation problems, such as the inclusion of parallel and feedback components. The novel extension to the original algorithm proposed in this study, Surrogate Assisted NSGA-II (SA-NSGA), was tested on a popular literature case, the Pressure Swing Adsorption (PSA) system. We further provide extensive experimentation, comparing various meta-heuristic optimisation techniques and numerous machine learning models as surrogates. The results for both sets of systems illustrate the benefits of using Genetic Algorithms as an optimisation framework for complex chemical plant system design and optimisation for both single and multi-objective scenarios. We confirm that Random Forest surrogate assisted Evolutionary Algorithms can be scaled to increasingly complex chemical systems with parallel and feedback components. We further find that combining a Genetic Algorithm framework with Machine Learning Surrogate models as a substitute for long-running simulation models yields significant computational efficiency improvements: a 1.7 to 1.84 times speedup for the increased complexity examples and a 2.7 times speedup for the Pressure Swing Adsorption system.
Submitted 28 March, 2022;
originally announced April 2022.
-
Using Machine Learning to Fuse Verbal Autopsy Narratives and Binary Features in the Analysis of Deaths from Hyperglycaemia
Authors:
Thokozile Manaka,
Terence Van Zyl,
Alisha N Wade,
Deepak Kar
Abstract:
Low- and middle-income countries are faced with challenges arising from a lack of data on cause of death (COD), which can limit decisions on population health and disease management. A verbal autopsy (VA) can provide information about a COD in areas without robust death registration systems. A VA consists of structured data, combining numeric and binary features, and unstructured data in the form of an open-ended narrative text. This study assesses the performance of various machine learning approaches when analyzing both the structured and unstructured components of the VA report. The algorithms were trained and tested via cross-validation in three settings: binary features, text features, and a combination of binary and text features, all derived from VA reports from rural South Africa. The results indicate that narrative text features contain valuable information for determining COD and that a combination of binary and text features improves the automated COD classification task.
Keywords: Diabetes Mellitus, Verbal Autopsy, Cause of Death, Machine Learning, Natural Language Processing
Submitted 26 April, 2022;
originally announced April 2022.
-
Deep Reinforcement Learning and Convex Mean-Variance Optimisation for Portfolio Management
Authors:
Ruan Pretorius,
Terence van Zyl
Abstract:
Traditional portfolio management methods can incorporate specific investor preferences but rely on accurate forecasts of asset returns and covariances. Reinforcement learning (RL) methods do not rely on these explicit forecasts and are better suited for multi-stage decision processes. To address limitations of the evaluated research, experiments were conducted on three markets in different economies with different overall trends. By incorporating specific investor preferences into our RL models' reward functions, a more comprehensive comparison could be made to traditional methods in risk-return space. Transaction costs were also modelled more realistically by including nonlinear changes introduced by market volatility and trading volume. The results of this study suggest that there can be an advantage to using RL methods compared to traditional convex mean-variance optimisation methods under certain market conditions. Our RL models could significantly outperform traditional single-period optimisation (SPO) and multi-period optimisation (MPO) models in upward trending markets, but only up to specific risk limits. In sideways trending markets, the performance of SPO and MPO models can be closely matched by our RL models for the majority of the excess risk range tested. The specific market conditions under which these models could outperform each other highlight the importance of a more comprehensive comparison of Pareto optimal frontiers in risk-return space. These frontiers give investors a more granular view of which models might provide better performance for their specific risk tolerance or return targets.
Submitted 13 February, 2022;
originally announced March 2022.
-
Fusion of Sentiment and Asset Price Predictions for Portfolio Optimization
Authors:
Mufhumudzi Muthivhi,
Terence L. van Zyl
Abstract:
The fusion of public sentiment data in the form of text with stock price prediction is a topic of increasing interest within the financial community. However, the research literature seldom explores the application of investor sentiment in the Portfolio Selection problem. This paper aims to unpack and develop an enhanced understanding of the sentiment aware portfolio selection problem. To this end, the study uses a Semantic Attention Model to predict sentiment towards an asset. We select the optimal portfolio through a sentiment-aware Long Short Term Memory (LSTM) recurrent neural network for price prediction and a mean-variance strategy. Our sentiment portfolio strategies achieved on average a significant increase in revenue above the non-sentiment aware models. However, the results show that our strategy does not outperform traditional portfolio allocation strategies from a stability perspective. We argue that an improved fusion of sentiment prediction with a combination of price prediction and portfolio optimization leads to an enhanced portfolio selection strategy.
Submitted 10 March, 2022;
originally announced March 2022.
-
Evaluating State of the Art, Forecasting Ensembles- and Meta-learning Strategies for Model Fusion
Authors:
Pieter Cawood,
Terence van Zyl
Abstract:
Hybridisation and ensemble learning are popular model fusion techniques for improving the predictive power of forecasting methods. Given the limited research that investigates combining these two promising approaches, this paper focuses on the utility of the Exponential-Smoothing-Recurrent Neural Network (ES-RNN) in the pool of base models for different ensembles. We compare against some state-of-the-art ensembling techniques, with arithmetic model averaging as a benchmark. We experiment with the M4 forecasting data set of 100,000 time series, and the results show that Feature-based Forecast Model Averaging (FFORMA) is, on average, the best technique for late data fusion with the ES-RNN. However, on the M4's Daily subset of data, stacking was the only ensemble that succeeded in dealing with the case where all base model performances are similar. Our experimental results indicate that we attain state-of-the-art forecasting results compared to N-BEATS as a benchmark. We conclude that model averaging is a more robust ensemble than model selection and stacking strategies. Further, the results show that gradient boosting is superior for implementing ensemble learning strategies.
Submitted 19 July, 2022; v1 submitted 7 March, 2022;
originally announced March 2022.
-
Statistics and Deep Learning-based Hybrid Model for Interpretable Anomaly Detection
Authors:
Thabang Mathonsi,
Terence L van Zyl
Abstract:
Hybrid methods have been shown to outperform pure statistical and pure deep learning methods both at forecasting tasks and at quantifying the uncertainty associated with those forecasts (prediction intervals). One example is Multivariate Exponential Smoothing Long Short-Term Memory (MES-LSTM), a hybrid between a multivariate statistical forecasting model and a Recurrent Neural Network variant, Long Short-Term Memory. It has also been shown that a model that (i) produces accurate forecasts and (ii) is able to quantify the associated predictive uncertainty satisfactorily can be successfully adapted to a model suitable for anomaly detection tasks. With the increasing ubiquity of multivariate data and new application domains, there have been numerous anomaly detection methods proposed in recent years. The proposed methods have largely focused on deep learning techniques, which are prone to suffer from challenges such as (i) large sets of parameters that may be computationally intensive to tune, (ii) returning too many false positives, rendering the techniques impractical for use, (iii) requiring labeled datasets for training, which are often not prevalent in real life, and (iv) understanding of the root causes of anomaly occurrences being inhibited by the predominantly black-box nature of deep learning methods. In this article, an extension of MES-LSTM is presented, an interpretable anomaly detection model that overcomes these challenges. With a focus on renewable energy generation as an application domain, the proposed approach is benchmarked against the state-of-the-art. The findings are that the MES-LSTM anomaly detector is at least competitive with the benchmarks at anomaly detection tasks, and less prone to learning from spurious effects than the benchmarks, thus making it more reliable at root cause discovery and explanation.
Submitted 25 February, 2022;
originally announced February 2022.
-
Investigating Transfer Learning in Graph Neural Networks
Authors:
Nishai Kooverjee,
Steven James,
Terence van Zyl
Abstract:
Graph neural networks (GNNs) build on the success of deep learning models by extending them for use in graph spaces. Transfer learning has proven extremely successful for traditional deep learning problems, resulting in faster training and improved performance. Despite the increasing interest in GNNs and their use cases, there is little research on their transferability. This research demonstrates that transfer learning is effective with GNNs, and describes how source tasks and the choice of GNN impact the ability to learn generalisable knowledge. We perform experiments using real-world and synthetic data within the contexts of node classification and graph classification. To this end, we also provide a general methodology for transfer learning experimentation and present a novel algorithm for generating synthetic graph classification tasks. We compare the performance of GCN, GraphSAGE and GIN across both the synthetic and real-world datasets. Our results demonstrate empirically that GNNs with inductive operations yield statistically significantly improved transfer. Further, we show that similarity in community structure between source and target tasks supports statistically significant improvements in transfer over and above the use of only the node attributes.
Submitted 1 February, 2022;
originally announced February 2022.
-
A Statistics and Deep Learning Hybrid Method for Multivariate Time Series Forecasting and Mortality Modeling
Authors:
Thabang Mathonsi,
Terence L. van Zyl
Abstract:
Hybrid methods have been shown to outperform pure statistical and pure deep learning methods at forecasting tasks and quantifying the associated uncertainty with those forecasts (prediction intervals). One example is Exponential Smoothing Recurrent Neural Network (ES-RNN), a hybrid between a statistical forecasting model and a recurrent neural network variant. ES-RNN achieves a 9.4% improvement in absolute error in the Makridakis-4 Forecasting Competition. This improvement and similar outperformance from other hybrid models have primarily been demonstrated only on univariate datasets. Difficulties with applying hybrid forecast methods to multivariate data include (i) the high computational cost involved in hyperparameter tuning for models that are not parsimonious, (ii) challenges associated with auto-correlation inherent in the data, as well as (iii) complex dependency (cross-correlation) between the covariates that may be hard to capture. This paper presents Multivariate Exponential Smoothing Long Short Term Memory (MES-LSTM), a generalized multivariate extension to ES-RNN, that overcomes these challenges. MES-LSTM utilizes a vectorized implementation. We test MES-LSTM on several aggregated coronavirus disease of 2019 (COVID-19) morbidity datasets and find our hybrid approach shows consistent, significant improvement over pure statistical and deep learning methods at forecast accuracy and prediction interval construction.
Submitted 15 December, 2021;
originally announced December 2021.
-
Multivariate Anomaly Detection based on Prediction Intervals Constructed using Deep Learning
Authors:
Thabang Mathonsi,
Terence L. van Zyl
Abstract:
It has been shown that deep learning models can under certain circumstances outperform traditional statistical methods at forecasting. Furthermore, various techniques have been developed for quantifying the forecast uncertainty (prediction intervals). In this paper, we utilize prediction intervals constructed with the aid of artificial neural networks to detect anomalies in the multivariate setting. Challenges with existing deep learning-based anomaly detection approaches include (i) large sets of parameters that may be computationally intensive to tune, (ii) returning too many false positives, rendering the techniques impractical for use, and (iii) requiring labeled datasets for training, which are often not prevalent in real life. Our approach overcomes these challenges. We benchmark our approach against the oft-preferred well-established statistical models. We focus on three deep learning architectures, namely, cascaded neural networks, reservoir computing and long short-term memory recurrent neural networks. Our finding is that deep learning outperforms (or at the very least is competitive with) the latter.
Submitted 7 October, 2021;
originally announced October 2021.
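The idea of detecting anomalies from prediction intervals can be illustrated with a minimal sketch. It assumes the simplest possible decision rule, namely that an observation is flagged when it falls outside its interval; the abstract does not state the exact rule used in the paper, and the function name `flag_anomalies` and the hand-written interval bounds are hypothetical stand-ins for the output of a forecasting model.

```python
from typing import List, Sequence


def flag_anomalies(
    observed: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[bool]:
    """Flag each observation that lies outside its prediction interval.

    Assumption: an anomaly is any point with observed < lower or
    observed > upper. The bounds would normally come from a forecasting
    model; here they are supplied directly by the caller.
    """
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower and upper must have equal length")
    return [not (lo <= y <= hi) for y, lo, hi in zip(observed, lower, upper)]


# Usage: the second observation exceeds its upper bound, so it is flagged.
flags = flag_anomalies([1.0, 5.0, 2.0], [0.0, 0.0, 0.0], [3.0, 3.0, 3.0])
print(flags)
```

In the multivariate setting the same check could be applied per variable, with a separate rule for combining the per-variable flags; that combination step is left out of this sketch.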
-
Incremental Class Learning using Variational Autoencoders with Similarity Learning
Authors:
Jiahao Huo,
Terence L. van Zyl
Abstract:
Catastrophic forgetting in neural networks during incremental learning remains a challenging problem. Previous research investigated catastrophic forgetting in fully connected networks, with some earlier work exploring activation functions and learning algorithms. Applications of neural networks have been extended to include similarity learning. Understanding how similarity learning loss functions would be affected by catastrophic forgetting is of significant interest. Our research investigates catastrophic forgetting for four well-known similarity-based loss functions during incremental class learning. The loss functions are Angular, Contrastive, Center, and Triplet loss. Our results show that the catastrophic forgetting rate differs across loss functions on multiple datasets. The Angular loss was least affected, followed by Contrastive, Triplet loss, and Center loss with good mining techniques. We implemented three existing incremental learning techniques, iCaRL, EWC, and EBLL. We further proposed a novel technique using Variational Autoencoders (VAEs) to generate representation as exemplars passed through the network's intermediate layers. Our method outperformed three existing state-of-the-art techniques. We show that one does not require stored images (exemplars) for incremental learning with similarity learning. The generated representations from VAEs help preserve regions of the embedding space used by prior knowledge so that new knowledge does not "overwrite" it.
Submitted 14 March, 2023; v1 submitted 4 October, 2021;
originally announced October 2021.
-
AMA-K: Aggressive Multi-Temporal Allocation An Algorithm for Aggressive Online Portfolio Selection
Authors:
Matthew Kruger,
Terence L. van Zyl,
Andrew Paskaramoorthy
Abstract:
Online portfolio selection is an integral component of wealth management. The fundamental undertaking is to maximise returns while minimising risk given investor constraints. We aim to examine and improve modern strategies to generate higher returns in a variety of market conditions. By integrating simple data mining, optimisation techniques and machine learning procedures, we aim to generate aggressive and consistent high yield portfolios. This leads to a new methodology of Pattern-Matching that may yield further advances in dynamic and competitive portfolio construction. The resulting strategies outperform a variety of benchmarks, when compared using Maximum Drawdown, Annualised Percentage Yield and Annualised Sharpe Ratio, that make use of similar approaches. The proposed strategy returns showcase acceptable risk with high reward that performs well in a variety of market conditions. We conclude that our algorithm provides an improvement in searching for optimal portfolios compared to existing methods.
Submitted 28 September, 2021;
originally announced September 2021.
-
Surrogate Parameters Optimization for Data and Model Fusion of COVID-19 Time-series Data
Authors:
Timilehin Ogundare,
Terence Van Zyl
Abstract:
Our research focuses on developing a computational framework to simulate the transmission dynamics of the COVID-19 pandemic. We examine the development of a system named ADRIANA for the simulation using South Africa as a case study. The design of the ADRIANA system interconnects three sub-models to establish a computational technique to advise policy regarding lockdown measures to reduce the transmission pattern of COVID-19 in South Africa. Additionally, the output of ADRIANA can be used by healthcare administration to predict peak demand time for resources needed to treat infected individuals. ABM is suited for our research experiment, but to avoid the computational constraints of using an ABM-based framework for this research, we develop an SEIR compartmental model, a discrete event simulator, and an optimized surrogate model to form a system named ADRIANA. We also ensure that the surrogate's findings are accurate enough to provide optimal solutions. We use the Genetic Algorithm (GA) for the optimization by estimating the optimal hyperparameter configuration for the surrogate. We conclude this study by discussing the solutions presented by the ADRIANA system, which align with the primary goal of our study: to present an optimal guide to lockdown policy by the government and resource management by the hospital administrators.
Submitted 9 September, 2021;
originally announced September 2021.
-
Surrogate Assisted Strategies (The Parameterisation of an Infectious Disease Agent-Based Model)
Authors:
Rylan Perumal,
Terence L van Zyl
Abstract:
Parameter calibration is a significant challenge in agent-based modelling and simulation (ABMS). An agent-based model's (ABM) complexity grows as the number of parameters required to be calibrated increases. This parameter expansion leads to the ABMS equivalent of the "curse of dimensionality", in particular the infeasible computational requirements of searching an infinite parameter space. We propose a more comprehensive and adaptive ABMS Framework that can effectively swap out parameterisation strategies and surrogate models to parameterise an infectious disease ABM. This framework allows us to evaluate different strategy-surrogate combinations' performance in accuracy and efficiency (speedup). We show that we achieve better than parity in accuracy across the surrogate assisted sampling strategies and the baselines. Also, we identify that the Metric Stochastic Response Surface (MSRS) strategy combined with the Support Vector Machine (SVM) surrogate is the best overall in getting closest to the true synthetic parameters. Further, we show that DYnamic COOrdinate Search Using Response Surface Models (DYCORS) with XGBoost as a surrogate attains in combination the highest probability of approximating a cumulative synthetic daily infection data distribution and achieves the most significant speedup with regards to our analysis. Lastly, we show in a real-world setting that DYCORS XGBoost and MSRS SVM can approximate the real world cumulative daily infection distribution with 97.12% and 96.75% similarity respectively.
Submitted 19 August, 2021;
originally announced August 2021.
-
Feature-weighted Stacking for Nonseasonal Time Series Forecasts: A Case Study of the COVID-19 Epidemic Curves
Authors:
Pieter Cawood,
Terence L. van Zyl
Abstract:
We investigate ensembling techniques in forecasting and examine their potential for use in nonseasonal time-series similar to those in the early days of the COVID-19 pandemic. Developing improved forecast methods is essential as they provide data-driven decisions to organisations and decision-makers during critical phases. We propose using late data fusion, using a stacked ensemble of two forecasting models and two meta-features that prove their predictive power during a preliminary forecasting stage. The final ensembles include a Prophet and long short term memory (LSTM) neural network as base models. The base models are combined by a multilayer perceptron (MLP), taking into account meta-features that indicate the highest correlation with each base model's forecast accuracy. We further show that the inclusion of meta-features generally improves the ensemble's forecast accuracy across two forecast horizons of seven and fourteen days. This research reinforces previous work and demonstrates the value of combining traditional statistical models with deep learning models to produce more accurate forecast models for time-series from different domains and seasonality.
Submitted 5 December, 2021; v1 submitted 19 August, 2021;
originally announced August 2021.
-
ParDen: Surrogate Assisted Hyper-Parameter Optimisation for Portfolio Selection
Authors:
Terence van Zyl,
Matthew Woolway,
Andrew Paskaramoorthy
Abstract:
Portfolio optimisation is a multi-objective optimisation problem (MOP), where an investor aims to optimise the conflicting criteria of maximising a portfolio's expected return whilst minimising its risk and other costs. However, selecting a portfolio is a computationally expensive problem because of the cost associated with performing multiple evaluations on test data ("backtesting") rather than solving the convex optimisation problem itself. In this research, we present ParDen, an algorithm for the inclusion of any discriminative or generative machine learning model as a surrogate to mitigate the computationally expensive backtest procedure. In addition, we compare the performance of alternative metaheuristic algorithms: NSGA-II, R-NSGA-II, NSGA-III, R-NSGA-III, U-NSGA-III, MO-CMA-ES, and COMO-CMA-ES. We measure performance using multi-objective performance indicators, including Generational Distance Plus, Inverted Generational Distance Plus and Hypervolume. We also consider meta-indicators, Success Rate and Average Executions to Success Rate, of the Hypervolume to provide more insight into the quality of solutions. Our results show that ParDen can reduce the number of evaluations required by almost a third while obtaining an improved Pareto front over the state-of-the-art for the problem of portfolio selection.
Submitted 5 July, 2021;
originally announced July 2021.
-
Deep Similarity Learning for Sports Team Ranking
Authors:
Daniel Yazbek,
Jonathan Sandile Sibindi,
Terence L. Van Zyl
Abstract:
Sports data is more readily available and, consequently, there has been an increase in the amount of sports analysis, predictions and rankings in the literature. Sports are unique in their respective stochastic nature, making analysis and accurate predictions valuable to those involved in the sport. In response, we focus on Siamese Neural Networks (SNN) in unison with LightGBM and XGBoost models to predict the importance of matches and to rank teams in Rugby and Basketball. Six models were developed and compared: a LightGBM, an XGBoost, a LightGBM (Contrastive Loss), a LightGBM (Triplet Loss), an XGBoost (Contrastive Loss), and an XGBoost (Triplet Loss). The models that utilise a Triplet loss function perform better than those using Contrastive loss. LightGBM (Triplet loss) is the most effective model in ranking the NBA, producing a state-of-the-art (SOTA) mAP (0.867) and NDCG (0.98). The SNN (Triplet loss) most effectively predicted the Super 15 Rugby, yielding the SOTA mAP (0.921), NDCG (0.983), and $r_s$ (0.793). Triplet loss produces the best overall results, displaying the value of learning representations/embeddings for prediction and ranking of sports. Overall, there is not a single consistent best-performing model across the two sports, indicating that other ranking models should be considered in the future.
Submitted 16 February, 2022; v1 submitted 25 March, 2021;
originally announced March 2021.
-
Surrogate Assisted Methods for the Parameterisation of Agent-Based Models
Authors:
Rylan Perumal,
Terence L van Zyl
Abstract:
Parameter calibration is a major challenge in agent-based modelling and simulation (ABMS). As the complexity of agent-based models (ABMs) increases, the number of parameters required to be calibrated grows. This leads to the ABMS equivalent of the "curse of dimensionality". We propose an ABMS framework which facilitates the effective integration of different sampling methods and surrogate models (SMs) in order to evaluate how these strategies affect parameter calibration and exploration. We show that surrogate assisted methods perform better than the standard sampling methods. In addition, we show that the XGBoost and Decision Tree SMs perform best overall with regards to our analysis.
Submitted 26 August, 2020;
originally announced August 2020.
-
Unique Faces Recognition in Videos
Authors:
Jiahao Huo,
Terence L van Zyl
Abstract:
This paper tackles face recognition in videos employing metric learning methods and similarity ranking models. The paper compares the use of the Siamese network with contrastive loss and the Triplet Network with triplet loss, implementing the following architectures: Google/Inception architecture, 3-D Convolutional Network (C3D), and a 2-D Long Short-Term Memory (LSTM) Recurrent Neural Network. We make use of still images and sequences from videos for training the networks and compare the performances implementing the above architectures. The dataset used was the YouTube Face Database, designed for investigating the problem of face recognition in videos. The contribution of this paper is two-fold. First, the experiments have established that 3-D Convolutional networks and 2-D LSTMs with contrastive loss on image sequences do not outperform the Google/Inception architecture with contrastive loss in top $n$ rank face retrievals with still images. However, the 3-D Convolutional networks and 2-D LSTM with triplet loss outperform the Google/Inception architecture with triplet loss in top $n$ rank face retrievals on the dataset. Second, a Support Vector Machine (SVM) was used in conjunction with the CNNs' learned feature representations for facial identification. The results show that feature representation learned with triplet loss is significantly better for n-shot facial identification compared to contrastive loss. The most useful feature representations for facial identification are from the 2-D LSTM with triplet loss. The experiments show that learning spatio-temporal features from video sequences is beneficial for facial recognition in videos.
Submitted 10 June, 2020;
originally announced June 2020.
-
Comparison of Recurrent Neural Network Architectures for Wildfire Spread Modelling
Authors:
Rylan Perumal,
Terence L van Zyl
Abstract:
Wildfire modelling is an attempt to reproduce fire behaviour. Through active fire analysis, it is possible to reproduce a dynamical process, such as wildfires, with limited duration time series data. Recurrent neural networks (RNNs) can model dynamic temporal behaviour due to their ability to remember their internal input. In this paper, we compare the Gated Recurrent Unit (GRU) and the Long Short-Term Memory (LSTM) network. We try to determine whether a wildfire continues to burn and, given that it does, we aim to predict which one of the 8 cardinal directions the wildfire will spread in. Overall, the GRU performs better for longer time series than the LSTM. We have shown that although we can predict the direction in which the wildfire will spread reasonably well, we are not able to assess whether the wildfire continues to burn due to the lack of auxiliary data.
Submitted 26 May, 2020;
originally announced May 2020.
-
Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition
Authors:
Nishai Kooverjee,
Steven James,
Terence van Zyl
Abstract:
Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models, and generally yields improved performance and faster training times. The technique of pre-training on one task and then retraining on a new one is called transfer learning. In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks. We perform three sets of experiments with varying levels of similarity between source and target tasks to investigate the behaviour of different types of knowledge transfer. We transfer both parameters and features and analyse their behaviour. Our results demonstrate that no significant advantage is gained by using a transfer learning approach over a traditional machine learning approach for our character recognition tasks. This suggests that using transfer learning does not necessarily presuppose a better performing model in all cases.
Submitted 2 January, 2020;
originally announced January 2020.