-
Data Extraction, Transformation, and Loading Process Automation for Algorithmic Trading Machine Learning Modelling and Performance Optimization
Authors:
Nassi Ebadifard,
Ajitesh Parihar,
Youry Khmelevsky,
Gaetan Hains,
Albert Wong,
Frank Zhang
Abstract:
A data warehouse efficiently prepares data for effective and fast data analysis and modelling using machine learning algorithms. This paper discusses existing solutions for automating the Data Extraction, Transformation, and Loading (ETL) process for algorithmic trading. Integrating data warehouses and, in the future, data lakes with machine learning algorithms opens up significant research opportunities when performance and data processing time become critical non-functional requirements.
Submitted 20 December, 2023;
originally announced December 2023.
-
Translating Natural Language Queries to SQL Using the T5 Model
Authors:
Albert Wong,
Lien Pham,
Young Lee,
Shek Chan,
Razel Sadaya,
Youry Khmelevsky,
Mathias Clement,
Florence Wing Yau Cheng,
Joe Mahony,
Michael Ferri
Abstract:
This paper presents the development process of a natural language to SQL model using the T5 model as the basis. The models, developed in August 2022 for an online transaction processing system and a data warehouse, have a 73% and 84% exact match accuracy respectively. These models, in conjunction with other work completed in the research project, were implemented for several companies and used successfully on a daily basis. The approach used in the model development could be implemented in a similar fashion for other database environments and with a more powerful pre-trained language model.
Submitted 12 December, 2023;
originally announced December 2023.
-
Short-Term Stock Price Forecasting using exogenous variables and Machine Learning Algorithms
Authors:
Albert Wong,
Steven Whang,
Emilio Sagre,
Niha Sachin,
Gustavo Dutra,
Yew-Wei Lim,
Gaetan Hains,
Youry Khmelevsky,
Frank Zhang
Abstract:
Creating accurate predictions in the stock market has always been a significant challenge in finance. With the rise of machine learning as the next level in the forecasting area, this research paper compares four machine learning models and their accuracy in forecasting three well-known stocks traded on the NYSE in the short term from March 2020 to May 2022. We develop, tune, and deploy XGBoost, Random Forest, Multi-layer Perceptron, and Support Vector Regression models. We report the models that produce the highest accuracies from our evaluation metrics: RMSE, MAPE, MTT, and MPE. Using a training data set of 240 trading days, we find that XGBoost gives the highest accuracy despite running longer (up to 10 seconds). Results from this study may improve with further tuning of the individual parameters or the introduction of more exogenous variables.
Submitted 17 May, 2023;
originally announced September 2023.
-
Gamers Private Network Performance Forecasting. From Raw Data to the Data Warehouse with Machine Learning and Neural Nets
Authors:
Albert Wong,
Chun Yin Chiu,
Gaétan Hains,
Jack Humphrey,
Hans Fuhrmann,
Youry Khmelevsky,
Chris Mazur
Abstract:
Gamers Private Network (GPN) is a client/server technology that guarantees a connection for online video games that is more reliable and has lower latency than a standard internet connection. Users of the GPN technology benefit from a stable and high-quality gaming experience for online games, which are hosted and played across the world. After transforming a massive volume of raw networking data collected by WTFast, we structured the cleaned data into a special-purpose data warehouse and completed an extensive analysis using machine learning, neural network technologies, and business intelligence tools. These analyses demonstrate the ability to predict and quantify changes in the network and demonstrate the benefits users gain from a GPN when connected to an online game session.
Submitted 25 May, 2021;
originally announced July 2021.
-
Roof Damage Assessment from Automated 3D Building Models
Authors:
Kenichi Sugihara,
Martin Wallace,
Kongwen Zhang,
Youry Khmelevsky
Abstract:
3D building modelling is important in urban planning and related domains that draw upon the content of 3D models of urban scenes. Such 3D models can be used to visualize city images at multiple scales, from individual buildings to entire cities, before and after a change has occurred. This ability is of great importance in day-to-day work and special projects undertaken by planners, geo-designers, and architects. In this research, we implemented a novel approach to 3D building modelling for this purpose, which integrates geographic information systems (GIS) and 3D Computer Graphics (3DCG) components that generate 3D house models from building footprints (polygons), and automates the generation of simple and complex roof geometries for rapid roof area damage reporting. These polygons (footprints) are usually orthogonal. A complicated orthogonal polygon can be partitioned into a set of rectangles. The proposed GIS and 3DCG integrated system partitions orthogonal building polygons into a set of rectangles and places rectangular roofs and box-shaped building bodies on these rectangles. Because technicians draw these polygons manually with digitizers, working from aerial photos, not all building polygons are precisely orthogonal. When a set of boxes is placed as building bodies to create the buildings, gaps or overlaps may appear between these boxes if the building polygons are not precisely orthogonal. In our proposal, after approximately orthogonal building polygons are partitioned and rectified into a set of mutually orthogonal rectangles, each rectangle knows which rectangle it is adjacent to and along which edge, which avoids unwanted intersections of windows and doors when the building bodies are combined.
Submitted 4 June, 2021;
originally announced June 2021.
-
Students Programming Competitions as an Educational Tool and a Motivational Incentive to Students
Authors:
Youry Khmelevsky,
Ken Chidlow
Abstract:
In this short paper we report on student programming competition results by students from the Computer Science Department (COSC) of Okanagan College (OC) and discuss the achieved results from an educational point of view. We found that some freshmen and sophomore students in diploma and degree programs are very capable and eager to be involved in applied research projects as early as the second semester, and in local and international programming competitions as well. Our observation is based on the last two academic years, beginning in 2015, when we introduced programming competitions to COSC students. Students reported that participation in competitions gives them motivation to learn effectively in their programming courses, inspires them to learn more deeply and thoroughly, and helps them achieve better results in their classes.
Submitted 31 May, 2021; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Parallel Programming Applied Research Projects for Teaching Parallel Programming to Beginner Students
Authors:
Youry Khmelevsky,
Gaetan J. D. R. Hains
Abstract:
In this paper, we discuss the educational value of a few mid-size applied research projects and one large one at the Computer Science Department of Okanagan College (OC) and at the Universities of Paris East Creteil (LACL) and Orleans (LIFO) in France. We found that some freshmen students are very active and eager to be involved in applied research projects starting from the second semester. They actively participate in programming competitions and want to be involved in applied research projects to compete with sophomore and older students. Our observation is based on five NSERC Engage College and Applied Research and Development (ARD) grants, and several small applied projects. Student involvement in applied research is a key motivation and success factor in our activities, but we are also involved in transferring some results of applied research, namely programming techniques, into the parallel programming courses for beginners at the senior- and first-year MSc levels. We illustrate this feedback process with programming notions for beginners, practical tools to acquire them, and the overall success or failure of students as experienced over more than 10 years in our French university courses.
Submitted 30 May, 2021; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Machine Learning Prediction of Gamer's Private Networks
Authors:
Chris Mazur,
Jesse Ayers,
Gaetan Hains,
Youry Khmelevsky
Abstract:
The Gamer's Private Network (GPN) is a client/server technology created by WTFast for making the network performance of online games faster and more reliable. GPNs use middle-mile servers and proprietary algorithms to better connect online video-game players to their game's servers across a wide-area network. Online games are a massive entertainment market and network latency is a key aspect of a player's competitive edge. This market means many different approaches to network architecture are implemented by different competing companies and that those architectures are constantly evolving. Ensuring the optimal connection between a client of WTFast and the online game they wish to play is thus an incredibly difficult problem to automate. Using machine learning, we analyzed historical network data from GPN connections to explore the feasibility of network latency prediction, which is a key part of optimization. Our next step will be to collect live data (including client/server load, packet and port information, and specific game state information) from GPN Minecraft servers and bots. We will use this information in a Reinforcement Learning model along with predictions about latency to alter the clients' and servers' configurations for optimal network performance. These investigations and experiments will improve the quality of service and reliability of GPN systems.
Submitted 6 December, 2020;
originally announced December 2020.
-
State-of-the-Art on Query & Transaction Processing Acceleration
Authors:
Bernd Amann,
Youry Khmelevsky,
Gaetan Hains
Abstract:
The vast amount of processing power and memory bandwidth provided by modern Graphics Processing Units (GPUs) makes them a platform for data-intensive applications. The database community has identified GPUs as effective co-processors for data processing. In recent years, there have been many approaches to making use of GPUs at different levels of a database system. In this Internal Technical Report, based on [1] and some other research papers, we identify possible research areas at LIP6 for GPU-accelerated database management systems. We describe some key properties and typical challenges of GPU-aware database architectures, and identify major open challenges.
Submitted 26 June, 2019;
originally announced July 2019.
-
Formal methods and software engineering for DL. Security, safety and productivity for DL systems development
Authors:
Gaetan J. D. R. Hains,
Arvid Jakobsson,
Youry Khmelevsky
Abstract:
Deep Learning (DL) techniques are now widespread and being integrated into many important systems. Their classification and recognition abilities ensure their relevance for multiple application domains. As machine-learning that relies on training instead of algorithm programming, they offer a high degree of productivity. But they can be vulnerable to attacks and the verification of their correctness is only just emerging as a scientific and engineering possibility. This paper is a major update of a previously-published survey, attempting to cover all recent publications in this area. It also covers an even more recent trend, namely the design of domain-specific languages for producing and training neural nets.
Submitted 31 January, 2019;
originally announced January 2019.