-
Chain-structured neural architecture search for financial time series forecasting
Authors:
Denis Levchenko,
Efstratios Rappos,
Shabnam Ataee,
Biagio Nigro,
Stephan Robert
Abstract:
In the context of financial time series forecasting, we compare three popular neural architecture search strategies on chain-structured search spaces: Bayesian optimization, the hyperband method, and reinforcement learning.
Submitted 15 March, 2024;
originally announced March 2024.
-
The 1st Data Science for Pavements Challenge
Authors:
Ashkan Behzadian,
Tanner Wambui Muturi,
Tianjie Zhang,
Hongmin Kim,
Amanda Mullins,
Yang Lu,
Neema Jasika Owor,
Yaw Adu-Gyamfi,
William Buttlar,
Majidifard Hamed,
Armstrong Aboah,
David Mensching,
Spragg Robert,
Matthew Corrigan,
Jack Youtchef,
Dave Eshan
Abstract:
The Data Science for Pavement Challenge (DSPC) seeks to accelerate the research and development of automated vision systems for pavement condition monitoring and evaluation by providing a platform with benchmarked datasets and code for teams to innovate and develop machine learning algorithms that are practice-ready for use by industry. The first edition of the competition attracted 22 teams from 8 countries. Participants were required to automatically detect and classify different types of pavement distresses present in images captured from multiple sources and under different conditions. The competition was data-centric: teams were tasked with increasing the accuracy of a predefined model architecture by using various data modification methods such as cleaning, labeling and augmentation. A real-time, online evaluation system was developed to rank teams based on the F1 score. Leaderboard results showed the promise and challenges of machine learning for advancing automation in pavement monitoring and evaluation. This paper summarizes the solutions from the top 5 teams. These teams proposed innovations in the areas of data cleaning, annotation, augmentation, and detection parameter tuning. The F1 score for the top-ranked team was approximately 0.9. The paper concludes with a review of the experiments that worked well for the current challenge and those that did not yield any significant improvement in model accuracy.
Submitted 10 June, 2022;
originally announced June 2022.
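The abstract above says teams were ranked by F1 score. As a minimal sketch of that metric, the snippet below computes F1 from detection counts and sorts a leaderboard. The function names, the team labels and the counts are hypothetical, and the challenge's actual rule for matching a detection to a ground-truth distress is not stated in the abstract.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0  # no detections and no ground truth: define F1 as 0
    return 2 * true_positives / denominator


def rank_teams(results):
    """Sort (team, F1) pairs from best to worst.

    `results` maps a hypothetical team label to (TP, FP, FN) counts.
    """
    scored = [(team, f1_score(*counts)) for team, counts in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    leaderboard = rank_teams({"team_a": (9, 1, 1), "team_b": (5, 5, 5)})
    for team, score in leaderboard:
        print(team, round(score, 3))
```

Because F1 ignores true negatives, it suits detection tasks where most of an image contains no distress at all.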
-
A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions
Authors:
Francois St-Hilaire,
Dung Do Vu,
Antoine Frau,
Nathan Burns,
Farid Faraji,
Joseph Potochny,
Stephane Robert,
Arnaud Roussel,
Selene Zheng,
Taylor Glazier,
Junfel Vincent Romano,
Robert Belfer,
Muhammad Shayan,
Ariella Smofsky,
Tommy Delarosbil,
Seulmin Ahn,
Simon Eden-Walker,
Kritika Sony,
Ansona Onyi Ching,
Sabina Elkins,
Anush Stepanyan,
Adela Matajova,
Victor Chen,
Hossein Sahraei,
Robert Larson, et al. (6 additional authors not shown)
Abstract:
Despite artificial intelligence (AI) having transformed major aspects of our society, only a fraction of its potential has been explored, let alone deployed, for education. AI-powered learning can provide millions of learners with a highly personalized, active and practical learning experience, which is key to successful learning. This is especially relevant in the context of online learning platforms. In this paper, we present the results of a comparative head-to-head study on learning outcomes for two popular online learning platforms (n=199 participants): a MOOC platform following a traditional model that delivers content using lecture videos and multiple-choice quizzes, and the Korbit learning platform, which provides a highly personalized, active and practical learning experience. We observe a large and statistically significant increase in learning outcomes: students on the Korbit platform with full feedback had higher course completion rates and achieved learning gains 2 to 2.5 times higher than both students on the MOOC platform and students in a control group who did not receive personalized feedback on the Korbit platform. The results demonstrate the tremendous impact that can be achieved with a personalized, active learning AI-powered system. Making this technology and learning experience available to millions of learners around the world will represent a significant leap forward towards the democratization of education.
Submitted 3 March, 2022;
originally announced March 2022.
-
Machine Learning for automatic identification of new minor species
Authors:
Frederic Schmidt,
Guillaume Cruz Mermy,
Justin Erwin,
Severine Robert,
Lori Neary,
Ian R. Thomas,
Frank Daerden,
Bojan Ristic,
Manish R. Patel,
Giancarlo Bellucci,
Jose-Juan Lopez-Moreno,
Ann-Carine Vandaele
Abstract:
One of the main difficulties in analyzing modern spectroscopic datasets is the large amount of data. For example, in atmospheric transmittance spectroscopy, the solar occultation (SO) channel of the NOMAD instrument onboard the ESA ExoMars2016 satellite called Trace Gas Orbiter (TGO) produced $\sim$10 million spectra in 20000 acquisition sequences between the beginning of the mission in April 2018 and 15 January 2020. Other datasets are even larger, with on the order of billions of spectra for OMEGA onboard Mars Express or CRISM onboard Mars Reconnaissance Orbiter. Usually, new lines are discovered after a long iterative process of model fitting and manual residual analysis. Here we propose a new method based on unsupervised machine learning to automatically detect new minor species. Although precise quantification is out of scope, this tool can also be used to quickly summarize the dataset by giving a few endmembers ("sources") and their abundances. We approximate the dataset non-linearity by a linear mixture of abundances and source spectra (endmembers). We use unsupervised source separation in the form of non-negative matrix factorization to estimate those quantities. Several methods are tested on synthetic and simulated data. Our approach is dedicated to detecting minor species spectra rather than precisely quantifying them. On a synthetic example, this approach is able to detect chemical compounds present in the form of 100 hidden spectra out of $10^4$, at 1.5 times the noise level. Results on simulated spectra of NOMAD-SO targeting CH$_{4}$ show that detection limits are in the range of 100-500 ppt in favorable conditions. Results on real Martian data from NOMAD-SO show that CO$_{2}$ and H$_{2}$O are present, as expected, but CH$_{4}$ is absent. Nevertheless, we confirm a set of new unexpected lines in the database, attributed by the ACS instrument team to the CO$_{2}$ magnetic dipole.
Submitted 15 December, 2020;
originally announced December 2020.
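The abstract above describes approximating a set of spectra by a linear mixture of abundances and source spectra, estimated with non-negative matrix factorization. The sketch below illustrates that general idea using the classic multiplicative-update rules for the Frobenius norm, which is only one of several NMF algorithms and is not confirmed to be the variant used in the paper. The function names, the two synthetic Gaussian "sources" and all parameter values are assumptions made for illustration. It uses only the standard library so that it stays self-contained, whereas a real analysis would use an array library.

```python
import math
import random


def matmul(P, Q):
    """Plain-Python matrix product of two lists of rows."""
    columns = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in columns] for row in P]


def transpose(P):
    return [list(col) for col in zip(*P)]


def frobenius_error(X, A, S):
    """Frobenius norm of the residual X - A S."""
    R = matmul(A, S)
    return math.sqrt(sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr)))


def nmf(X, rank, iters=200, seed=0, eps=1e-12):
    """Factor X (spectra x channels) into abundances A and sources S, all >= 0."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    A = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    S = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Update sources: S <- S * (A^T X) / (A^T A S)
        At = transpose(A)
        num, den = matmul(At, X), matmul(matmul(At, A), S)
        S = [[s * a / (b + eps) for s, a, b in zip(sr, nr, dr)]
             for sr, nr, dr in zip(S, num, den)]
        # Update abundances: A <- A * (X S^T) / (A S S^T)
        St = transpose(S)
        num, den = matmul(X, St), matmul(A, matmul(S, St))
        A = [[v * a / (b + eps) for v, a, b in zip(ar, nr, dr)]
             for ar, nr, dr in zip(A, num, den)]
    return A, S


def make_mixture(n_spectra=30, n_channels=20, seed=1):
    """Synthetic data: random non-negative mixtures of two Gaussian-shaped sources."""
    rng = random.Random(seed)
    sources = [[math.exp(-((j - c) / 2.0) ** 2) for j in range(n_channels)] for c in (5, 14)]
    abundances = [[rng.random(), rng.random()] for _ in range(n_spectra)]
    return matmul(abundances, sources)


if __name__ == "__main__":
    X = make_mixture()
    A, S = nmf(X, rank=2)
    print("residual norm:", frobenius_error(X, A, S))
```

The factorization is not unique: sources can be rescaled or permuted with a compensating change in the abundances, so recovered endmembers should be compared to reference spectra only up to scale and order.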
-
A Novel Framework for Spatio-Temporal Prediction of Environmental Data Using Deep Learning
Authors:
Federico Amato,
Fabian Guignard,
Sylvain Robert,
Mikhail Kanevski
Abstract:
As the role played by statistical and computational sciences in climate and environmental modelling and prediction becomes more important, Machine Learning researchers are becoming more aware of the relevance of their work to help tackle the climate crisis. Indeed, being universal nonlinear function approximation tools, Machine Learning algorithms are efficient in analysing and modelling spatially and temporally variable environmental data. While Deep Learning models have proved to be able to capture spatial, temporal, and spatio-temporal dependencies through their automatic feature representation learning, the problem of the interpolation of continuous spatio-temporal fields measured on a set of irregular points in space is still under-investigated. To fill this gap, we introduce here a framework for spatio-temporal prediction of climate and environmental data using deep learning. Specifically, we show how spatio-temporal processes can be decomposed into a sum of products of temporally referenced basis functions and stochastic spatial coefficients, which can be spatially modelled and mapped on a regular grid, allowing the reconstruction of the complete spatio-temporal signal. Applications to two case studies based on simulated and real-world data show the effectiveness of the proposed framework in modelling coherent spatio-temporal fields.
Submitted 22 December, 2020; v1 submitted 23 July, 2020;
originally announced July 2020.
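The decomposition described in the abstract above can be written compactly. The notation here is an assumption chosen for illustration, not the paper's own: $Z(s, t)$ is the field at location $s$ and time $t$, $\phi_k(t)$ are the temporally referenced basis functions, $a_k(s)$ are the stochastic spatial coefficients, and $K$ is the number of retained components.

```latex
Z(s, t) \approx \sum_{k=1}^{K} a_k(s)\, \phi_k(t)
```

Under this reading, each coefficient map $a_k(s)$ is modelled on the irregular measurement points and predicted on a regular grid, after which the sum reconstructs the full spatio-temporal signal.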
-
A Force-Directed Approach for Offline GPS Trajectory Map Matching
Authors:
Efstratios Rappos,
Stephan Robert,
Philippe Cudré-Mauroux
Abstract:
We present a novel algorithm to match GPS trajectories onto maps offline (in batch mode) using techniques borrowed from the field of force-directed graph drawing. We consider a simulated physical system where each GPS trajectory is attracted or repelled by the underlying road network via electrical-like forces. We let the system evolve under the action of these physical forces such that individual trajectories are attracted towards candidate roads to obtain a map matching path. Our approach has several advantages over traditional routing-based algorithms for map matching, including the ability to account for noise and to avoid large detours due to outliers in the data whilst taking into account the underlying topological restrictions (such as one-way roads). Our empirical evaluation using real GPS traces shows that, on average, our method produces better map matching results than alternative offline map matching algorithms, especially for routes in dense urban areas.
Submitted 29 March, 2019;
originally announced March 2019.
-
Using GPU Simulation to Accurately Fit to the Power-Law Distribution
Authors:
Efstratios Rappos,
Stephan Robert
Abstract:
This article describes a methodology for fitting experimental data to the discrete power-law distribution and provides the results of a detailed simulation exercise used to calculate accurate cutoff values for assessing the fit to a power-law distribution when the exponent of the distribution is estimated by maximum likelihood. Using massively parallel computing, we were able to reduce by a factor of 60 the computation time required for these calculations across a range of parameters and to construct a series of detailed tables containing the test values to be used in a Kolmogorov-Smirnov goodness-of-fit test, allowing for an accurate assessment of the power-law fit from empirical data.
Submitted 29 May, 2013;
originally announced May 2013.
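The abstract above combines two ingredients: a maximum likelihood estimate of the exponent of a discrete power law, and a Kolmogorov-Smirnov statistic measuring the distance between the data and the fitted model. The following is a minimal CPU-only sketch of those two steps, assuming the usual form $p(x) = x^{-\alpha} / \zeta(\alpha, x_{\min})$. It does not reproduce the paper's GPU simulation or its tables of cutoff values. All function names, the golden-section search, the search bounds and the sampler are illustrative choices.

```python
import math
import random
from collections import Counter


def hurwitz_zeta(alpha, q, terms=50):
    """Approximate sum_{k>=0} (q + k)^(-alpha) for alpha > 1.

    The first `terms` terms are summed directly; the remaining tail is
    estimated with the leading Euler-Maclaurin terms.
    """
    head = sum((q + k) ** -alpha for k in range(terms))
    a = q + terms
    tail = a ** (1 - alpha) / (alpha - 1) + 0.5 * a ** -alpha + alpha * a ** (-alpha - 1) / 12
    return head + tail


def fit_alpha(data, xmin=1, lo=1.05, hi=6.0, tol=1e-6):
    """Maximum likelihood exponent, found by golden-section search.

    Assumes every value in `data` is an integer >= xmin.
    """
    n = len(data)
    log_sum = sum(math.log(x) for x in data)

    def nll(alpha):  # negative log-likelihood, convex in alpha
        return n * math.log(hurwitz_zeta(alpha, xmin)) + alpha * log_sum

    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if nll(c) < nll(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def ks_statistic(data, alpha, xmin=1):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    n = len(data)
    z = hurwitz_zeta(alpha, xmin)
    counts = Counter(data)
    empirical = model = worst = 0.0
    for x in range(xmin, max(data) + 1):
        empirical += counts.get(x, 0) / n
        model += x ** -alpha / z
        worst = max(worst, abs(empirical - model))
    return worst


def sample_power_law(alpha, n, xmin=1, seed=0, cap=100_000):
    """Draw n integers by inverting the CDF; values are truncated at `cap`."""
    rng = random.Random(seed)
    z = hurwitz_zeta(alpha, xmin)
    samples = []
    for _ in range(n):
        u, cumulative, x = rng.random(), 0.0, xmin
        while x < cap:
            cumulative += x ** -alpha / z
            if cumulative >= u:
                break
            x += 1
        samples.append(x)
    return samples


if __name__ == "__main__":
    data = sample_power_law(alpha=2.5, n=5000)
    alpha_hat = fit_alpha(data)
    print("estimated exponent:", alpha_hat)
    print("KS statistic:", ks_statistic(data, alpha_hat))
```

To turn the KS statistic into a test, one would compare it against a cutoff obtained by repeating this fit on many synthetic datasets, which is the expensive step the abstract says was accelerated.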