
Data-Efficient Sleep Staging with Synthetic Time Series Pretraining

Authors: Niklas Grieger, Siamak Mehrkanoon, Stephan Bialonski

Abstract: Analyzing electroencephalographic (EEG) time series can be challenging, especially with deep neural networks, due to the large variability among human subjects and often small datasets. To address these challenges, various strategies, such as self-supervised learning, have been suggested, but they typically rely on extensive empirical datasets. Inspired by recent advances in computer vision, we propose a pretraining task termed "frequency pretraining" to pretrain a neural network for sleep staging by predicting the frequency content of randomly generated synthetic time series. Our experiments demonstrate that our method surpasses fully supervised learning in scenarios with limited data and few subjects, and matches its performance in regimes with many subjects. Furthermore, our results underline the relevance of frequency information for sleep stage scoring, while also demonstrating that deep neural networks utilize information beyond frequencies to enhance sleep staging performance, which is consistent with previous research. We anticipate that our approach will be advantageous across a broad spectrum of applications where EEG data is limited or derived from a small number of subjects, including the domain of brain-computer interfaces.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 10 pages, 4 figures

arXiv:2401.09881 [pdf, other]

GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting

Authors: Eloy Reulen, Siamak Mehrkanoon

Abstract: In recent years, data-driven modeling approaches have gained significant attention across various meteorological applications, particularly in weather forecasting. However, these methods often face challenges in handling extreme weather conditions. In response, we present the GA-SmaAt-GNet model, a novel generative adversarial framework for extreme precipitation nowcasting. This model features a unique SmaAt-GNet generator, an extension of the successful SmaAt-UNet architecture, capable of integrating precipitation masks (binarized precipitation maps) to enhance predictive accuracy. Additionally, GA-SmaAt-GNet incorporates an attention-augmented discriminator inspired by the Pix2Pix architecture. This innovative framework paves the way for generative precipitation nowcasting using multiple data sources. We evaluate the performance of SmaAt-GNet and GA-SmaAt-GNet using real-life precipitation data from the Netherlands, revealing notable improvements in overall performance and for extreme precipitation events compared to other models. Specifically, our proposed architecture demonstrates its main performance gain in summer and autumn, when precipitation intensity is typically at its peak. Furthermore, we conduct uncertainty analysis on the GA-SmaAt-GNet model and the precipitation dataset, providing insights into its predictive capabilities. Finally, we employ Grad-CAM to offer visual explanations of our model's predictions, generating activation heatmaps that highlight areas of input activation throughout the network.

Submitted 29 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 18 pages, 11 figures

ACM Class: I.2; I.5

arXiv:2401.07958 [pdf, other]

GD-CAF: Graph Dual-stream Convolutional Attention Fusion for Precipitation Nowcasting

Authors: Lorand Vatamany, Siamak Mehrkanoon

Abstract: Accurate precipitation nowcasting is essential for various applications, including flood prediction, disaster management, optimizing agricultural activities, managing transportation routes and renewable energy. While several studies have addressed this challenging task from a sequence-to-sequence perspective, most of them have focused on a single area without considering the existing correlation between multiple disjoint regions. In this paper, we formulate precipitation nowcasting as a spatiotemporal graph sequence nowcasting problem. In particular, we introduce Graph Dual-stream Convolutional Attention Fusion (GD-CAF), a novel approach designed to learn from historical spatiotemporal graph of precipitation maps and nowcast future time step ahead precipitation at different spatial locations. GD-CAF consists of spatio-temporal convolutional attention as well as gated fusion modules which are equipped with depthwise-separable convolutional operations. This enhancement enables the model to directly process the high-dimensional spatiotemporal graph of precipitation maps and exploits higher-order correlations between the data dimensions. We evaluate our model on seven years of precipitation maps across Europe and its neighboring areas collected from the ERA5 dataset, provided by Copernicus Climate Change Services. The experimental results reveal the superior performance of the GD-CAF model compared to the other examined models. Additionally, visualizations of averaged seasonal spatial and temporal attention scores across the test set offer valuable insights into the most robust connections between diverse regions or time steps.

Submitted 26 February, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: 19 pages, 13 figures

ACM Class: I.2; I.5

arXiv:2312.09623 [pdf, other]

A novel dual-stream time-frequency contrastive pretext tasks framework for sleep stage classification

Authors: Sergio Kazatzidis, Siamak Mehrkanoon

Abstract: Self-supervised learning addresses the challenge encountered by many supervised methods, i.e. the requirement of large amounts of annotated data. This challenge is particularly pronounced in fields such as the electroencephalography (EEG) research domain. Self-supervised learning operates instead by utilizing pseudo-labels, which are generated by pretext tasks, to obtain a rich and meaningful data representation. In this study, we aim at introducing a dual-stream pretext task architecture that operates both in the time and frequency domains. In particular, we have examined the incorporation of the novel Frequency Similarity (FS) pretext task into two existing pretext tasks, Relative Positioning (RP) and Temporal Shuffling (TS). We assess the accuracy of these models using the Physionet Challenge 2018 (PC18) dataset in the context of the downstream task sleep stage classification. The inclusion of FS resulted in a notable improvement in downstream task accuracy, with a 1.28 percent improvement on RP and a 2.02 percent improvement on TS. Furthermore, when visualizing the learned embeddings using Uniform Manifold Approximation and Projection (UMAP), distinct clusters emerge, indicating that the learned representations carry meaningful information.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 8 pages, 7 figures

arXiv:2311.18749 [pdf, other]

TransCORALNet: A Two-Stream Transformer CORAL Networks for Supply Chain Credit Assessment Cold Start

Authors: Jie Shi, Arno P. J. M. Siebes, Siamak Mehrkanoon

Abstract: This paper proposes an interpretable two-stream transformer CORAL networks (TransCORALNet) for supply chain credit assessment under the segment industry and cold start problem. The model aims to provide accurate credit assessment prediction for new supply chain borrowers with limited historical data. Here, the two-stream domain adaptation architecture with correlation alignment (CORAL) loss is used as a core model and is equipped with transformer, which provides insights about the learned features and allow efficient parallelization during training. Thanks to the domain adaptation capability of the proposed model, the domain shift between the source and target domain is minimized. Therefore, the model exhibits good generalization where the source and target do not follow the same distribution, and a limited amount of target labeled instances exist. Furthermore, we employ Local Interpretable Model-agnostic Explanations (LIME) to provide more insight into the model prediction and identify the key features contributing to supply chain credit assessment decisions. The proposed model addresses four significant supply chain credit assessment challenges: domain shift, cold start, imbalanced-class and interpretability. Experimental results on a real-world data set demonstrate the superiority of TransCORALNet over a number of state-of-the-art baselines in terms of accuracy. The code is available on GitHub https://github.com/JieJieNiu/TransCORALN .

Submitted 30 November, 2023; originally announced November 2023.

Comments: 13 pages, 7 figures

ACM Class: I.2; I.5

arXiv:2303.06663 [pdf, other]

SAR-UNet: Small Attention Residual UNet for Explainable Nowcasting Tasks

Authors: Mathieu Renault, Siamak Mehrkanoon

Abstract: The accuracy and explainability of data-driven nowcasting models are of great importance in many socio-economic sectors reliant on weather-dependent decision making. This paper proposes a novel architecture called Small Attention Residual UNet (SAR-UNet) for precipitation and cloud cover nowcasting. Here, SmaAt-UNet is used as a core model and is further equipped with residual connections, parallel to the depthwise separable convolutions. The proposed SAR-UNet model is evaluated on two datasets, i.e., Dutch precipitation maps ranging from 2016 to 2019 and French cloud cover binary images from 2017 to 2018. The obtained results show that SAR-UNet outperforms other examined models in precipitation nowcasting from 30 to 180 minutes in the future as well as cloud cover nowcasting in the next 90 minutes. Furthermore, we provide additional insights on the nowcasts made by our proposed model using Grad-CAM, a visual explanation technique, which is employed on different levels of the encoder and decoder paths of the SAR-UNet model and produces heatmaps highlighting the critical regions in the input image as well as intermediate representations to the precipitation. The heatmaps generated by Grad-CAM reveal the interactions between the residual connections and the depthwise separable convolutions inside of the multiple depthwise separable blocks placed throughout the network architecture.

Submitted 12 March, 2023; originally announced March 2023.

Comments: 9 pages, 8 figures

ACM Class: I.2; I.5

arXiv:2302.04102 [pdf, other]

WF-UNet: Weather Fusion UNet for Precipitation Nowcasting

Authors: Christos Kaparakis, Siamak Mehrkanoon

Abstract: Designing early warning systems for harsh weather and its effects, such as urban flooding or landslides, requires accurate short-term forecasts (nowcasts) of precipitation. Nowcasting is a significant task with several environmental applications, such as agricultural management or increasing flight safety. In this study, we investigate the use of a UNet core-model and its extension for precipitation nowcasting in western Europe for up to 3 hours ahead. In particular, we propose the Weather Fusion UNet (WF-UNet) model, which utilizes the Core 3D-UNet model and integrates precipitation and wind speed variables as input in the learning process and analyze its influences on the precipitation target task. We have collected six years of precipitation and wind radar images from Jan 2016 to Dec 2021 of 14 European countries, with 1-hour temporal resolution and 31 square km spatial resolution based on the ERA5 dataset, provided by Copernicus, the European Union's Earth observation programme. We compare the proposed WF-UNet model to persistence model as well as other UNet based architectures that are trained only using precipitation radar input data. The obtained results show that WF-UNet outperforms the other examined best-performing architectures by 22%, 8% and 6% lower MSE at a horizon of 1, 2 and 3 hours respectively.

Submitted 9 February, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: 8 pages, 8 figures

ACM Class: I.2; I.5

arXiv:2301.02554 [pdf, other]

MSCDA: Multi-level Semantic-guided Contrast Improves Unsupervised Domain Adaptation for Breast MRI Segmentation in Small Datasets

Authors: Sheng Kuang, Henry C. Woodruff, Renee Granzier, Thiemo J. A. van Nijnatten, Marc B. I. Lobbes, Marjolein L. Smidt, Philippe Lambin, Siamak Mehrkanoon

Abstract: Deep learning (DL) applied to breast tissue segmentation in magnetic resonance imaging (MRI) has received increased attention in the last decade, however, the domain shift which arises from different vendors, acquisition protocols, and biological heterogeneity, remains an important but challenging obstacle on the path towards clinical implementation. In this paper, we propose a novel Multi-level Semantic-guided Contrastive Domain Adaptation (MSCDA) framework to address this issue in an unsupervised manner. Our approach incorporates self-training with contrastive learning to align feature representations between domains. In particular, we extend the contrastive loss by incorporating pixel-to-pixel, pixel-to-centroid, and centroid-to-centroid contrasts to better exploit the underlying semantic information of the image at different levels. To resolve the data imbalance problem, we utilize a category-wise cross-domain sampling strategy to sample anchors from target images and build a hybrid memory bank to store samples from source images. We have validated MSCDA with a challenging task of cross-domain breast MRI segmentation between datasets of healthy volunteers and invasive breast cancer patients. Extensive experiments show that MSCDA effectively improves the model's feature alignment capabilities between domains, outperforming state-of-the-art methods. Furthermore, the framework is shown to be label-efficient, achieving good performance with a smaller source dataset. The code is publicly available at https://github.com/ShengKuangCN/MSCDA.

Submitted 8 June, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: 17 pages, 8 figures

ACM Class: I.2; I.5

arXiv:2207.03927 [pdf, other]

BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization

Authors: Sheng Kuang, Kiki van der Heijden, Siamak Mehrkanoon

Abstract: Accurate sound localization in a reverberation environment is essential for human auditory perception. Recently, Convolutional Neural Networks (CNNs) have been utilized to model the binaural human auditory pathway. However, CNN shows barriers in capturing the global acoustic features. To address this issue, we propose a novel end-to-end Binaural Audio Spectrogram Transformer (BAST) model to predict the sound azimuth in both anechoic and reverberation environments. Two modes of implementation, i.e. BAST-SP and BAST-NSP corresponding to BAST model with shared and non-shared parameters respectively, are explored. Our model with subtraction interaural integration and hybrid loss achieves an angular distance of 1.29 degrees and a Mean Square Error of 1e-3 at all azimuths, significantly surpassing CNN based model. The exploratory analysis of the BAST's performance on the left-right hemifields and anechoic and reverberation environments shows its generalization ability as well as the feasibility of binaural Transformers in sound localization. Furthermore, the analysis of the attention maps is provided to give additional insights on the interpretation of the localization process in a natural reverberant environment.

Submitted 8 July, 2022; originally announced July 2022.

Comments: 7

ACM Class: I.2; I.5

arXiv:2204.13744 [pdf, other]

GCN-FFNN: A Two-Stream Deep Model for Learning Solution to Partial Differential Equations

Authors: Onur Bilgin, Thomas Vergutz, Siamak Mehrkanoon

Abstract: This paper introduces a novel two-stream deep model based on graph convolutional network (GCN) architecture and feed-forward neural networks (FFNN) for learning the solution of nonlinear partial differential equations (PDEs). The model aims at incorporating both graph and grid input representations using two streams corresponding to GCN and FFNN models, respectively. Each stream layer receives and processes its own input representation. As opposed to FFNN which receives a grid-like structure, the GCN stream layer operates on graph input data where the neighborhood information is incorporated through the adjacency matrix of the graph. In this way, the proposed GCN-FFNN model learns from two types of input representations, i.e. grid and graph data, obtained via the discretization of the PDE domain. The GCN-FFNN model is trained in two phases. In the first phase, the model parameters of each stream are trained separately. Both streams employ the same error function to adjust their parameters by enforcing the models to satisfy the given PDE as well as its initial and boundary conditions on grid or graph collocation (training) data. In the second phase, the learned parameters of two-stream layers are frozen and their learned representation solutions are fed to fully connected layers whose parameters are learned using the previously used error function. The learned GCN-FFNN model is tested on test data located both inside and outside the PDE domain. The obtained numerical results demonstrate the applicability and efficiency of the proposed GCN-FFNN model over individual GCN and FFNN models on 1D-Burgers, 1D-Schrödinger, 2D-Burgers and 2D-Schrödinger equations.

Submitted 28 April, 2022; originally announced April 2022.

Comments: 10 pages, 10 figures

ACM Class: I.2; G.1

arXiv:2202.04996 [pdf, other]

AA-TransUNet: Attention Augmented TransUNet For Nowcasting Tasks

Authors: Yimin Yang, Siamak Mehrkanoon

Abstract: Data driven modeling based approaches have recently gained a lot of attention in many challenging meteorological applications including weather element forecasting. This paper introduces a novel data-driven predictive model based on TransUNet for precipitation nowcasting task. The TransUNet model which combines the Transformer and U-Net models has been previously successfully applied in medical segmentation tasks. Here, TransUNet is used as a core model and is further equipped with Convolutional Block Attention Modules (CBAM) and Depthwise-separable Convolution (DSC). The proposed Attention Augmented TransUNet (AA-TransUNet) model is evaluated on two distinct datasets: the Dutch precipitation map dataset and the French cloud cover dataset. The obtained results show that the proposed model outperforms other examined models on both tested datasets. Furthermore, the uncertainty analysis of the proposed AA-TransUNet is provided to give additional insights on its predictions.

Submitted 15 February, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: 8 pages, 8 figures

ACM Class: I.2; I.5

arXiv:2108.07063 [pdf, other]

Multistream Graph Attention Networks for Wind Speed Forecasting

Authors: Dogan Aykas, Siamak Mehrkanoon

Abstract: Reliable and accurate wind speed prediction has significant impact in many industrial sectors such as economic, business and management among others. This paper presents a new model for wind speed prediction based on Graph Attention Networks (GAT). In particular, the proposed model extends GAT architecture by equipping it with a learnable adjacency matrix as well as incorporating a new attention mechanism with the aim of obtaining attention scores per weather variable. The output of the GAT based model is combined with the LSTM layer in order to exploit both the spatial and temporal characteristics of the multivariate multidimensional historical weather data. Real weather data collected from several cities in Denmark and Netherlands are used to conduct the experiments and evaluate the performance of the proposed model. We show that in comparison to previous architectures used for wind speed prediction, the proposed model is able to better learn the complex input-output relationships of the weather data. Furthermore, thanks to the learned attention weights, the model provides additional insights on the most important weather variables and cities for the studied prediction task.

Submitted 26 October, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

Comments: 8 pages, 5 figures

ACM Class: I.2; I.5

arXiv:2106.14742 [pdf, other]

TENT: Tensorized Encoder Transformer for Temperature Forecasting

Authors: Onur Bilgin, Paweł Mąka, Thomas Vergutz, Siamak Mehrkanoon

Abstract: Reliable weather forecasting is of great importance in science, business, and society. The best performing data-driven models for weather prediction tasks rely on recurrent or convolutional neural networks, where some of which incorporate attention mechanisms. In this work, we introduce a novel model based on Transformer architecture for weather forecasting. The proposed Tensorial Encoder Transformer (TENT) model is equipped with tensorial attention and thus it exploits the spatiotemporal structure of weather data by processing it in multidimensional tensorial format. We show that compared to the classical encoder transformer, 3D convolutional neural networks, LSTM, and Convolutional LSTM, the proposed TENT model can better learn the underlying complex pattern of the weather data for the studied temperature prediction task. Experiments on two real-life weather datasets are performed. The datasets consist of historical measurements from weather stations in the USA, Canada and Europe. The first dataset contains hourly measurements of weather attributes for 30 cities in the USA and Canada from October 2012 to November 2017. The second dataset contains daily measurements of weather attributes of 18 cities across Europe from May 2005 to April 2020. Two attention scores are introduced based on the obtained tensorial attention and are visualized in order to shed light on the decision-making process of our model and provide insight knowledge on the most important cities for the target cities.

Submitted 21 February, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: 10 pages, 10 figures

ACM Class: I.2; I.5

arXiv:2102.10570 [pdf, other]

Symbolic regression for scientific discovery: an application to wind speed forecasting

Authors: Ismail Alaoui Abdellaoui, Siamak Mehrkanoon

Abstract: Symbolic regression corresponds to an ensemble of techniques that allow to uncover an analytical equation from data. Through a closed form formula, these techniques provide great advantages such as potential scientific discovery of new laws, as well as explainability, feature engineering as well as fast inference. Similarly, deep learning based techniques has shown an extraordinary ability of modeling complex patterns. The present paper aims at applying a recent end-to-end symbolic regression technique, i.e. the equation learner (EQL), to get an analytical equation for wind speed forecasting. We show that it is possible to derive an analytical equation that can achieve reasonable accuracy for short term horizons predictions only using few number of features.

Submitted 26 October, 2021; v1 submitted 21 February, 2021; originally announced February 2021.

Comments: 8 pages, 8 figs

ACM Class: I.2; I.5

arXiv:2102.06442 [pdf, other]

doi 10.1016/j.neunet.2021.08.036

Broad-UNet: Multi-scale feature learning for nowcasting tasks

Authors: Jesus Garcia Fernandez, Siamak Mehrkanoon

Abstract: Weather nowcasting consists of predicting meteorological components in the short term at high spatial resolutions. Due to its influence in many human activities, accurate nowcasting has recently gained plenty of attention. In this paper, we treat the nowcasting problem as an image-to-image translation problem using satellite imagery. We introduce Broad-UNet, a novel architecture based on the core UNet model, to efficiently address this problem. In particular, the proposed Broad-UNet is equipped with asymmetric parallel convolutions as well as Atrous Spatial Pyramid Pooling (ASPP) module. In this way, the Broad-UNet model learns more complex patterns by combining multi-scale features while using fewer parameters than the core UNet model. The proposed model is applied on two different nowcasting tasks, i.e. precipitation maps and cloud cover nowcasting. The obtained numerical results show that the introduced Broad-UNet model performs more accurate predictions compared to the other examined architectures.

Submitted 26 October, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: 9 pages, 11 figures

ACM Class: I.2; I.5

arXiv:2101.10041 [pdf, other]

Deep Graph Convolutional Networks for Wind Speed Prediction

Authors: Tomasz Stańczyk, Siamak Mehrkanoon

Abstract: Wind speed prediction and forecasting is important for various business and management sectors. In this paper, we introduce new models for wind speed prediction based on graph convolutional networks (GCNs). Given hourly data of several weather variables acquired from multiple weather stations, wind speed values are predicted for multiple time steps ahead. In particular, the weather stations are treated as nodes of a graph whose associated adjacency matrix is learnable. In this way, the network learns the graph spatial structure and determines the strength of relations between the weather stations based on the historical weather data. We add a self-loop connection to the learnt adjacency matrix and normalize the adjacency matrix. We examine two scenarios with the self-loop connection setting (two separate models). In the first scenario, the self-loop connection is imposed as a constant additive. In the second scenario a learnable parameter is included to enable the network to decide about the self-loop connection strength. Furthermore, we incorporate data from multiple time steps with temporal convolution, which together with spatial graph convolution constitutes spatio-temporal graph convolution. We perform experiments on real datasets collected from weather stations located in cities in Denmark and the Netherlands. The numerical experiments show that our proposed models outperform previously developed baseline models on the referenced datasets. We provide additional insights by visualizing learnt adjacency matrices from each layer of our models.

Submitted 25 January, 2021; originally announced January 2021.

Comments: 10 pages, 8 figures

ACM Class: I.2; I.5

arXiv:2011.03303 [pdf, other]

Deep coastal sea elements forecasting using U-Net based models

Authors: Jesús García Fernández, Ismail Alaoui Abdellaoui, Siamak Mehrkanoon

Abstract: The supply and demand of energy are influenced by meteorological conditions. The relevance of accurate weather forecasts increases as the demand for renewable energy sources increases. Energy providers and policy makers require weather information to make informed choices and establish optimal plans according to their operational objectives. Due to the recent development of deep learning techniques applied to satellite imagery, weather forecasting that uses remote sensing data has also been the subject of major progress. The present paper investigates multiple steps ahead frame prediction for coastal sea elements in the Netherlands using U-Net based architectures. Hourly data from the Copernicus observation programme spanning a period of 2 years have been used to train the models and produce the forecasts, including seasonal predictions. We propose a variation of the U-Net architecture and further extend this novel model using residual connections, parallel convolutions and asymmetric convolutions in order to introduce three additional architectures. In particular, we show that the architecture equipped with parallel and asymmetric convolutions as well as skip connections outperforms the other three discussed models.

Submitted 8 November, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: 12 pages, 11 figures

ACM Class: I.2; I.5

arXiv:2009.11239 [pdf, other]

Deep multi-stations weather forecasting: explainable recurrent convolutional neural networks

Authors: Ismail Alaoui Abdellaoui, Siamak Mehrkanoon

Abstract: Deep learning applied to weather forecasting has started gaining popularity because of the progress achieved by data-driven models. The present paper compares two different deep learning architectures to perform weather prediction on daily data gathered from 18 cities across Europe and spanning a period of 15 years. We propose the Deep Attention Unistream Multistream (DAUM) networks that investigate different types of input representations (i.e. tensorial unistream vs. multistream) as well as the incorporation of the attention mechanism. In particular, we show that adding a self-attention block within the models increases the overall forecasting performance. Furthermore, visualization techniques such as occlusion analysis and score maximization are used to give additional insight into the most important features and cities for predicting a particular target feature of target cities.

Submitted 9 February, 2021; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: 8 pages, 8 figures

ACM Class: I.2; I.5

arXiv:2007.12567 [pdf, other]

Wind speed prediction using multidimensional convolutional neural networks

Authors: Kevin Trebing, Siamak Mehrkanoon

Abstract: Accurate wind speed forecasting is of great importance for many economic, business and management sectors. This paper introduces a new model based on convolutional neural networks (CNNs) for wind speed prediction tasks. In particular, we show that compared to classical CNN-based models, the proposed model is able to better characterise the spatio-temporal evolution of the wind data by learning the underlying complex input-output relationships from multiple dimensions (views) of the input data. The proposed model exploits the spatio-temporal multivariate multidimensional historical weather data for learning new representations used for wind forecasting. We conduct experiments on two real-life weather datasets. The datasets are measurements from cities in Denmark and in the Netherlands. The proposed model is compared with traditional 2- and 3-dimensional CNN models, a 2D-CNN model with an attention layer and a 2D-CNN model equipped with upscaling and depthwise separable convolutions.

Submitted 4 July, 2020; originally announced July 2020.

Comments: 8 pages, 6 figures

ACM Class: I.2; I.5

arXiv:2007.06655 [pdf, other]

Deep Neural-Kernel Machines

Authors: Siamak Mehrkanoon

Abstract: In this chapter we review the main literature related to the recent advancement of deep neural-kernel architecture, an approach that seeks the synergy between two powerful classes of models, i.e. kernel-based models and artificial neural networks. The introduced deep neural-kernel framework is composed of a hybridization of the neural networks architecture and a kernel machine. More precisely, for the kernel counterpart the model is based on Least Squares Support Vector Machines with explicit feature mapping. Here we discuss the use of one form of an explicit feature map obtained by random Fourier features. Thanks to this explicit feature map, on the one hand bridging the two architectures has become more straightforward, and on the other hand one can find the solution of the associated optimization problem in the primal, therefore making the model scalable to large scale datasets. We begin by introducing a neural-kernel architecture that serves as the core module for deeper models equipped with different pooling layers. In particular, we review three neural-kernel machines with average, maxout and convolutional pooling layers. In the average pooling layer, the outputs of the previous representation layers are averaged. The maxout layer triggers competition among different input representations and allows the formation of multiple sub-networks within the same model. The convolutional pooling layer reduces the dimensionality of the multi-scale output representations. Comparisons with the neural-kernel model, kernel-based models and the classical neural networks architecture have been made, and the numerical experiments illustrate the effectiveness of the introduced models on several benchmark datasets.
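As a rough illustration of the explicit feature map mentioned in this abstract, the following NumPy sketch builds random Fourier features that approximate a Gaussian (RBF) kernel. It shows the standard random Fourier feature construction only, not the chapter's neural-kernel architecture. The function names, the bandwidth `sigma`, and the feature count are hypothetical choices for the example.

```python
import numpy as np


def random_fourier_features(x, num_features, sigma, rng):
    """Map rows of `x` to features z(x) with z(x) @ z(y) ~= k(x, y).

    Approximates the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    `num_features` and `sigma` are illustrative, hypothetical settings.
    """
    d = x.shape[1]
    w = rng.normal(scale=1.0 / sigma, size=(d, num_features))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)       # random phases
    return np.sqrt(2.0 / num_features) * np.cos(x @ w + b)


def rbf_kernel(x, sigma):
    """Exact RBF kernel matrix, used here only for comparison."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
z = random_fourier_features(x, num_features=5000, sigma=1.5, rng=rng)
approx_kernel = z @ z.T
exact_kernel = rbf_kernel(x, sigma=1.5)
max_error = np.max(np.abs(approx_kernel - exact_kernel))
```

Because the map is explicit, a linear model can be trained directly on `z` in the primal, which is the scalability argument the abstract makes. The approximation error shrinks as `num_features` grows.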

Submitted 19 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: 16 pages, 17 figures

ACM Class: I.2; I.5

arXiv:2007.04417 [pdf, other]

SmaAt-UNet: Precipitation Nowcasting using a Small Attention-UNet Architecture

Authors: Kevin Trebing, Tomasz Stanczyk, Siamak Mehrkanoon

Abstract: Weather forecasting is dominated by numerical weather prediction that tries to model accurately the physical properties of the atmosphere. A downside of numerical weather prediction is that it lacks the ability to produce short-term forecasts using the latest available information. By using a data-driven neural network approach we show that it is possible to produce an accurate precipitation nowcast. To this end, we propose SmaAt-UNet, an efficient convolutional neural network based on the well-known UNet architecture equipped with attention modules and depthwise-separable convolutions. We evaluate our approach on real-life datasets using precipitation maps from the region of the Netherlands and binary images of cloud coverage of France. The experimental results show that in terms of prediction performance, the proposed model is comparable to other examined models while only using a quarter of the trainable parameters.

Submitted 24 January, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

Comments: 9 pages, 4 figures

ACM Class: I.2; I.5

arXiv:2007.00897 [pdf, other]

Deep brain state classification of MEG data

Authors: Ismail Alaoui Abdellaoui, Jesus Garcia Fernandez, Caner Sahinli, Siamak Mehrkanoon

Abstract: Neuroimaging techniques have been shown to be useful when studying the brain's activity. This paper uses Magnetoencephalography (MEG) data, provided by the Human Connectome Project (HCP), in combination with various deep artificial neural network models to perform brain decoding. More specifically, here we investigate to what extent we can infer the task performed by a subject based on their MEG data. Three models based on compact convolution, combined convolutional and long short-term architecture as well as a model based on multi-view learning that aims at fusing the outputs of the two stream networks are proposed and examined. These models exploit the spatio-temporal MEG data for learning new representations that are used to decode the relevant tasks across subjects. In order to identify the most relevant features of the input signals, two attention mechanisms, i.e. self and global attention, are incorporated in all the models. The experimental results of cross-subject multi-class classification on the studied MEG dataset show that the inclusion of attention improves the generalization of the models across subjects.

Submitted 4 July, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: 11 pages, 11 figures

ACM Class: I.2; I.5

arXiv:1503.02216 [pdf, other]

Higher order Matching Pursuit for Low Rank Tensor Learning

Authors: Yuning Yang, Siamak Mehrkanoon, Johan A. K. Suykens

Abstract: Low rank tensor learning, such as tensor completion and multilinear multitask learning, has received much attention in recent years. In this paper, we propose higher order matching pursuit for low rank tensor learning problems with a convex or a nonconvex cost function, which is a generalization of the matching pursuit type methods. At each iteration, the main cost of the proposed methods is only to compute a rank-one tensor, which can be done efficiently, making the proposed methods scalable to large scale problems. Moreover, storing the resulting rank-one tensors requires little storage, which can help to break the curse of dimensionality. The linear convergence rate of the proposed methods is established in various circumstances. Along with the main methods, we also provide a method of low computational complexity for approximately computing the rank-one tensors, with a provable approximation ratio, which helps to improve the efficiency of the main methods and to analyze the convergence rate. Experimental results on synthetic as well as real datasets verify the efficiency and effectiveness of the proposed methods.

Submitted 7 March, 2015; originally announced March 2015.

Showing 1–23 of 23 results for author: Mehrkanoon, S