-
Mitigating Health Data Poverty: Generative Approaches versus Resampling for Time-series Clinical Data
Authors:
Raffaele Marchesi,
Nicolo Micheletti,
Giuseppe Jurman,
Venet Osmani
Abstract:
Several approaches have been developed to mitigate algorithmic bias stemming from health data poverty, where minority groups are underrepresented in training datasets. Augmenting the minority class using resampling (such as SMOTE) is a widely used approach due to the simplicity of the algorithms. However, these algorithms decrease data variability and may introduce correlations between samples, giving rise to the use of generative approaches based on GANs. Generation of high-dimensional, time-series, authentic data that provides wide distribution coverage of the real data remains a challenging task for both resampling and GAN-based approaches. In this work we propose the CA-GAN architecture, which addresses some of the shortcomings of the current approaches, and we provide a detailed comparison with both SMOTE and WGAN-GP*, using a high-dimensional, time-series, real dataset of 3343 hypotensive Caucasian and Black patients. We show that our approach is better at both generating authentic data of the minority class and remaining within the original distribution of the real data.
Submitted 26 October, 2022; v1 submitted 25 October, 2022;
originally announced October 2022.
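The remark in the abstract above that resampling may introduce correlations between samples can be illustrated with a minimal SMOTE-style sketch. It assumes flat feature vectors rather than time series, and the function name `smote_like` and its parameters are hypothetical; it is not the implementation evaluated in the paper.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    ASSUMPTION: illustrative helper, not the paper's code."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=5)
```

Every synthetic point lies on a segment between two existing samples, so the new data never leave the convex hull of the minority class and are, by construction, correlated with the points they were interpolated from.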
-
MASS-UMAP: Fast and accurate analog ensemble search in weather radar archive
Authors:
Gabriele Franch,
Giuseppe Jurman,
Luca Coviello,
Marta Pendesini,
Cesare Furlanello
Abstract:
The use of analogs - similar weather patterns - for weather forecasting and analysis is an established method in meteorology. The most challenging aspect of using this approach in the context of operational radar applications is to be able to perform a fast and accurate search for similar spatiotemporal precipitation patterns in a large archive of historical records. In this context, sequential pairwise search is too slow and computationally expensive. Here we propose an architecture to significantly speed up spatiotemporal analog retrieval by combining nonlinear geometric dimensionality reduction (UMAP) with the fastest known Euclidean search algorithm for time series (MASS) to find radar analogs in constant time, independently of the desired temporal length to match and the number of extracted analogs. We compare UMAP with principal component analysis (PCA) and show that UMAP outperforms PCA for spatial MSE analog search with proper settings. Moreover, we show that MASS is 20 times faster than brute force search on the UMAP embedding space. We test the architecture on a real dataset and show that it enables precise and fast operational analog ensemble search through more than 2 years of radar archive in less than 5 seconds on a single workstation.
Submitted 1 October, 2019;
originally announced October 2019.
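The core idea behind a MASS-style search, as mentioned in the abstract above, is to obtain all sliding dot products between a query and a long series with one FFT-based convolution instead of a window-by-window loop. The sketch below is a simplified illustration on a one-dimensional series. It uses plain Euclidean distance (MASS as usually described works on z-normalised subsequences), and the helper names `fft` and `distance_profile` are hypothetical, not taken from the paper.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def distance_profile(series, query):
    """Euclidean distance from `query` to every window of `series`.
    ASSUMPTION: no z-normalisation, unlike the usual MASS formulation."""
    n, m = len(series), len(query)
    size = 1
    while size < n + m:
        size *= 2
    # Sliding dot products: convolve the series with the reversed query.
    fs = fft([complex(v) for v in series] + [0j] * (size - n))
    fq = fft([complex(v) for v in reversed(query)] + [0j] * (size - m))
    conv = fft([a * b for a, b in zip(fs, fq)], inverse=True)
    dots = [conv[m - 1 + s].real / size for s in range(n - m + 1)]
    # Sum of squares of each window, from a cumulative sum.
    csum = [0.0]
    for v in series:
        csum.append(csum[-1] + v * v)
    qq = sum(v * v for v in query)
    return [
        math.sqrt(max(0.0, qq + csum[s + m] - csum[s] - 2.0 * dots[s]))
        for s in range(n - m + 1)
    ]

series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
query = [2, 3, 2]
profile = distance_profile(series, query)
best = min(range(len(profile)), key=profile.__getitem__)
```

Here `best` is the start index of the window closest to the query. The cost is dominated by the FFTs, so it grows with the series length rather than with series length times query length.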
-
In-field grape berries counting for yield estimation using dilated CNNs
Authors:
L. Coviello,
M. Cristoforetti,
G. Jurman,
C. Furlanello
Abstract:
Digital technologies ignited a revolution in the agrifood domain known as precision agriculture: a main question for enabling precision agriculture at scale is whether accurate product quality control can be made available at minimal cost, leveraging existing technologies and agronomists' skills. As a contribution along this direction, we demonstrate a tool for accurate fruit yield estimation from smartphone cameras, by adapting Deep Learning algorithms originally developed for crowd counting.
Submitted 26 September, 2019;
originally announced September 2019.
-
High Resolution Forecasting of Heat Waves impacts on Leaf Area Index by Multiscale Multitemporal Deep Learning
Authors:
Andrea Gobbi,
Marco Cristoforetti,
Giuseppe Jurman,
Cesare Furlanello
Abstract:
Climate change impacts could cause a progressive decrease of crop quality and yield, up to harvest failures. In particular, heat waves and other climate extremes can lead to localized food shortages and even threaten the food security of communities worldwide. In this study, we apply a deep learning architecture for high resolution forecasting (300 m, 10 days) of the Leaf Area Index (LAI), whose dynamics have been widely used to model the growth phase of crops and the impact of heat waves. LAI models can be computed at 0.1 degree spatial resolution with an autoregressive component adjusted with weather conditions, validated with remote sensing measurements. However, model actionability is poor in regions of varying terrain morphology at this scale (about 8 km at the Alps latitude). Our deep learning model aims instead at forecasting LAI by training on multiscale multitemporal (MSMT) data from the Copernicus Global Land Service (CGLS) project for all of Europe at 300 m resolution and on medium-resolution historical weather data. Further, the deep learning model inputs integrate high-resolution land surface features, known to improve forecasts of agricultural productivity. The historical weather data are then replaced with forecast values to predict LAI values at a 10-day horizon over Europe. We propose the MSMT model to develop a high resolution crop-specific warning system for mitigating damage due to heat waves and other extreme events.
Submitted 13 September, 2019;
originally announced September 2019.
-
A multiobjective deep learning approach for predictive classification in Neuroblastoma
Authors:
Valerio Maggio,
Marco Chierici,
Giuseppe Jurman,
Cesare Furlanello
Abstract:
Neuroblastoma is a strongly heterogeneous cancer with very diverse clinical courses that may vary from spontaneous regression to fatal progression; an accurate estimation of the patient's risk at diagnosis is essential to design appropriate tumor treatment strategies. Neuroblastoma is a paradigm disease where different diagnostic and prognostic endpoints should be predicted from common molecular and clinical information, with increasing complexity, as shown in the FDA MAQC-II study. Here we introduce the novel multiobjective deep learning architecture CDRP (Concatenated Diagnostic Relapse Prognostic), composed of 8 layers, to obtain a combined diagnostic and prognostic prediction from high-throughput transcriptomics data. Two distinct loss functions are optimized for the Event Free Survival (EFS) and Overall Survival (OS) prognosis, respectively. We use the High-Risk (HR) diagnostic information as an additional input generated by an autoencoder embedding. The latter is used as a network regulariser, based on a clinical algorithm commonly adopted for stratifying patients from cancer stage, age at disease onset, and MYCN, the specific molecular marker. The architecture was applied to Illumina HiSeq2000 RNA-Seq data for 498 neuroblastoma patients (176 at high risk) from the Sequencing Quality Control (SEQC) study, obtaining state-of-the-art performance on the diagnostic endpoint and improving prediction of prognosis over the HR cohort.
Submitted 22 February, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
-
Deep Learning for Automatic Stereotypical Motor Movement Detection using Wearable Sensors in Autism Spectrum Disorders
Authors:
Nastaran Mohammadian Rad,
Seyed Mostafa Kia,
Calogero Zarbo,
Twan van Laarhoven,
Giuseppe Jurman,
Paola Venuti,
Elena Marchiori,
Cesare Furlanello
Abstract:
Autism Spectrum Disorders are associated with atypical movements, of which stereotypical motor movements (SMMs) interfere with learning and social interaction. Automatic SMM detection using inertial measurement units (IMU) remains complex due to the strong intra- and inter-subject variability, especially when handcrafted features are extracted from the signal. We propose a new application of deep learning to facilitate automatic SMM detection using multi-axis IMUs. We use a convolutional neural network (CNN) to learn a discriminative feature space from raw data. We show how the CNN can be used for parameter transfer learning to enhance the detection rate on longitudinal data. We also combine long short-term memory (LSTM) with the CNN to model the temporal patterns in a sequence of multi-axis signals. Further, we employ ensemble learning to combine multiple LSTM learners into a more robust SMM detector. Our results show that: 1) feature learning outperforms handcrafted features; 2) parameter transfer learning is beneficial in longitudinal settings; 3) using LSTM to learn the temporal dynamics of signals enhances the detection rate, especially for skewed training data; 4) an ensemble of LSTMs provides more accurate and stable detectors. These findings provide a significant step toward accurate SMM detection in real-time scenarios.
Submitted 14 September, 2017;
originally announced September 2017.
-
Phylogenetic Convolutional Neural Networks in Metagenomics
Authors:
Diego Fioravanti,
Ylenia Giarratano,
Valerio Maggio,
Claudio Agostinelli,
Marco Chierici,
Giuseppe Jurman,
Cesare Furlanello
Abstract:
Background: Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree being used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space. Results: Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, e.g. the Multi-Layer Perceptron. Conclusion: Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbours of each sample, thus mimicking the case of image data, transparently to the user. Keywords: Metagenomics; Deep learning; Convolutional Neural Networks; Phylogenetic trees
Submitted 6 September, 2017;
originally announced September 2017.
-
Towards a scientific blockchain framework for reproducible data analysis
Authors:
C. Furlanello,
M. De Domenico,
G. Jurman,
N. Bussola
Abstract:
Publishing reproducible analyses is a long-standing and widespread challenge for the scientific community, funding bodies and publishers. Although a definitive solution is still elusive, the problem is recognized to affect all disciplines and to lead to a critical system inefficiency. Here, we propose a blockchain-based approach to enhance scientific reproducibility, with a focus on life science studies and precision medicine. While the interest of permanently encoding into an immutable ledger all the key study information (including endpoints, data and metadata, protocols, analytical methods and all findings) has already been highlighted, here we apply the blockchain approach to solve the issue of rewarding the time and expertise of scientists who commit to verifying reproducibility. Our mechanism builds a trustless ecosystem of researchers, funding bodies and publishers cooperating to guarantee digital and permanent access to information and reproducible results. As a natural byproduct, a procedure to quantify scientists' and institutions' reputation for ranking purposes is obtained.
Submitted 20 July, 2017;
originally announced July 2017.
-
Towards meaningful physics from generative models
Authors:
Marco Cristoforetti,
Giuseppe Jurman,
Andrea I. Nardelli,
Cesare Furlanello
Abstract:
In several physical systems, important properties characterizing the system itself are theoretically related to specific degrees of freedom. Although standard Monte Carlo simulations provide an effective tool to accurately reconstruct the physical configurations of the system, they are unable to isolate the different contributions corresponding to different degrees of freedom. Here we show that unsupervised deep learning can become a valid support to MC simulation, coupling useful insights in the phase detection task with good reconstruction performance. As a testbed we consider the 2D XY model, showing that a deep neural network based on variational autoencoders can detect the continuous Kosterlitz-Thouless (KT) transitions, and that, if endowed with the appropriate constraints, it generates configurations with meaningful physical content.
Submitted 26 May, 2017;
originally announced May 2017.
-
Community dynamics in connected time-dependent multilayer networks
Authors:
Marco Cristoforetti,
Marco Guerini,
Giuseppe Jurman,
Cesare Furlanello
Abstract:
Different strategies have been considered to extract information from social media about how similarly people react to the same news or event. In this context, a powerful method is offered by the application of graph techniques to the contents produced by social network users. In particular, large events typically attract enough content traffic over time to enable an analysis that explicitly models a dependence on the time dimension. Here we demonstrate how it is possible to extend the application of community detection strategies in complex networks to the case of time-dependent multilayer networks, whenever the connection between consecutive time layers is non-trivial. We apply the method to 400K Twitter posts related to the Expo event held in Milan (Italy) between May and October 2015.
Submitted 11 November, 2015;
originally announced November 2015.
-
Convolutional Neural Network for Stereotypical Motor Movement Detection in Autism
Authors:
Nastaran Mohammadian Rad,
Andrea Bizzego,
Seyed Mostafa Kia,
Giuseppe Jurman,
Paola Venuti,
Cesare Furlanello
Abstract:
Autism Spectrum Disorders (ASDs) are often associated with specific atypical postural or motor behaviors, of which Stereotypical Motor Movements (SMMs) have a specific visibility. While the identification and the quantification of SMM patterns remain complex, their automation would support accurate tuning of the intervention in the therapy of autism. Therefore, it is essential to develop automatic SMM detection systems in a real world setting, taking care of strong inter-subject and intra-subject variability. Wireless accelerometer sensing technology can provide a valid infrastructure for real-time SMM detection; however, such variability remains a problem for machine learning methods as well, in particular whenever handcrafted features extracted from the accelerometer signal are considered. Here, we propose to employ the deep learning paradigm in order to learn discriminating features from multi-sensor accelerometer signals. Our results provide preliminary evidence that feature learning and transfer learning embedded in the deep architecture achieve more accurate SMM detectors in longitudinal scenarios.
Submitted 7 June, 2016; v1 submitted 5 November, 2015;
originally announced November 2015.
-
Entropy Dynamics of Community Alignment in the Italian Parliament Time-Dependent Network
Authors:
Gabriele Lami,
Marco Cristoforetti,
Giuseppe Jurman,
Cesare Furlanello,
Tommaso Furlanello
Abstract:
Complex institutions are typically characterized by meso-scale structures which are fundamental for the successful coordination of multiple agents. Here we introduce a framework to study the temporal dynamics of the node-community relationship based on the concept of community alignment, a measure derived from the modularity matrix that defines the alignment of a node with respect to the core of its community. The framework is applied to the 16th legislature of the Italian Parliament to study the dynamic relationship in voting behavior between Members of the Parliament (MPs) and their political parties. As a novel contribution, we introduce two entropy-based measures that capture politically interesting dynamics: the group alignment entropy (over a single snapshot), and the node alignment entropy (over multiple snapshots). We show that significant meso-scale changes in the time-dependent network structures can be detected by a combination of the two measures. We observe a steady growth of the group alignment entropy after a major internal conflict in the ruling majority and a different distribution of node alignment entropy after the government transition.
Submitted 4 November, 2014;
originally announced November 2014.
-
A combinatorial model of malware diffusion via Bluetooth connections
Authors:
Stefano Merler,
Giuseppe Jurman
Abstract:
We outline here the mathematical expression of a diffusion model for cellphone malware transmitted through Bluetooth channels. In particular, we provide the deterministic formula underlying the proposed infection model, in its equivalent recursive (simple but computationally heavy) and closed form (more complex but efficiently computable) expressions.
Submitted 18 February, 2013; v1 submitted 7 January, 2012;
originally announced April 2012.
-
mlpy: Machine Learning Python
Authors:
Davide Albanese,
Roberto Visintainer,
Stefano Merler,
Samantha Riccadonna,
Giuseppe Jurman,
Cesare Furlanello
Abstract:
mlpy is a Python Open Source Machine Learning library built on top of NumPy/SciPy and the GNU Scientific Libraries. mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, works with Python 2 and 3, and is distributed under GPL3 at the website http://mlpy.fbk.eu.
Submitted 1 March, 2012; v1 submitted 29 February, 2012;
originally announced February 2012.
-
The HIM glocal metric and kernel for network comparison and classification
Authors:
Giuseppe Jurman,
Roberto Visintainer,
Michele Filosi,
Samantha Riccadonna,
Cesare Furlanello
Abstract:
Due to the ever-rising importance of the network paradigm across several areas of science, comparing and classifying graphs represent essential steps in the network analysis of complex systems. Both tasks have recently been tackled via quite different strategies, some even tailored ad hoc to the investigated problem. Here we deal with both operations by introducing the Hamming-Ipsen-Mikhailov (HIM) distance, a novel metric to quantitatively measure the difference between two graphs sharing the same vertices. The new measure combines the local Hamming distance and the global spectral Ipsen-Mikhailov distance so as to overcome the drawbacks affecting the two components separately. By then building the HIM kernel function derived from the HIM distance, it is possible to move from network comparison to network classification via the Support Vector Machine (SVM) algorithm. Applications of the HIM distance and HIM kernel in computational biology and social network science demonstrate the effectiveness of the proposed functions as a general purpose solution.
Submitted 8 November, 2013; v1 submitted 13 January, 2012;
originally announced January 2012.