Search | arXiv e-print repository

Statistical Agnostic Regression: a machine learning method to validate regression models

Authors: Juan M Gorriz, J. Ramirez, F. Segovia, F. J. Martinez-Murcia, C. Jiménez-Mesa, J. Suckling

Abstract: Regression analysis is a central topic in statistical modeling, aiming to estimate the relationships between a dependent variable, commonly referred to as the response variable, and one or more independent variables, i.e., explanatory variables. Linear regression is by far the most popular method for performing this task in several fields of research, such as prediction, forecasting, or causal inference. Beyond classical methods for solving linear regression problems, such as Ordinary Least Squares, Ridge, or Lasso regression, more advanced machine learning (ML) techniques, which often build on them, have been successfully applied in this scenario, but without a formal definition of statistical significance. At most, permutation or classical analyses based on empirical measures (e.g., residuals or accuracy) have been conducted to reflect the greater detection ability of ML estimates. In this paper, we introduce a method, named Statistical Agnostic Regression (SAR), for evaluating the statistical significance of ML-based linear regression using concentration inequalities on the actual risk under a worst-case analysis. To achieve this goal, similar to the classification problem, we define a threshold to establish that there is sufficient evidence, with a probability of at least $1-\eta$, to conclude that there is a linear relationship in the population between the explanatory (feature) and the response (label) variables.
Simulations in only two dimensions demonstrate the ability of the proposed agnostic test to provide an analysis of variance similar to that given by the classical $F$ test for the slope parameter.
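For reference, the classical baseline mentioned above, the $F$ test for the slope of a simple linear regression, can be sketched in a few lines of Python. This is only the textbook statistic $F = \mathrm{SSR}/(\mathrm{SSE}/(n-2))$ with $1$ and $n-2$ degrees of freedom, not the SAR procedure itself; the function name `slope_f_statistic` is hypothetical, and converting $F$ into a p-value is left to a statistics library.

```python
from statistics import fmean


def slope_f_statistic(x, y):
    """Classical F statistic for H0: slope == 0 in simple linear regression.

    F = (SSR / 1) / (SSE / (n - 2)), with 1 and n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # ordinary least-squares slope
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    return ssr / (sse / (n - 2))


# A nearly linear toy sample gives a large F; an uncorrelated one gives F = 0.
print(slope_f_statistic([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```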

Submitted 22 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: 17 pages, 15 figures

arXiv:2311.13876

doi 10.1142/S0129065720500379

EEG Connectivity Analysis Using Denoising Autoencoders for the Detection of Dyslexia

Authors: Francisco Jesus Martinez-Murcia, Andrés Ortiz, Juan Manuel Górriz, Javier Ramírez, Pedro Javier Lopez-Perez, Miguel López-Zamora, Juan Luis Luque

Abstract: The Temporal Sampling Framework (TSF) theorizes that the characteristic phonological difficulties of dyslexia are caused by atypical oscillatory sampling at one or more temporal rates. The LEEDUCA study conducted a series of Electroencephalography (EEG) experiments on children listening to amplitude-modulated (AM) noise at slow-rhythmic prosodic (0.5-1 Hz), syllabic (4-8 Hz) or phonemic (12-40 Hz) rates, aimed at detecting differences in the perception of oscillatory sampling that could be associated with dyslexia. The purpose of this work is to check whether these differences exist and how they relate to children's performance in different language and cognitive tasks commonly used to detect dyslexia. To this end, temporal and spectral inter-channel EEG connectivity was estimated, and a denoising autoencoder (DAE) was trained to learn a low-dimensional representation of the connectivity matrices. This representation was studied via correlation and classification analysis, which revealed the ability to detect dyslexic subjects with an accuracy higher than 0.8 and a balanced accuracy around 0.7. Some features of the DAE representation were significantly correlated ($p<0.005$) with children's performance in language and cognitive tasks of the phonological hypothesis category, such as phonological awareness and rapid symbolic naming, as well as reading efficiency and reading comprehension.
Finally, a deeper analysis of the adjacency matrix revealed a reduced bilateral connection between electrodes of the temporal lobe (roughly the primary auditory cortex) in subjects with developmental dyslexia (DD), as well as increased connectivity of the F7 electrode, placed roughly over Broca's area. These results pave the way for a complementary assessment of dyslexia using more objective methodologies such as EEG.
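The gap between an accuracy above 0.8 and a balanced accuracy around 0.7 is what one would expect when the classes are imbalanced, as is typical when dyslexic children are a minority of the sample. Balanced accuracy averages sensitivity and specificity, so a classifier that favours the majority class is penalised. The sketch below uses made-up counts purely for illustration; they are not the study's data, and the function name is hypothetical.

```python
def accuracy_and_balanced_accuracy(y_true, y_pred, positive=1):
    """Return (accuracy, balanced accuracy) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # recall on the minority (positive) class
    specificity = tn / (tn + fp)   # recall on the majority (negative) class
    return accuracy, (sensitivity + specificity) / 2


# Hypothetical sample: 10 positives (4 detected) and 40 negatives (39 correct).
y_true = [1] * 10 + [0] * 40
y_pred = [1] * 4 + [0] * 6 + [0] * 39 + [1] * 1
acc, bacc = accuracy_and_balanced_accuracy(y_true, y_pred)
print(acc, bacc)  # accuracy 0.86, balanced accuracy 0.6875
```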

Submitted 23 November, 2023; originally announced November 2023.

Comments: 19 pages, 6 figures

Journal ref: INT J NEURAL SYST 30 (7), 2020, 2050037

arXiv:2311.12561

doi 10.1142/S0129065718500351

Convolutional Neural Networks for Neuroimaging in Parkinson's Disease: Is Preprocessing Needed?

Authors: Francisco J. Martinez-Murcia, Juan M. Górriz, Javier Ramírez, Andrés Ortiz

Abstract: Spatial and intensity normalization are nowadays a prerequisite for neuroimaging analysis. Because these corrections are key in voxel-wise and other univariate comparisons, they are commonly applied to every type of analysis and imaging modality. Nuclear imaging modalities such as PET-FDG or FP-CIT SPECT (the latter a common modality in Parkinson's Disease diagnosis) are especially dependent on intensity normalization. However, these steps are computationally expensive and, furthermore, they may introduce deformations in the images, altering the information they contain. Convolutional Neural Networks (CNNs), for their part, introduce position invariance to pattern recognition, and have been shown to classify objects regardless of their orientation, size, angle, etc. Therefore, a question arises: how well can CNNs account for spatial and intensity differences when analysing nuclear brain imaging? Are spatial and intensity normalization still needed? To answer this question, we have trained four different CNN models based on well-established architectures, with and without different spatial and intensity normalization preprocessing. The results show that a sufficiently complex model such as our three-dimensional version of ALEXNET can effectively account for spatial differences, achieving a diagnosis accuracy of 94.1% with an area under the ROC curve of 0.984.
The visualization of the differences via saliency maps shows that these models are correctly finding patterns that match those found in the literature, without the need to apply any complex spatial normalization procedure. However, intensity normalization, and its type, proves very influential on the results and accuracy of the trained model, and therefore must be carefully accounted for.

Submitted 21 November, 2023; originally announced November 2023.

Comments: 19 pages, 7 figures

Journal ref: INT J NEURAL SYST 28 (10), 2018, 1850035

arXiv:2305.07038

Revealing Patterns of Symptomatology in Parkinson's Disease: A Latent Space Analysis with 3D Convolutional Autoencoders

Authors: E. Delgado de las Heras, F. J. Martinez-Murcia, I. A. Illán, C. Jiménez-Mesa, D. Castillo-Barnes, J. Ramírez, J. M. Górriz

Abstract: This work proposes the use of 3D convolutional variational autoencoders (CVAEs) to trace the changes and symptomatology produced by neurodegeneration in Parkinson's disease (PD). We present a novel approach to detect and quantify changes in dopamine transporter (DaT) concentration and its spatial patterns using 3D CVAEs on Ioflupane (FP-CIT) imaging. Our approach leverages deep learning to learn a low-dimensional representation of the brain imaging data, which is then linked to different symptom categories using regression algorithms. We demonstrate the effectiveness of our approach on a dataset of PD patients and healthy controls, and show that general symptomatology (UPDRS) is linked to a d-dimensional decomposition via the CVAE with $R^2>0.25$. Our work shows the potential of representation learning not only in early diagnosis but also in understanding neurodegeneration processes and symptomatology.

Submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted at 2023 ASPAI Conference

arXiv:2106.14675

doi 10.1016/j.knosys.2021.108098

Complex network modelling of EEG band coupling in dyslexia: an exploratory analysis of auditory processing and diagnosis

Authors: Nicolás J. Gallego-Molina, Andrés Ortiz, Francisco J. Martínez-Murcia, Marco Formoso, Almudena Giménez

Abstract: Complex network analysis is increasingly relevant in the study of neurological disorders, enhancing knowledge of the brain's structural and functional organization. Network structure and efficiency reveal different brain states along with different ways of processing information. This work is structured around the exploratory analysis of the brain processes involved in low-level auditory processing. A complex network analysis was performed on the basis of brain coupling obtained from Electroencephalography (EEG) data while different auditory stimuli were presented to the subjects. This coupling is inferred from Phase-Amplitude Coupling (PAC) across different EEG electrodes to explore differences between controls and dyslexic subjects. The coupling data allow the construction of a graph, and graph theory is then used to study the characteristics of the complex networks over time for controls and dyslexics. This results in a set of metrics including clustering coefficient, path length and small-worldness. From these, different characteristics linked to the temporal evolution of networks and coupling are pointed out for dyslexics. Our study revealed patterns related to dyslexia, such as a loss of small-world topology. Finally, these graph-based features are used to classify between controls and dyslexic subjects by means of a Support Vector Machine (SVM).
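The graph metrics named above can be sketched in plain Python. Assumptions: the PAC coupling matrix is symmetrised and binarised with a hypothetical `threshold` to give an undirected graph (the study may instead use weighted or directed graphs), and the random-graph reference values `c_rand` and `l_rand` needed for small-worldness, $\sigma = (C/C_{rand})/(L/L_{rand})$, are assumed to be computed separately, e.g. from degree-preserving randomised graphs. All function names are illustrative.

```python
from collections import deque
from itertools import combinations


def threshold_graph(weights, threshold):
    """Undirected adjacency sets: keep edge i-j if the symmetrised weight reaches threshold."""
    n = len(weights)
    return [{j for j in range(n)
             if j != i and (weights[i][j] + weights[j][i]) / 2 >= threshold}
            for i in range(n)]


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for neigh in adj:
        k = len(neigh)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(neigh, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    if pairs == 0:
        raise ValueError("graph has no edges")
    return total / pairs


def small_worldness(c, l, c_rand, l_rand):
    """sigma = (C / C_rand) / (L / L_rand); sigma > 1 is usually read as small-world."""
    return (c / c_rand) / (l / l_rand)
```

A drop of $\sigma$ towards 1 over time would correspond to the loss of small-world topology described in the abstract.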

Submitted 21 July, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

arXiv:2104.05991

doi 10.1007/978-3-030-85099-9_4

Temporal EigenPAC for dyslexia diagnosis

Authors: Nicolás Gallego-Molina, Marco Formoso, Andrés Ortiz, Francisco J. Martínez-Murcia, Juan L. Luque

Abstract: Electroencephalography signals make it possible to explore the functional activity of the brain cortex non-invasively. However, the analysis of these signals is not straightforward due to the presence of different artifacts and the very low signal-to-noise ratio. Cross-Frequency Coupling (CFC) methods provide a way to extract information from EEG related to the synchronization among frequency bands. However, CFC methods are usually applied locally, computing the interaction between phase and amplitude at the same electrode. In this work we show a method to compute Phase-Amplitude Coupling (PAC) features among electrodes to study functional connectivity. Moreover, this has been applied jointly with Principal Component Analysis (PCA) to explore patterns related to dyslexia in 7-year-old children. The developed methodology reveals the temporal evolution of PAC-based connectivity. The directions of greatest variance computed by PCA are called eigenPACs here, since they resemble the classical eigenfaces representation. The projection of PAC data onto the eigenPACs provides a set of features whose discriminative capability is demonstrated, specifically in the Beta-Gamma bands.
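As described above, eigenPACs are principal directions of PAC data, in the same way that eigenfaces are principal directions of face images. A minimal NumPy sketch, assuming each sample's PAC connectivity has already been flattened into one row of an `(n_samples, n_features)` array; the function name `eigenpacs` and that data layout are illustrative assumptions, not the authors' code.

```python
import numpy as np


def eigenpacs(pac_features, n_components):
    """PCA on vectorised PAC connectivity data.

    pac_features: array of shape (n_samples, n_features), one row per sample.
    Returns (mean, components, projections).
    """
    X = np.asarray(pac_features, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                  # PCA requires centred data
    # Right singular vectors of the centred data are the principal directions,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]                 # one "eigenPAC" per row
    projections = Xc @ components.T                # low-dimensional features
    return mean, components, projections
```

The `projections` array is what would then be passed to a classifier.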

Submitted 13 April, 2021; originally announced April 2021.

arXiv:2104.05497

Modelling Brain Connectivity Networks by Graph Embedding for Dyslexia Diagnosis

Authors: Marco A. Formoso, Andrés Ortiz, Francisco J. Martínez-Murcia, Nicolás Gallego-Molina, Juan L. Luque

Abstract: Several methods have been developed to extract information from electroencephalograms (EEG). One of them is Phase-Amplitude Coupling (PAC), a type of Cross-Frequency Coupling (CFC) method that consists in measuring the synchronization of phase and amplitude for the different EEG bands and electrodes. This provides information about brain areas that are synchronously activated and, eventually, a marker of functional connectivity between these areas. In this work, intra- and inter-electrode PAC is computed, yielding the relationships among the different electrodes used in EEG. The connectivity information is then treated as a graph in which the nodes are the electrodes and the edges are the PAC values between them. These structures are embedded to create a feature vector that can be further used to classify multichannel EEG samples. The proposed method has been applied to classify EEG samples acquired from seven-year-old children using specific auditory stimuli in a task designed for dyslexia diagnosis. The proposed method provides AUC values up to 0.73 and allows selecting the most discriminant electrodes and EEG bands.
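One common way to quantify PAC is the mean vector length, $\mathrm{MVL} = \left|\frac{1}{T}\sum_t A_t e^{i\phi_t}\right|$, where $\phi_t$ is the low-frequency phase and $A_t$ the high-frequency amplitude envelope. Which PAC estimator the work above uses is not stated here, so treat this choice as an assumption. Taking the phase from one electrode and the amplitude from another gives inter-electrode PAC, and the full matrix can serve as a weighted (generally directed) adjacency matrix for the graph. Band-pass filtering and the Hilbert transform (e.g. `scipy.signal.hilbert`) that produce the phase and amplitude series are assumed to have been applied beforehand; the function names are hypothetical.

```python
import cmath
import math


def mean_vector_length(phase, amplitude):
    """Mean vector length PAC: |mean(A_t * exp(i * phi_t))|.

    phase: instantaneous phase (radians) of the low-frequency band.
    amplitude: instantaneous amplitude envelope of the high-frequency band.
    """
    if len(phase) != len(amplitude) or not phase:
        raise ValueError("phase and amplitude must be non-empty and of equal length")
    z = sum(a * cmath.exp(1j * p) for p, a in zip(phase, amplitude))
    return abs(z) / len(phase)


def pac_matrix(phases, amplitudes):
    """Entry [i][j] couples the phase of electrode i with the amplitude of electrode j.

    The diagonal holds intra-electrode PAC; off-diagonal entries hold inter-electrode PAC.
    """
    n = len(phases)
    return [[mean_vector_length(phases[i], amplitudes[j]) for j in range(n)]
            for i in range(n)]


# Toy check: an amplitude that peaks at phase 0 is coupled; a constant one is not.
phase = [2 * math.pi * k / 100 for k in range(100)]
print(mean_vector_length(phase, [1.0] * 100))                          # close to 0
print(mean_vector_length(phase, [1 + math.cos(p) for p in phase]))     # close to 0.5
```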

Submitted 12 April, 2021; originally announced April 2021.

arXiv:2103.02961

Probabilistic combination of eigenlungs-based classifiers for COVID-19 diagnosis in chest CT images

Authors: Juan E. Arco, Andrés Ortiz, Javier Ramírez, Francisco J. Martínez-Murcia, Yu-Dong Zhang, Jordi Broncano, M. Álvaro Berbís, Javier Royuela-del-Val, Antonio Luna, Juan M. Górriz

Abstract: The outbreak of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), there have been more than 100 million confirmed cases of COVID-19, including more than 2.4 million deaths. Early detection of the disease is extremely important, and medical imaging such as chest X-ray (CXR) and chest Computed Tomography (CCT) has proved to be an excellent solution. However, this process requires clinicians to perform a manual and time-consuming task, which is not ideal when trying to speed up the diagnosis. In this work, we propose an ensemble classifier based on probabilistic Support Vector Machines (SVM) in order to identify pneumonia patterns while providing information about the reliability of the classification. Specifically, each CCT scan is divided into cubic patches, and the features contained in each of them are extracted by applying kernel PCA. The use of base classifiers within an ensemble allows our system to identify pneumonia patterns regardless of their size or location. The decisions for the individual patches are then combined into a global one according to the reliability of each individual classification: the lower the uncertainty, the higher the contribution. Performance is evaluated in a real scenario, yielding an accuracy of 97.86%. The high performance obtained and the simplicity of the system (the use of deep learning on CCT images would result in a huge computational cost) evidence the applicability of our proposal in a real-world environment.
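The fusion rule described above, where less uncertain patches contribute more, can be illustrated with a simple weighted average. In this sketch, reliability is measured as one minus the binary entropy of each patch's predicted probability; this weighting scheme and the function names are illustrative assumptions, not necessarily the rule used with the probabilistic SVMs in the paper.

```python
import math


def binary_entropy(p):
    """Shannon entropy (bits) of a Bernoulli(p) prediction: 0 = certain, 1 = maximally uncertain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def fuse_patch_probabilities(patch_probs, eps=1e-6):
    """Combine per-patch P(pneumonia) into one scan-level probability.

    Each patch is weighted by (1 - entropy), so confident patches dominate;
    eps keeps the denominator positive when every patch is at p = 0.5.
    """
    weights = [1.0 - binary_entropy(p) + eps for p in patch_probs]
    return sum(w * p for w, p in zip(weights, patch_probs)) / sum(weights)


# Two uninformative patches barely dilute one confident patch.
print(fuse_patch_probabilities([0.5, 0.5, 0.99]))
```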

Submitted 26 September, 2022; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: 15 pages, 9 figures

arXiv:2011.14894

Uncertainty-driven ensembles of deep architectures for multiclass classification. Application to COVID-19 diagnosis in chest X-ray images

Authors: Juan E. Arco, A. Ortiz, J. Ramirez, F. J. Martinez-Murcia, Yu-Dong Zhang, Juan M. Gorriz

Abstract: Respiratory diseases kill millions of people each year. Diagnosis of these pathologies is a manual, time-consuming process that has inter- and intra-observer variability, delaying diagnosis and treatment. The recent COVID-19 pandemic has demonstrated the need to develop systems that automate the diagnosis of pneumonia, while Convolutional Neural Networks (CNNs) have proved to be an excellent option for the automatic classification of medical images. However, given the need to provide confident classifications in this context, it is crucial to quantify the reliability of the model's predictions. In this work, we propose a multi-level ensemble classification system based on a Bayesian Deep Learning approach in order to maximize performance while quantifying the uncertainty of each classification decision. This tool combines the information extracted from different architectures by weighting their results according to the uncertainty of their predictions. Performance of the Bayesian network is evaluated in a real scenario, simultaneously differentiating between four classes: control vs bacterial pneumonia vs viral pneumonia vs COVID-19 pneumonia. A three-level decision tree is employed to divide the 4-class classification into three binary classifications, yielding an accuracy of 98.06% and overcoming the results obtained in recent literature.
The reduced preprocessing needed to obtain this high performance, together with the information provided about the reliability of the predictions, evidences the applicability of the system as an aid for clinicians.
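A three-level decision tree of binary classifiers, as described above, can be wired as a simple cascade. The order of the splits below (control vs pneumonia, then bacterial vs viral, then non-COVID viral vs COVID-19) is a hypothetical choice for illustration, and each of `is_pneumonia`, `is_viral` and `is_covid` stands for any binary classifier that returns `True` for the second option of its split.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def cascade_predict(x: T,
                    is_pneumonia: Callable[[T], bool],
                    is_viral: Callable[[T], bool],
                    is_covid: Callable[[T], bool]) -> str:
    """Resolve a 4-class decision through three binary classifiers (hypothetical split order)."""
    if not is_pneumonia(x):      # level 1: control vs any pneumonia
        return "control"
    if not is_viral(x):          # level 2: bacterial vs viral (including COVID-19)
        return "bacterial pneumonia"
    # level 3: non-COVID viral vs COVID-19
    return "COVID-19 pneumonia" if is_covid(x) else "viral pneumonia"
```

Each level only ever sees the cases routed to it, so each binary model can be trained on the corresponding subset of classes.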

Submitted 27 November, 2020; originally announced November 2020.

Comments: 1 Table, 7 Figures

arXiv:2010.13731

doi 10.1007/978-3-030-61705-9_54

Dyslexia detection from EEG signals using SSA component correlation and Convolutional Neural Networks

Authors: Andrés Ortiz, Francisco J. Martinez-Murcia, Marco A. Formoso, Juan Luis Luque, Auxiliadora Sánchez

Abstract: Objective dyslexia diagnosis is not a straightforward task, since it is traditionally performed by interpreting different behavioural tests. Moreover, these tests are only applicable to readers. Early diagnosis therefore requires the use of specific tasks not only related to reading. Thus, Electroencephalography (EEG) constitutes an alternative for an objective and early diagnosis that can be used with pre-readers. In this context, the extraction of relevant features from EEG signals is crucial for classification. However, identifying the most relevant features is not straightforward, and predefined statistics in the time or frequency domain are not always discriminant enough. On the other hand, classical processing of EEG signals based on extracting EEG band frequency descriptors usually makes some assumptions about the raw signals that could cause information loss. In this work we propose an alternative for analysis in the frequency domain based on Singular Spectrum Analysis (SSA) to split the raw signal into components representing different oscillatory modes. Moreover, the correlation matrices obtained for each component among EEG channels are classified using a Convolutional Neural Network.
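The SSA decomposition mentioned above follows three textbook steps: embed the signal in a trajectory (Hankel) matrix with a chosen window length, take its singular value decomposition, and map each rank-one term back to a time series by averaging over anti-diagonals. A minimal NumPy sketch, in which the window length and the function names are assumptions and any grouping of components used in the paper is not reproduced. Summing all returned components recovers the original signal, and correlating the same component across channels gives one channel-by-channel correlation matrix of the kind fed to the CNN.

```python
import numpy as np


def ssa_components(signal, window):
    """Basic Singular Spectrum Analysis.

    Returns an array of shape (rank, N) whose rows sum to the original signal.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if not 1 < window < n:
        raise ValueError("window must satisfy 1 < window < len(signal)")
    k = n - window + 1
    # 1. Embedding: column c is the lagged segment x[c : c + window].
    traj = np.column_stack([x[c:c + window] for c in range(k)])  # (window, k)
    # 2. SVD of the trajectory matrix.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    comps = []
    for i in range(s.size):
        elem = s[i] * np.outer(u[:, i], vt[i])  # rank-one elementary matrix
        # 3. Diagonal averaging: entries with row + col == t map back to sample t.
        #    Flipping rows turns anti-diagonals into diagonals with offset t - window + 1.
        flipped = elem[::-1]
        comps.append(np.array([flipped.diagonal(t - window + 1).mean()
                               for t in range(n)]))
    return np.vstack(comps)


def component_correlation(channels, window, index):
    """Correlation matrix between channels for one SSA component (hypothetical helper)."""
    comps = np.stack([ssa_components(ch, window)[index] for ch in channels])
    return np.corrcoef(comps)  # shape (n_channels, n_channels)
```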

Submitted 26 October, 2020; originally announced October 2020.

Comments: 11 pages, 7 figures. Submitted to conference

Journal ref: HAIS 2020: Hybrid Artificial Intelligent Systems. LNCS 12344. pp 655-664

arXiv:2002.02184

doi 10.1007/978-3-030-61705-9_51

A Neural Approach to Ordinal Regression for the Preventive Assessment of Developmental Dyslexia

Authors: F. J. Martinez-Murcia, A. Ortiz, Marco A. Formoso, M. Lopez-Zamora, J. L. Luque, A. Giménez

Abstract: Developmental Dyslexia (DD) is a learning disability related to the acquisition of reading skills that affects about 5% of the population. DD can have an enormous impact on the intellectual and personal development of affected children, so early detection is key to implementing preventive strategies for teaching language. Research has shown that there may be biological underpinnings to DD that affect phoneme processing, and hence these symptoms may be identifiable before reading ability is acquired, allowing for early intervention. In this paper we propose a new methodology to assess the risk of DD before students learn to read. For this purpose, we propose a mixed neural model that calculates risk levels of dyslexia from tests that can be completed at the age of 5 years. Our method first trains an autoencoder, and then combines the trained encoder with an optimized ordinal regression neural network devised to ensure consistency of predictions. Our experiments show that the system is able to detect unaffected subjects two years before it can assess the risk of DD based mainly on phonological processing, giving a specificity of 0.969 and a correct rate of more than 0.92. In addition, the trained encoder can be used to transform test results into an interpretable subject spatial distribution that facilitates risk assessment and validates the methodology.
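One widely used way to make an ordinal neural classifier produce consistent predictions is to have it output cumulative probabilities $P(y > k)$ for each threshold $k$ and to decode the risk level as the number of thresholds exceeded. The sketch below additionally enforces monotonicity before counting; it is a generic illustration of rank-consistent decoding, not the specific network proposed in the paper, and the function name is hypothetical.

```python
def decode_ordinal(cumulative_probs, threshold=0.5):
    """Map estimates of P(level > k), k = 0..K-2, to an ordinal level in 0..K-1.

    A running minimum makes the sequence non-increasing, so a later threshold
    can never count as exceeded while an earlier one is not.
    """
    consistent = []
    running = 1.0
    for p in cumulative_probs:
        running = min(running, p)
        consistent.append(running)
    level = sum(p > threshold for p in consistent)
    return level, consistent


print(decode_ordinal([0.9, 0.7, 0.2]))  # (2, [0.9, 0.7, 0.2])
print(decode_ordinal([0.9, 0.3, 0.6]))  # (1, [0.9, 0.3, 0.3])
```

Without the running minimum, the second example would count two exceeded thresholds even though the model considers a level above 1 unlikely.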

Submitted 20 October, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: 12 pages, 4 figures

Showing 1–11 of 11 results for author: Martínez-Murcia, F J