-
Improving Noise Robustness through Abstractions and its Impact on Machine Learning
Authors:
Alfredo Ibias,
Karol Capala,
Varun Ravi Varma,
Anna Drozdz,
Jose Sousa
Abstract:
Noise is a fundamental problem in learning theory with huge effects on the application of Machine Learning (ML) methods, because real-world data tends to be noisy. Additionally, the introduction of malicious noise can make ML methods fail critically, as is the case with adversarial attacks. Thus, finding and developing alternatives to improve robustness to noise is a fundamental problem in ML. In this paper, we propose a method to deal with noise: mitigating its effect through the use of data abstractions. The goal is to reduce the effect of noise on the model's performance through the loss of information produced by the abstraction. However, this information loss comes at a cost: it can result in an accuracy reduction due to the missing information. First, we explored multiple methodologies to create abstractions, using the training dataset, for the specific case of numerical data and binary classification tasks. We also tested how these abstractions can affect robustness to noise with several experiments that explore the robustness of an Artificial Neural Network to noise when trained using raw data vs. when trained using abstracted data. The results clearly show that using abstractions is a viable approach for developing noise-robust ML methods.
Submitted 12 June, 2024;
originally announced June 2024.
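The abstract above does not say which abstraction methods were explored. As a rough illustration of the general idea only, the sketch below assumes one simple possibility: quantile binning, with bin edges fitted on the training data. The helper names `fit_bin_edges` and `abstract_values` and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_bin_edges(train_column, n_bins=4):
    """Learn interior bin edges from training data (assumed scheme: equal-frequency bins)."""
    interior_quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(train_column, interior_quantiles)

def abstract_values(values, edges):
    """Replace each numeric value by the index of the bin it falls into."""
    return np.digitize(values, edges)

# Toy training column; the edges are fitted on it once and then reused.
train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
edges = fit_bin_edges(train, n_bins=4)

clean = np.array([0.5, 2.5, 6.5])
noisy = clean + np.array([0.1, -0.1, 0.2])  # small perturbation of the inputs

clean_bins = abstract_values(clean, edges)
noisy_bins = abstract_values(noisy, edges)
```

In this toy case the small perturbation leaves every bin index unchanged, which is the information-loss-for-robustness trade the abstract describes. A perturbation that crosses a bin edge would still change the abstracted value.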
-
CACTUS: a Comprehensive Abstraction and Classification Tool for Uncovering Structures
Authors:
Luca Gherardini,
Varun Ravi Varma,
Karol Capala,
Roger Woods,
Jose Sousa
Abstract:
The availability of large data sets is providing an impetus for driving current artificial intelligence developments. There are, however, challenges in developing solutions with small data sets due to practical and cost-effective deployment and the opacity of deep learning models. The Comprehensive Abstraction and Classification Tool for Uncovering Structures, called CACTUS, is presented for improved secure analytics by effectively employing explainable artificial intelligence. It provides additional support for categorical attributes, preserving their original meaning, optimising memory usage, and speeding up the computation through parallelisation. It shows the user the frequency of the attributes in each class and ranks them by their discriminative power. Its performance is assessed by application to the Wisconsin diagnostic breast cancer and Thyroid0387 data sets.
Submitted 23 August, 2023;
originally announced August 2023.
-
Scalar on time-by-distribution regression and its application for modelling associations between daily-living physical activity and cognitive functions in Alzheimer's Disease
Authors:
Rahul Ghosal,
Vijay R. Varma,
Dmitri Volfson,
Jacek Urbanek,
Jeffrey M. Hausdorff,
Amber Watts,
Vadim Zipunnikov
Abstract:
Wearable data is a rich source of information that can provide deeper understanding of links between human behaviours and human health. Existing modelling approaches use wearable data summarized at subject level via scalar summaries using regression techniques, temporal (time-of-day) curves using functional data analysis (FDA), and distributions using distributional data analysis (DDA). We propose to capture temporally local distributional information in wearable data using subject-specific time-by-distribution (TD) data objects. Specifically, we propose scalar on time-by-distribution regression (SOTDR) to model associations between scalar response of interest such as health outcomes or disease status and TD predictors. We show that TD data objects can be parsimoniously represented via a collection of time-varying L-moments that capture distributional changes over the time-of-day. The proposed method is applied to the accelerometry study of mild Alzheimer's disease (AD). Mild AD is found to be significantly associated with reduced maximal level of physical activity, particularly during morning hours. It is also demonstrated that TD predictors attain much stronger associations with clinical cognitive scales of attention, verbal memory, and executive function when compared to predictors summarized via scalar total activity counts, temporal functional curves, and quantile functions. Taken together, the present results suggest that the SOTDR analysis provides novel insights into cognitive function and AD.
Submitted 7 June, 2021;
originally announced June 2021.
-
Distributional data analysis via quantile functions and its application to modelling digital biomarkers of gait in Alzheimer's Disease
Authors:
Rahul Ghosal,
Vijay R. Varma,
Dmitri Volfson,
Inbar Hillel,
Jacek Urbanek,
Jeffrey M. Hausdorff,
Amber Watts,
Vadim Zipunnikov
Abstract:
With the advent of continuous health monitoring with wearable devices, users now generate their unique streams of continuous data such as minute-level step counts or heartbeats. Summarizing these streams via scalar summaries often ignores the distributional nature of wearable data and almost unavoidably leads to the loss of critical information. We propose to capture the distributional nature of wearable data via user-specific quantile functions (QF) and use these QFs as predictors in scalar-on-quantile-function-regression (SOQFR). As an alternative approach, we also propose to represent QFs via user-specific L-moments, robust rank-based analogs of traditional moments, and use L-moments as predictors in SOQFR (SOQFR-L). These two approaches provide two mutually consistent interpretations: in terms of quantile levels by SOQFR and in terms of L-moments by SOQFR-L. We also demonstrate how to deal with multi-modal distributional data via Joint and Individual Variation Explained (JIVE) using L-moments. The proposed methods are illustrated in a study of association of digital gait biomarkers with cognitive function in Alzheimer's disease (AD). Our analysis shows that the proposed methods demonstrate higher predictive performance and attain much stronger associations with clinical cognitive scales compared to simple distributional summaries.
Submitted 25 October, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
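The abstract above describes L-moments as robust rank-based analogs of traditional moments. A minimal sketch of the first four sample L-moments follows, using the standard unbiased probability-weighted-moment estimators. The function name `sample_l_moments` and the toy data are assumptions for illustration; this is not the authors' implementation and does not cover the regression step.

```python
import numpy as np

def sample_l_moments(x):
    """Return the first four sample L-moments (l1, l2, l3, l4) of a 1-D sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 observations for four L-moments")
    i = np.arange(1, n + 1)  # ranks of the sorted observations

    # Unbiased probability-weighted moments b0..b3.
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((i - 1) * (i - 2) * (i - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n

    # L-moments are fixed linear combinations of the b's.
    l1 = b0                                   # location (the mean)
    l2 = 2 * b1 - b0                          # scale (half the Gini mean difference)
    l3 = 6 * b2 - 6 * b1 + b0                 # related to skewness
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0     # related to kurtosis
    return l1, l2, l3, l4

# Toy usage on an evenly spaced, symmetric sample.
l1, l2, l3, l4 = sample_l_moments([1, 2, 3, 4, 5])
```

For this symmetric toy sample, `l1` is 3 and `l2` is 1, while `l3` and `l4` are zero up to floating-point error. Because each estimate is a linear combination of order statistics, single extreme values influence them less than they influence conventional higher moments.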