-
Population Power Curves in ASCA with Permutation Testing
Authors:
Jose Camacho,
Michael Sorochan Armstrong
Abstract:
In this paper, we revisit the Power Curves in ANOVA Simultaneous Component Analysis (ASCA) based on permutation testing, and introduce the Population Curves derived from population parameters describing the relative effect among factors and interactions. We distinguish Relative from Absolute Population Curves, where the former represent statistical power in terms of the normalized effect size between structure and noise, and the latter in terms of the sample size. Relative Population Curves are useful to find the optimal ASCA model (e.g., fixed/random factors, crossed/nested relationships, interactions, the test statistic, transformations, etc.) for the analysis of an experimental design at hand. Absolute Population Curves are useful to determine the sample size and the optimal number of levels for each factor during the planning phase of an experiment. We illustrate both types of curves through simulation. We expect Population Curves to become the go-to approach to plan the optimal analysis pipeline and the required sample size in an omics study analyzed with ASCA.
Submitted 1 March, 2024;
originally announced March 2024.
-
A Cybersecurity Risk Analysis Framework for Systems with Artificial Intelligence Components
Authors:
Jose Manuel Camacho,
Aitor Couce-Vieira,
David Arroyo,
David Rios Insua
Abstract:
The introduction of the European Union Artificial Intelligence Act, the NIST Artificial Intelligence Risk Management Framework, and related norms demands a better understanding and implementation of novel risk analysis approaches to evaluate systems with Artificial Intelligence components. This paper provides a cybersecurity risk analysis framework that can help assess such systems. We use an illustrative example concerning automated driving systems.
Submitted 3 January, 2024;
originally announced January 2024.
-
Modelling stellar activity with Gaussian process regression networks
Authors:
J. D. Camacho,
J. P. Faria,
P. T. P. Viana
Abstract:
Stellar photospheric activity is known to limit the detection and characterisation of extra-solar planets. In particular, the study of Earth-like planets around Sun-like stars requires data analysis methods that can accurately model the stellar activity phenomena affecting radial velocity (RV) measurements. Gaussian Process Regression Networks (GPRNs) offer a principled approach to the analysis of simultaneous time-series, combining the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian Processes. Using HARPS-N solar spectroscopic observations encompassing three years, we demonstrate that this framework is capable of jointly modelling RV data and traditional stellar activity indicators. Although we consider only the simplest GPRN configuration, we are able to describe the behaviour of solar RV data at least as accurately as previously published methods. We confirm the correlation between the RV and stellar activity time series reaches a maximum at separations of a few days, and find evidence of non-stationary behaviour in the time series, associated with an approaching solar activity minimum.
Submitted 15 December, 2022; v1 submitted 13 May, 2022;
originally announced May 2022.
-
A Bayesian network model for predicting cardiovascular risk
Authors:
J. M. Ordovas,
D. Rios Insua,
A. Santos-Lozano,
A. Lucia,
A. Torres,
A. Kosgodagan,
J. M. Camacho
Abstract:
We propose a Bayesian network model to make inferences and predictions about cardiovascular risk. Both the structure and the probability tables in the underlying model are built using a large dataset collected in Spain from annual work health assessments, with uncertainty characterized through posterior distributions. We illustrate its use for public health practice, policy and research purposes. A freely available version of the software is included in an Appendix.
Submitted 31 March, 2022; v1 submitted 27 December, 2021;
originally announced December 2021.
-
MSNM-Sensor: An Applied Network Monitoring Tool for Anomaly Detection in Complex Networks and Systems
Authors:
Roberto Magán-Carrión,
José Camacho,
Gabriel Maciá-Fernández,
Ángel Ruíz-Zafra
Abstract:
Technology evolves quickly. Low-cost and ready-to-connect devices are designed to provide new services and applications. Smart grids or smart healthcare systems are some examples of these applications, all of which are in the context of smart cities. In this total-connectivity scenario, some security issues arise, since the larger the number of connected devices, the larger the attack surface. New solutions for monitoring and detecting security events are therefore needed to address the challenges brought about by this scenario, among others, the large number of devices to monitor, the large amount of data to manage and the real-time requirement to provide quick security event detection and, consequently, quick response to attacks. In this work, a practical and ready-to-use tool for monitoring and detecting security events in these environments is developed and introduced. The tool is based on the Multivariate Statistical Network Monitoring (MSNM) methodology for monitoring and anomaly detection, and we call it MSNM-Sensor. Although it is in its early development stages, experimental results based on the detection of well-known attacks in hierarchical network systems prove the suitability of this tool for more complex scenarios, such as those found in smart cities or IoT ecosystems.
Submitted 5 December, 2021; v1 submitted 31 July, 2019;
originally announced July 2019.
-
All Sparse PCA Models Are Wrong, But Some Are Useful. Part I: Computation of Scores, Residuals and Explained Variance
Authors:
J. Camacho,
A. K. Smilde,
E. Saccenti,
J. A. Westerhuis
Abstract:
Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA) that combines variance maximization and sparsity with the ultimate goal of improving data interpretation. When moving from PCA to sPCA, there are a number of implications that the practitioner needs to be aware of. A relevant one is that scores and loadings in sPCA may not be orthogonal. For this reason, the traditional computation of scores, residuals and explained variance used in classical PCA cannot be directly applied to sPCA models. This also affects how sPCA components should be visualized. In this paper we illustrate this problem both theoretically and numerically using simulations for several state-of-the-art sPCA algorithms, and provide proper computation of the different elements mentioned. We show that sPCA approaches present disparate and limited performance when modeling noise-free, sparse data. In a follow-up paper, we discuss the theoretical properties that lead to this problem.
Submitted 9 July, 2019;
originally announced July 2019.
-
Interpretable Feature Learning in Multivariate Big Data Analysis for Network Monitoring
Authors:
José Camacho,
Katarzyna Wasielewska,
Rasmus Bro,
David Kotz
Abstract:
There is an increasing interest in the development of new data-driven models useful to assess the performance of communication networks. For many applications, like network monitoring and troubleshooting, a data model is of little use if it cannot be interpreted by a human operator. In this paper, we present an extension of the Multivariate Big Data Analysis (MBDA) methodology, a recently proposed interpretable data analysis tool. In this extension, we propose a solution to the automatic derivation of features, a cornerstone step for the application of MBDA when the amount of data is massive. The resulting network monitoring approach allows us to detect and diagnose disparate network anomalies, with a data-analysis workflow that combines the advantages of interpretable and interactive models with the power of parallel processing. We apply the extended MBDA to two case studies: UGR'16, a benchmark flow-based real-traffic dataset for anomaly detection, and Dartmouth'18, the longest and largest Wi-Fi trace known to date.
Submitted 1 March, 2024; v1 submitted 5 July, 2019;
originally announced July 2019.
-
Cross-product Penalized Component Analysis (XCAN)
Authors:
José Camacho,
Evrim Acar,
Morten A. Rasmussen,
Rasmus Bro
Abstract:
Matrix factorization methods are extensively employed to understand complex data. In this paper, we introduce the cross-product penalized component analysis (XCAN), a sparse matrix factorization based on the optimization of a loss function that allows a trade-off between variance maximization and structural preservation. The approach is based on previous developments, notably (i) the Sparse Principal Component Analysis (SPCA) framework based on the LASSO, (ii) extensions of SPCA to constrain both modes of the factorization, like co-clustering or the Penalized Matrix Decomposition (PMD), and (iii) the Group-wise Principal Component Analysis (GPCA) method. The result is a flexible modeling approach that can be used for data exploration in a large variety of problems. We demonstrate its use with applications from different disciplines.
Submitted 28 June, 2019;
originally announced July 2019.
-
Multivariate Big Data Analysis for Intrusion Detection: 5 steps from the haystack to the needle
Authors:
José Camacho,
José Manuel García-Giménez,
Noemí Marta Fuentes-García,
Gabriel Maciá-Fernández
Abstract:
The research literature on cybersecurity incident detection & response is very rich in automatic detection methodologies, in particular those based on the anomaly detection paradigm. However, very little attention has been devoted to the diagnosis ability of the methods, that is, their ability to provide useful information on the causes of a given detected anomaly. This information is of utmost importance for the security team to reduce the time from detection to response. In this paper, we present Multivariate Big Data Analysis (MBDA), a complete intrusion detection approach based on 5 steps to effectively handle massive amounts of disparate data sources. The approach has been designed to deal with the main characteristics of Big Data, that is, the high volume, velocity and variety. The core of the approach is the Multivariate Statistical Network Monitoring (MSNM) technique proposed in a recent paper. Unlike in state-of-the-art machine learning methodologies applied to the intrusion detection problem, when an anomaly is identified in MBDA the output of the system includes the detail of the logs of raw information associated with this anomaly, so that the security team can use this information to elucidate its root causes. MBDA is based on two open software packages available on GitHub: the MEDA Toolbox and the FCParser. We illustrate our approach with two case studies. The first one demonstrates the application of MBDA to semistructured sources of information, using the data from the VAST 2012 mini challenge 2. This complete case study is supplied in a virtual machine available for download. In the second case study we show the Big Data capabilities of the approach on data collected from a real network with labeled attacks.
Submitted 27 June, 2019;
originally announced June 2019.
-
Coexistence of cooperators and defectors in well mixed populations mediated by limiting resources
Authors:
Rubén J. Requejo,
Juan Camacho
Abstract:
Traditionally, resource limitation in evolutionary game theory is assumed just to impose a constant population size. Here we show that resource limitations may generate dynamical payoffs able to alter an original prisoner's dilemma, and to allow for the stable coexistence between unconditional cooperators and defectors in well-mixed populations. This is a consequence of a self-organizing process that renders the interaction payoff matrix evolutionarily neutral, and represents a resource-based control mechanism preventing the spread of defectors. To our knowledge, this is the first example of coexistence in well-mixed populations with a game structure different from a snowdrift game.
Submitted 25 October, 2012; v1 submitted 10 February, 2011;
originally announced February 2011.