Population Power Curves in ASCA with Permutation Testing

Authors: Jose Camacho, Michael Sorochan Armstrong

Abstract: In this paper, we revisit the Power Curves in ANOVA Simultaneous Component Analysis (ASCA) based on permutation testing, and introduce the Population Curves derived from population parameters describing the relative effect among factors and interactions. We distinguish Relative from Absolute Population Curves, where the former represent statistical power in terms of the normalized effect size between structure and noise, and the latter in terms of the sample size. Relative Population Curves are useful to find the optimal ASCA model (e.g., fixed/random factors, crossed/nested relationships, interactions, the test statistic, transformations, etc.) for the analysis of an experimental design at hand. Absolute Population Curves are useful to determine the sample size and the optimal number of levels for each factor during the planning phase of an experiment. We illustrate both types of curves through simulation. We expect Population Curves to become the go-to approach to plan the optimal analysis pipeline and the required sample size in an omics study analyzed with ASCA.

Submitted 1 March, 2024; originally announced March 2024.

Comments: Submitted to Journal of Chemometrics, 2024

arXiv:2401.01630

A Cybersecurity Risk Analysis Framework for Systems with Artificial Intelligence Components

Authors: Jose Manuel Camacho, Aitor Couce-Vieira, David Arroyo, David Rios Insua

Abstract: The introduction of the European Union Artificial Intelligence Act, the NIST Artificial Intelligence Risk Management Framework, and related norms demands a better understanding and implementation of novel risk analysis approaches to evaluate systems with Artificial Intelligence components. This paper provides a cybersecurity risk analysis framework that can help assess such systems. We use an illustrative example concerning automated driving systems.

Submitted 3 January, 2024; originally announced January 2024.

Comments: 54 pages, 18 tables, 6 figures

arXiv:2205.06627

doi 10.1093/mnras/stac3727

Modelling stellar activity with Gaussian process regression networks

Authors: J. D. Camacho, J. P. Faria, P. T. P. Viana

Abstract: Stellar photospheric activity is known to limit the detection and characterisation of extra-solar planets. In particular, the study of Earth-like planets around Sun-like stars requires data analysis methods that can accurately model the stellar activity phenomena affecting radial velocity (RV) measurements. Gaussian Process Regression Networks (GPRNs) offer a principled approach to the analysis of simultaneous time-series, combining the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian Processes. Using HARPS-N solar spectroscopic observations encompassing three years, we demonstrate that this framework is capable of jointly modelling RV data and traditional stellar activity indicators. Although we consider only the simplest GPRN configuration, we are able to describe the behaviour of solar RV data at least as accurately as previously published methods. We confirm the correlation between the RV and stellar activity time series reaches a maximum at separations of a few days, and find evidence of non-stationary behaviour in the time series, associated with an approaching solar activity minimum.

Submitted 15 December, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

Comments: 30 pages, 23 figures, accepted for publication on MNRAS

arXiv:2112.13963

A Bayesian network model for predicting cardiovascular risk

Authors: J. M. Ordovas, D. Rios Insua, A. Santos-Lozano, A. Lucia, A. Torres, A. Kosgodagan, J. M. Camacho

Abstract: We propose a Bayesian network model to make inferences and predictions about cardiovascular risk. Both the structure and the probability tables in the underlying model are built using a large dataset collected in Spain from annual work health assessments, with uncertainty characterized through posterior distributions. We illustrate its use for public health practice, policy and research purposes. A freely available version of the software is included in an Appendix.

Submitted 31 March, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

Comments: 16 pages, 7 tables, 2 figures, companion software (free for academic use) Update: Format edited

MSC Class: 62C10 ACM Class: I.2.m; J.3

arXiv:1907.13612

doi 10.1177/1550147720921309

MSNM-Sensor: An Applied Network Monitoring Tool for Anomaly Detection in Complex Networks and Systems

Authors: Roberto Magán-Carrión, José Camacho, Gabriel Maciá-Fernández, Ángel Ruíz-Zafra

Abstract: Technology evolves quickly. Low-cost and ready-to-connect devices are designed to provide new services and applications. Smart grids or smart healthcare systems are some examples of these applications, all of which are in the context of smart cities. In this total-connectivity scenario, some security issues arise since the larger the number of connected devices is, the greater the surface attack dimension. In this way, new solutions for monitoring and detecting security events are needed to address new challenges brought about by this scenario, among others, the large number of devices to monitor, the large amount of data to manage and the real-time requirement to provide quick security event detection and, consequently, quick response to attacks. In this work, a practical and ready-to-use tool for monitoring and detecting security events in these environments is developed and introduced. The tool is based on the Multivariate Statistical Network Monitoring (MSNM) methodology for monitoring and anomaly detection and we call it MSNM-Sensor. Although it is in its early development stages, experimental results based on the detection of well-known attacks in hierarchical network systems prove the suitability of this tool for more complex scenarios, such as those found in smart cities or IoT ecosystems.

Submitted 5 December, 2021; v1 submitted 31 July, 2019; originally announced July 2019.

Journal ref: International Journal of Distributed Sensor Networks, vol. 16, no. 5, p. 1550147720921309, May 2020

arXiv:1907.03989

doi 10.1016/j.chemolab.2019.103907

All Sparse PCA Models Are Wrong, But Some Are Useful. Part I: Computation of Scores, Residuals and Explained Variance

Authors: J. Camacho, A. K. Smilde, E. Saccenti, J. A. Westerhuis

Abstract: Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA) that combines variance maximization and sparsity with the ultimate goal of improving data interpretation. When moving from PCA to sPCA, there are a number of implications that the practitioner needs to be aware of. A relevant one is that scores and loadings in sPCA may not be orthogonal. For this reason, the traditional way of computing scores, residuals and variance explained that is used in the classical PCA cannot directly be applied to sPCA models. This also affects how sPCA components should be visualized. In this paper we illustrate this problem both theoretically and numerically using simulations for several state-of-the-art sPCA algorithms, and provide proper computation of the different elements mentioned. We show that sPCA approaches present disparate and limited performance when modeling noise-free, sparse data. In a follow-up paper, we discuss the theoretical properties that lead to this problem.

Submitted 9 July, 2019; originally announced July 2019.

Journal ref: Chemometrics and Intelligent Laboratory Systems, 2020, 196: 1039072-

arXiv:1907.02677

doi 10.1109/TNSM.2024.3368501

Interpretable Feature Learning in Multivariate Big Data Analysis for Network Monitoring

Authors: José Camacho, Katarzyna Wasielewska, Rasmus Bro, David Kotz

Abstract: There is an increasing interest in the development of new data-driven models useful to assess the performance of communication networks. For many applications, like network monitoring and troubleshooting, a data model is of little use if it cannot be interpreted by a human operator. In this paper, we present an extension of the Multivariate Big Data Analysis (MBDA) methodology, a recently proposed interpretable data analysis tool. In this extension, we propose a solution to the automatic derivation of features, a cornerstone step for the application of MBDA when the amount of data is massive. The resulting network monitoring approach allows us to detect and diagnose disparate network anomalies, with a data-analysis workflow that combines the advantages of interpretable and interactive models with the power of parallel processing. We apply the extended MBDA to two case studies: UGR'16, a benchmark flow-based real-traffic dataset for anomaly detection, and Dartmouth'18, the longest and largest Wi-Fi trace known to date.

Submitted 1 March, 2024; v1 submitted 5 July, 2019; originally announced July 2019.

Journal ref: IEEE Transactions on Network and Service Management, 2024

arXiv:1907.00032

doi 10.1016/j.chemolab.2020.104038

Cross-product Penalized Component Analysis (XCAN)

Authors: José Camacho, Evrim Acar, Morten A. Rasmussen, Rasmus Bro

Abstract: Matrix factorization methods are extensively employed to understand complex data. In this paper, we introduce the cross-product penalized component analysis (XCAN), a sparse matrix factorization based on the optimization of a loss function that allows a trade-off between variance maximization and structural preservation. The approach is based on previous developments, notably (i) the Sparse Principal Component Analysis (SPCA) framework based on the LASSO, (ii) extensions of SPCA to constrain both modes of the factorization, like co-clustering or the Penalized Matrix Decomposition (PMD), and (iii) the Group-wise Principal Component Analysis (GPCA) method. The result is a flexible modeling approach that can be used for data exploration in a large variety of problems. We demonstrate its use with applications from different disciplines.

Submitted 28 June, 2019; originally announced July 2019.

Journal ref: Chemometrics and Intelligent Laboratory Systems, 2020, 203: 104038-

arXiv:1906.11976

doi 10.1016/j.cose.2019.101603

Multivariate Big Data Analysis for Intrusion Detection: 5 steps from the haystack to the needle

Authors: José Camacho, José Manuel García-Giménez, Noemí Marta Fuentes-García, Gabriel Maciá-Fernández

Abstract: The research literature on cybersecurity incident detection & response is very rich in automatic detection methodologies, in particular those based on the anomaly detection paradigm. However, very little attention has been devoted to the diagnosis ability of the methods, aimed to provide useful information on the causes of a given detected anomaly. This information is of utmost importance for the security team to reduce the time from detection to response. In this paper, we present Multivariate Big Data Analysis (MBDA), a complete intrusion detection approach based on 5 steps to effectively handle massive amounts of disparate data sources. The approach has been designed to deal with the main characteristics of Big Data, that is, the high volume, velocity and variety. The core of the approach is the Multivariate Statistical Network Monitoring (MSNM) technique proposed in a recent paper. Unlike in state-of-the-art machine learning methodologies applied to the intrusion detection problem, when an anomaly is identified in MBDA the output of the system includes the detail of the logs of raw information associated with this anomaly, so that the security team can use this information to elucidate its root causes. MBDA is based on two open software packages available on GitHub: the MEDA Toolbox and the FCParser. We illustrate our approach with two case studies. The first one demonstrates the application of MBDA to semistructured sources of information, using the data from the VAST 2012 mini challenge 2. This complete case study is supplied in a virtual machine available for download. In the second case study we show the Big Data capabilities of the approach in data collected from a real network with labeled attacks.

Submitted 27 June, 2019; originally announced June 2019.

Journal ref: Computers & Security, Volume 87, November 2019, 101603

arXiv:1102.2146

doi 10.1103/PhysRevLett.108.038701

Coexistence of cooperators and defectors in well mixed populations mediated by limiting resources

Authors: Rubén J. Requejo, Juan Camacho

Abstract: Traditionally, resource limitation in evolutionary game theory is assumed just to impose a constant population size. Here we show that resource limitations may generate dynamical payoffs able to alter an original prisoner's dilemma, and to allow for the stable coexistence between unconditional cooperators and defectors in well-mixed populations. This is a consequence of a self-organizing process that turns the interaction payoff matrix into evolutionary neutral, and represents a resource-based control mechanism preventing the spread of defectors. To our knowledge, this is the first example of coexistence in well-mixed populations with a game structure different from a snowdrift game.

Submitted 25 October, 2012; v1 submitted 10 February, 2011; originally announced February 2011.

Comments: 9 pages, 7 figures

Journal ref: Phys. Rev. Lett. 108, 038701 (2012)

Showing 1–10 of 10 results for author: Camacho, J