-
Neuronal functional connectivity graph estimation with the R package neurofuncon
Authors:
Lauren Miako Beede,
Giuseppe Vinci
Abstract:
Researchers continue exploring neurons' intricate patterns of activity in the cerebral visual cortex in response to visual stimuli. The way neurons communicate and optimize their interactions with each other under different experimental conditions remains a topic of active investigation. Probabilistic Graphical Models are invaluable tools in neuroscience research, as they let us identify the functional connections, or conditional statistical dependencies, between neurons. Graphical models represent these connections as a graph, where nodes represent neurons and edges indicate the presence of functional connections between them. We developed the R package neurofuncon for the computation and visualization of functional connectivity graphs from large-scale data based on the Graphical lasso. We illustrate the use of this package with publicly available two-photon calcium microscopy imaging data from approximately 10,000 neurons in a 1 mm cubic section of a mouse visual cortex.
Submitted 8 February, 2024;
originally announced February 2024.
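The abstract above describes functional connections as conditional dependencies encoded in a graph. Under a Gaussian graphical model, a zero entry in the precision (inverse covariance) matrix corresponds to conditional independence between two nodes, so a graph can be read off a sparse precision estimate such as the one the Graphical lasso produces. The sketch below does not run the Graphical lasso and is not the neurofuncon API; the function names and the 3-neuron matrix `theta` are hypothetical, chosen only to illustrate the reading step.

```python
import math


def partial_correlations(theta):
    """Convert a precision matrix into a matrix of partial correlations."""
    p = len(theta)
    rho = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                rho[i][j] = 1.0
            else:
                # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho


def edges_from_precision(theta, tol=1e-8):
    """Return the pairs (i, j), i < j, whose precision entry is non-zero."""
    p = len(theta)
    return [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(theta[i][j]) > tol
    ]


# Hypothetical sparse precision matrix for three neurons (chain 0 - 1 - 2).
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

edges = edges_from_precision(theta)
rho = partial_correlations(theta)
print(edges)
print(rho[0][1])
```

In this toy matrix, neurons 0 and 2 share no edge because their precision entry is zero, even though they may still be marginally correlated through neuron 1.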
-
Covariance matrix completion via auxiliary information
Authors:
Joseph Steneman,
Giuseppe Vinci
Abstract:
Covariance matrix estimation is an important task in the analysis of multivariate data in disparate scientific fields, including neuroscience, genomics, and astronomy. However, modern scientific data are often incomplete due to factors beyond the control of researchers, and data missingness may prohibit the use of traditional covariance estimation methods. Some existing methods address this problem by completing the data matrix, or by filling the missing entries of an incomplete sample covariance matrix by assuming a low-rank structure. We propose a novel approach that exploits auxiliary variables to complete covariance matrix estimates. An example of an auxiliary variable is the distance between neurons, which is usually inversely related to the strength of neuronal correlation. Our method extracts auxiliary information via regression, and involves a single tuning parameter that can be selected empirically. We compare our method with other matrix completion approaches via simulations, and apply it to the analysis of large-scale neuroscience data.
Submitted 8 February, 2024;
originally announced February 2024.
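The idea of extracting auxiliary information via regression can be illustrated with a deliberately simplified sketch: fit a straight line relating observed correlations to inter-neuron distance, then use the fitted line to predict the unobserved entries. This is an assumption-laden toy, not the authors' estimator; it omits the tuning parameter the abstract mentions and does nothing to keep the completed matrix positive definite. The function names and the 4-neuron data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def complete_correlations(corr, dist):
    """Fill entries marked None using a linear regression on distance."""
    p = len(corr)
    observed = [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if corr[i][j] is not None
    ]
    intercept, slope = fit_line(
        [dist[i][j] for i, j in observed],
        [corr[i][j] for i, j in observed],
    )
    completed = [row[:] for row in corr]
    for i in range(p):
        for j in range(i + 1, p):
            if completed[i][j] is None:
                value = intercept + slope * dist[i][j]
                # Keep the matrix symmetric.
                completed[i][j] = value
                completed[j][i] = value
    return completed


# Hypothetical data: four neurons on a line, pair (0, 3) never observed jointly.
corr = [
    [1.0, 0.6, 0.4, None],
    [0.6, 1.0, 0.6, 0.4],
    [0.4, 0.6, 1.0, 0.6],
    [None, 0.4, 0.6, 1.0],
]
dist = [[abs(i - j) for j in range(4)] for i in range(4)]

completed = complete_correlations(corr, dist)
print(completed[0][3])
```

Because the toy correlations fall exactly on a line in distance, the missing entry is extrapolated from that line; real data would call for a more flexible regression and a check on the resulting matrix.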
-
Linked factor analysis
Authors:
Giuseppe Vinci
Abstract:
Factor models are widely used in the analysis of high-dimensional data in several fields of research. Estimating a factor model, in particular its covariance matrix, from partially observed data vectors is very challenging. In this work, we show that when the data are structurally incomplete, the factor model likelihood function can be decomposed into the product of the likelihood functions of multiple partial factor models relative to different subsets of data. If these multiple partial factor models are linked together by common parameters, then we can obtain complete maximum likelihood estimates of the factor model parameters and thereby the full covariance matrix. We call this framework Linked Factor Analysis (LINFA). LINFA can be used for covariance matrix completion, dimension reduction, data completion, and graphical dependence structure recovery. We propose an efficient Expectation-Maximization algorithm for maximum likelihood estimation, accelerated by a novel group vertex tessellation (GVT) algorithm which identifies a minimal partition of the vertex set to implement an efficient optimization in the maximization steps. We illustrate our approach in an extensive simulation study and in the analysis of calcium imaging data collected from mouse visual cortex.
Submitted 5 January, 2024;
originally announced January 2024.
-
Forensic Science and How Statistics Can Help It: Evidence, Hypothesis Testing, and Graphical Models
Authors:
Xiangyu Xu,
Giuseppe Vinci
Abstract:
The persistent issue of wrongful convictions in the United States emphasizes the need for scrutiny and improvement of the criminal justice system. While statistical methods for the evaluation of forensic evidence, including glass, fingerprints, and DNA, have significantly contributed to solving intricate crimes, there is a notable lack of national-level standards to ensure the appropriate application of statistics in forensic investigations. We discuss the obstacles in the application of statistics in court, and emphasize the importance of making statistical interpretation accessible to non-statisticians, especially those who make decisions about potentially innocent individuals. We investigate the use and misuse of statistical methods in crime investigations, in particular the likelihood ratio approach. We further describe the use of graphical models, where hypotheses and evidence can be represented as nodes connected by arrows signifying association or causality. We emphasize the advantages of special graph structures, such as object-oriented Bayesian networks and chain event graphs, which allow for the concurrent examination of evidence of various nature.
Submitted 22 March, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
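The likelihood ratio approach mentioned in the abstract above compares the probability of the evidence under two competing hypotheses, and Bayes' rule turns prior odds into posterior odds by multiplying by that ratio. The sketch below uses hypothetical numbers and function names to show why a large likelihood ratio is not, by itself, a probability of guilt; it is an illustration of the general formula, not a procedure taken from the paper.

```python
def likelihood_ratio(p_evidence_given_hp, p_evidence_given_hd):
    """LR = P(E | prosecution hypothesis) / P(E | defense hypothesis)."""
    if p_evidence_given_hd <= 0:
        raise ValueError("P(E | Hd) must be positive")
    return p_evidence_given_hp / p_evidence_given_hd


def posterior_probability(prior_probability, lr):
    """Update a prior probability of Hp using posterior odds = prior odds * LR."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical case: the evidence is certain under Hp and has a 1-in-1000
# chance of occurring under Hd, while the prior probability of Hp is 1 in 1000.
lr = likelihood_ratio(1.0, 0.001)
posterior = posterior_probability(0.001, lr)
print(lr)
print(posterior)
```

With these assumed inputs the likelihood ratio is about 1000, yet the posterior probability of the prosecution hypothesis is only about one half, because the prior odds were roughly 1 to 999. Reading the ratio directly as a 99.9% probability of guilt is the kind of misinterpretation often called the prosecutor's fallacy.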
-
Spiral-Elliptical automated galaxy morphology classification from telescope images
Authors:
Matthew J. Baumstark,
Giuseppe Vinci
Abstract:
The classification of galaxy morphologies is an important step in the investigation of theories of hierarchical structure formation. While human expert visual classification remains quite effective and accurate, it cannot keep up with the massive influx of data from emerging sky surveys. A variety of approaches have been proposed to classify large numbers of galaxies; these approaches include crowdsourced visual classification, and automated and computational methods, such as machine learning methods based on designed morphology statistics and deep learning. In this work, we develop two novel galaxy morphology statistics, descent average and descent variance, which can be efficiently extracted from telescope galaxy images. We further propose simplified versions of the existing image statistics concentration, asymmetry, and clumpiness, which have been widely used in the literature of galaxy morphologies. We utilize the galaxy image data from the Sloan Digital Sky Survey to demonstrate the effective performance of our proposed image statistics at accurately detecting spiral and elliptical galaxies when used as features of a random forest classifier.
Submitted 10 October, 2023;
originally announced October 2023.
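Of the image statistics named in the abstract above, asymmetry is the simplest to illustrate. One common form in the galaxy morphology literature compares an image with its 180-degree rotation and normalizes by the total flux. The sketch below implements that generic form on a tiny hypothetical pixel grid; it ignores background subtraction and centering, and it is not the simplified version proposed in the paper, whose definition the abstract does not give.

```python
def rotate_180(image):
    """Rotate a 2D list of pixel values by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def asymmetry(image):
    """Sum of |I - I_rotated| divided by sum of |I| (no background term)."""
    rotated = rotate_180(image)
    difference = sum(
        abs(a - b)
        for row, rotated_row in zip(image, rotated)
        for a, b in zip(row, rotated_row)
    )
    total = sum(abs(a) for row in image for a in row)
    return difference / total


# Hypothetical 3x3 images: one symmetric under rotation, one not.
symmetric_image = [
    [0, 1, 0],
    [1, 4, 1],
    [0, 1, 0],
]
lopsided_image = [
    [1, 0, 0],
    [0, 2, 0],
    [0, 0, 0],
]

a_symmetric = asymmetry(symmetric_image)
a_lopsided = asymmetry(lopsided_image)
print(a_symmetric, a_lopsided)
```

A value near zero indicates rotational symmetry, which is more typical of smooth elliptical galaxies, while larger values point to irregular or spiral structure. Statistics of this kind can then serve as input features to a classifier such as the random forest the abstract mentions.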
-
A Sequential Scheme for Large Scale Bayesian Multiple Testing
Authors:
Bin Liu,
Giuseppe Vinci,
Adam C. Snyder,
Robert E. Kass
Abstract:
The problem of large scale multiple testing arises in many contexts, including testing for pairwise interaction among large numbers of neurons. With advances in technologies, it has become common to record from hundreds of neurons simultaneously, and this number is growing quickly, so that the number of pairwise tests can be very large. It is important to control the rate at which false positives occur. In addition, there is sometimes information that affects the probability of a positive result for any given pair. In the case of neurons, they are more likely to have correlated activity when they are close together, and when they respond similarly to various stimuli. Recently a method was developed to control false positives when covariate information, such as distances between pairs of neurons, is available. This method, however, relies on computationally-intensive Markov Chain Monte Carlo (MCMC). Here we develop an alternative, based on Sequential Monte Carlo, which scales well with the size of the dataset. This scheme considers data items sequentially, with relevant probabilities being updated at each step. Simulation experiments demonstrate that the proposed algorithm delivers results as accurately as the previous MCMC method with only a single pass through the data. We illustrate the method by using it to analyze neural recordings from extrastriate cortex in a macaque monkey.
Submitted 31 October, 2017; v1 submitted 17 February, 2017;
originally announced February 2017.
-
Estimating the distribution of Galaxy Morphologies on a continuous space
Authors:
Giuseppe Vinci,
Peter Freeman,
Jeffrey Newman,
Larry Wasserman,
Christopher Genovese
Abstract:
The incredible variety of galaxy shapes cannot be summarized by human-defined discrete classes of shapes without causing a possibly large loss of information. Dictionary learning and sparse coding allow us to reduce the high-dimensional space of shapes into a manageable low-dimensional continuous vector space. Statistical inference can be done in the reduced space via probability distribution estimation and manifold estimation.
Submitted 29 June, 2014;
originally announced June 2014.