-
Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI
Authors:
Hugo Caselles-Dupré,
Charles Mellerio,
Paul Hérent,
Alizée Lopez-Persem,
Benoit Béranger,
Mathieu Soularue,
Pierre Fautrel,
Gauthier Vernier,
Matthieu Cord
Abstract:
The reconstruction of images observed by subjects from fMRI data collected during visual stimuli has made strong progress in the past decade, thanks to the availability of extensive fMRI datasets and advancements in generative models for image generation. However, the application of visual reconstruction has remained limited. Reconstructing visual imagination presents a greater challenge, with potentially revolutionary applications ranging from aiding individuals with disabilities to verifying witness accounts in court. The primary hurdles in this field are the absence of data collection protocols for visual imagery and the lack of datasets on the subject. Traditionally, fMRI-to-image relies on data collected from subjects exposed to visual stimuli, which poses issues for generating visual imagery because brain activity differs between visual stimulation and visual imagery. For the first time, we have compiled a substantial dataset (around 6h of scans) on visual imagery along with a proposed data collection protocol. We then train a modified version of an fMRI-to-image model and demonstrate the feasibility of reconstructing images from two modes of imagination: from memory and from pure imagination. The resulting pipeline, which we call Mind-to-Image, marks a step towards creating a technology that allows direct reconstruction of visual imagery.
Submitted 28 May, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Likelihood-based inference for modelling packet transit from thinned flow summaries
Authors:
Prosha A. Rahman,
Boris Beranger,
Matthew Roughan,
Scott A. Sisson
Abstract:
The substantial growth of network traffic speed and volume presents practical challenges to network data analysis. Packet thinning and flow aggregation protocols such as NetFlow reduce the size of datasets by providing structured data summaries, but conversely this impedes statistical inference. Methods which aim to model patterns of traffic propagation typically do not incorporate the packet thinning and summarisation process into the analysis, and are often simplistic, e.g. method-of-moments. As a result, they can be of limited practical use.
We introduce a likelihood-based analysis which fully incorporates packet thinning and NetFlow summarisation into the analysis. As a result, inferences can be made for models on the level of individual packets while only observing thinned flow summary information. We establish consistency of the resulting maximum likelihood estimator, derive bounds on the volume of traffic which should be observed to achieve required levels of estimator accuracy, and identify an ideal family of models. The robust performance of the estimator is examined through simulated analyses and an application on a publicly available trace dataset containing over 36m packets over a 1 minute period.
Submitted 31 August, 2020;
originally announced August 2020.
-
Logistic regression models for aggregated data
Authors:
Tom Whitaker,
Boris Beranger,
Scott A. Sisson
Abstract:
Logistic regression models are a popular and effective method to predict the probability of categorical response data. However, inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from symbolic data analysis to summarise the collection of predictor variables into histogram form, and perform inference on this summary dataset. We develop ideas based on composite likelihoods to derive an efficient one-versus-rest approximate composite likelihood model for histogram-based random variables, constructed from low-dimensional marginal histograms obtained from the full histogram. We demonstrate that this procedure can achieve classification rates comparable to those of the standard full data multinomial analysis and of state-of-the-art subsampling algorithms for logistic regression, but at a substantially lower computational cost. Performance is explored through simulated examples, and analyses of large supersymmetry and satellite crop classification datasets.
Submitted 24 August, 2020; v1 submitted 8 December, 2019;
originally announced December 2019.
-
Composite likelihood methods for histogram-valued random variables
Authors:
Thomas Whitaker,
Boris Beranger,
Scott A. Sisson
Abstract:
Symbolic data analysis has been proposed as a technique for summarising large and complex datasets into a much smaller and tractable number of distributions -- such as random rectangles or histograms -- each describing a portion of the larger dataset. Recent work has developed likelihood-based methods that permit fitting models for the underlying data while only observing the distributional summaries. However, while powerful, when working with random histograms this approach rapidly becomes computationally intractable as the dimension of the underlying data increases. We introduce a composite-likelihood variation of this likelihood-based approach for the analysis of random histograms in $K$ dimensions, through the construction of lower-dimensional marginal histograms. The performance of this approach is examined through simulated and real data analysis of max-stable models for spatial extremes using millions of observed datapoints in more than $K=100$ dimensions. Large computational savings are available compared to existing model fitting approaches.
Submitted 20 March, 2020; v1 submitted 30 August, 2019;
originally announced August 2019.
-
High-dimensional inference using the extremal skew-$t$ process
Authors:
B. Beranger,
A. G. Stephenson,
S. A. Sisson
Abstract:
Max-stable processes are a popular tool for the study of environmental extremes, and the extremal skew-$t$ process is a general model that allows for a flexible extremal dependence structure. For inference on max-stable processes with high-dimensional data, exact likelihood-based estimation is computationally intractable. Composite likelihoods, using lower dimensional components, and Stephenson-Tawn likelihoods, using occurrence times of maxima, are both attractive methods to circumvent this issue for moderate dimensions. In this article we establish the theoretical formulae for simulations of and inference for the extremal skew-$t$ process. We also incorporate the Stephenson-Tawn concept into the composite likelihood framework, giving greater statistical and computational efficiency for higher-order composite likelihoods. We compare 2-way (pairwise), 3-way (triplewise), 4-way, 5-way and 10-way composite likelihoods for models of up to 100 dimensions. Furthermore, we propose cdf approximations for the Stephenson-Tawn likelihood function, leading to large computational gains, and enabling accurate fitting of models in large dimensions in only a few minutes. We illustrate our methodology with an application to a 90-dimensional temperature dataset from Melbourne, Australia.
Submitted 18 April, 2020; v1 submitted 23 July, 2019;
originally announced July 2019.
-
Estimation and uncertainty quantification for extreme quantile regions
Authors:
Boris Beranger,
Simone A. Padoan,
Scott A. Sisson
Abstract:
Estimation of extreme quantile regions, spaces in which future extreme events can occur with a given low probability, even beyond the range of the observed data, is an important task in the analysis of extremes. Existing methods to estimate such regions are available, but do not provide any measures of estimation uncertainty. We develop univariate and bivariate schemes for estimating extreme quantile regions under the Bayesian paradigm that outperform existing approaches and provide natural measures of quantile region estimate uncertainty. We examine the method's performance in controlled simulation studies. We illustrate the applicability of the proposed method by analysing high bivariate quantiles for pairs of pollutants, conditionally on different temperature gradations, recorded in Milan, Italy.
Submitted 27 October, 2020; v1 submitted 17 April, 2019;
originally announced April 2019.
-
Extremal properties of the multivariate extended skew-normal distribution
Authors:
Boris Beranger,
Simone A. Padoan,
Yangfan Xu,
Scott A. Sisson
Abstract:
The skew-normal and related families are flexible and asymmetric parametric models suitable for modelling a diverse range of systems. We show that the multivariate maximum of a high-dimensional extended skew-normal random sample has asymptotically independent components and derive the speed of convergence of the joint tail. To describe the possible dependence among the components of the multivariate maximum, we show that under appropriate conditions an approximate multivariate extreme-value distribution that leads to a rich dependence structure can be derived.
Submitted 27 September, 2018;
originally announced October 2018.
-
New models for symbolic data analysis
Authors:
Boris Beranger,
Huan Lin,
Scott A. Sisson
Abstract:
Symbolic data analysis (SDA) is an emerging area of statistics concerned with understanding and modelling data that takes distributional form (i.e. symbols), such as random lists, intervals and histograms. It was developed under the premise that the statistical unit of interest is the symbol, and that inference is required at this level. Here we consider a different perspective, which opens a new research direction in the field of SDA. We assume that, as with a standard statistical analysis, inference is required at the level of individual-level data. However, the individual-level data are aggregated into symbols - group-based distributional-valued summaries - prior to the analysis. In this way, large and complex datasets can be reduced to a smaller number of distributional summaries that may be analysed more efficiently than the original dataset. As such, we develop SDA techniques as a new approach for the analysis of big data. In particular we introduce a new general method for constructing likelihood functions for symbolic data based on a desired probability model for the underlying measurement-level data, while only observing the distributional summaries. This approach opens the door for new classes of symbol design and construction, in addition to developing SDA as a viable tool to enable and improve upon classical data analyses, particularly for very large and complex datasets. We illustrate this new direction for SDA research through several real and simulated data analyses.
Submitted 7 April, 2020; v1 submitted 10 September, 2018;
originally announced September 2018.
-
Extremal properties of the univariate extended skew-normal distribution
Authors:
Boris Beranger,
Simone A. Padoan,
Yangfan Xu,
Scott A. Sisson
Abstract:
We consider the extremal properties of the highly flexible univariate extended skew-normal distribution. We derive the well-known Mills' inequalities and Mills' ratio for the extended skew-normal distribution and establish the asymptotic extreme-value distribution for the maximum of samples drawn from this distribution.
Submitted 27 September, 2018; v1 submitted 8 May, 2018;
originally announced May 2018.
-
Constructing Likelihood Functions for Interval-valued Random Variables
Authors:
Xin Zhang,
Boris Beranger,
Scott A. Sisson
Abstract:
There is a growing need for the ability to analyse interval-valued data. However, existing descriptive frameworks to achieve this ignore the process by which interval-valued data are typically constructed; namely by the aggregation of real-valued data generated from some underlying process. In this article we develop the foundations of likelihood-based statistical inference for random intervals that directly incorporates the underlying generative procedure into the analysis. That is, it permits the direct fitting of models for the underlying real-valued data given only the random interval-valued summaries. This generative approach overcomes several problems associated with existing methods, including the rarely satisfied assumption of within-interval uniformity. The new methods are illustrated by simulated and real data analyses.
Submitted 6 March, 2019; v1 submitted 30 July, 2016;
originally announced August 2016.
-
Exploratory data analysis for moderate extreme values using non-parametric kernel methods
Authors:
Boris Beranger,
Tarn Duong,
Sarah E. Perkins-Kirkpatrick,
Scott A. Sisson
Abstract:
In many settings it is critical to accurately model the extreme tail behaviour of a random process. Non-parametric density estimation methods are commonly implemented as exploratory data analysis techniques for this purpose as they possess excellent visualisation properties, and can naturally avoid the model specification biases implied by using parametric estimators. In particular, kernel-based estimators place minimal assumptions on the data, and provide improved visualisation over scatterplots and histograms. However, kernel density estimators are known to perform poorly when estimating extreme tail behaviour, which is important when interest is in process behaviour above some large threshold, and they can over-emphasise bumps in the density for heavy-tailed data. In this article we develop a transformation kernel density estimator, and demonstrate that its mean integrated squared error (MISE) efficiency is equivalent to that of standard, non-tail focused kernel density estimators. Estimator performance is illustrated in numerical studies, and in an expanded analysis of the ability of well-known global climate models to reproduce observed temperature extremes in Sydney, Australia.
Submitted 6 December, 2017; v1 submitted 28 February, 2016;
originally announced February 2016.
-
Extreme Dependence Models
Authors:
Boris Beranger,
Simone A. Padoan
Abstract:
Extreme values of real phenomena are events that occur with low frequency, but can have a large impact on real life. These are, in many practical problems, high-dimensional by nature (e.g. Tawn, 1990; Coles and Tawn, 1991). To study these events is of fundamental importance. For this purpose, probabilistic models and statistical methods are in high demand. There are several approaches to modelling multivariate extremes as described in Falk et al. (2011), linked to some extent. We describe an approach for deriving multivariate extreme value models and we illustrate the main features of some flexible extremal dependence models. We compare them by showing their utility with a real data application, in particular analyzing the extremal dependence among several pollutants recorded in the city of Leeds, UK.
Submitted 22 August, 2015;
originally announced August 2015.
-
Models for extremal dependence derived from skew-symmetric families
Authors:
Boris Beranger,
Simone A. Padoan,
Scott A. Sisson
Abstract:
Skew-symmetric families of distributions such as the skew-normal and skew-$t$ represent supersets of the normal and $t$ distributions, and they exhibit richer classes of extremal behaviour. By defining a non-stationary skew-normal process, which allows the easy handling of positive definite, non-stationary covariance functions, we derive a new family of max-stable processes - the extremal-skew-$t$ process. This process is a superset of non-stationary processes that include the stationary extremal-$t$ processes. We provide the spectral representation and the resulting angular densities of the extremal-skew-$t$ process, and illustrate its practical implementation (Includes Supporting Information).
Submitted 17 April, 2016; v1 submitted 1 July, 2015;
originally announced July 2015.