-
A Tidy Framework and Infrastructure to Systematically Assemble Spatio-temporal Indexes from Multivariate Data
Authors:
H. Sherry Zhang,
Dianne Cook,
Ursula Laa,
Nicolas Langrené,
Patricia Menéndez
Abstract:
Indexes are useful for summarizing multivariate information into single metrics for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, more attention needs to be directed towards making it possible to understand index behavior in different data conditions, and to determine how their structure affects their values and variation in values. Here we discuss a modular data pipeline recommendation to assemble indexes. It is universally applicable to index computation and allows investigation of index behavior as part of the development procedure. One can compute indexes with different parameter choices, adjust steps in the index definition by adding, removing, and swapping them to experiment with various index designs, calculate uncertainty measures, and assess index robustness. The paper presents three examples to illustrate the pipeline framework usage: comparison of two different indexes designed to monitor the spatio-temporal distribution of drought in Queensland, Australia; the effect of dimension reduction choices in the Global Gender Gap Index (GGGI) on country rankings; and how to calculate bootstrap confidence intervals for the Standardized Precipitation Index (SPI). The methods are supported by a new R package, called tidyindex.
Submitted 13 May, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
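The modular pipeline idea in the abstract above can be sketched in a few lines. This is a minimal illustration in plain Python, not the tidyindex API: the function names (`min_max_rescale`, `weighted_sum`, `run_pipeline`) and the toy data are assumptions made for this example only.

```python
from functools import reduce


def min_max_rescale(rows):
    """Rescale every column of a row-major table to the range [0, 1]."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def weighted_sum(weights):
    """Build a dimension-reduction step: one weighted sum per row."""
    def step(rows):
        return [sum(w * v for w, v in zip(weights, row)) for row in rows]
    return step


def run_pipeline(data, steps):
    """Apply the steps in order; swapping a step changes the index design."""
    return reduce(lambda acc, step: step(acc), steps, data)


# Hypothetical data: three observations of two variables.
data = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]

# Two index designs that differ only in the weighting step.
equal = run_pipeline(data, [min_max_rescale, weighted_sum([0.5, 0.5])])
first_only = run_pipeline(data, [min_max_rescale, weighted_sum([1.0, 0.0])])

print(equal)
print(first_only)
```

Because each step is an ordinary function, comparing two index designs amounts to running the same data through two step lists that differ in a single element.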
-
Frame to frame interpolation for high-dimensional data visualisation using the woylier package
Authors:
Zoljargal Batsaikhan,
Dianne Cook,
Ursula Laa
Abstract:
The woylier package implements tour interpolation paths between frames using Givens rotations. This provides an alternative to the geodesic interpolation between planes currently available in the tourr package. Tours are used to visualise high-dimensional data and models, to detect clustering, anomalies and non-linear relationships. Frame-to-frame interpolation can be useful for projection pursuit guided tours when the index is not rotationally invariant. It also provides a way to specifically reach a given target frame. We demonstrate the method for exploring non-linear relationships between currency cross-rates.
Submitted 14 November, 2023;
originally announced November 2023.
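As a rough illustration of the building block named in the abstract above, the sketch below applies a single Givens rotation to an orthonormal frame and interpolates by stepping the rotation angle. It is an assumption-laden toy in plain Python, not the woylier implementation: the helper names and the choice of a single rotation plane are hypothetical, and a full frame-to-frame path would generally combine several such rotations.

```python
import math


def givens(p, i, j, theta):
    """Return the p x p Givens rotation acting in the (i, j) coordinate plane."""
    g = [[1.0 if r == c else 0.0 for c in range(p)] for r in range(p)]
    c, s = math.cos(theta), math.sin(theta)
    g[i][i], g[j][j] = c, c
    g[i][j], g[j][i] = -s, s
    return g


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def interpolate(frame, i, j, theta, n_steps):
    """Rotate `frame` by equal angle increments from 0 up to `theta`."""
    p = len(frame)
    return [
        matmul(givens(p, i, j, theta * k / n_steps), frame)
        for k in range(n_steps + 1)
    ]


# Start frame: projection of 3-D data onto the first two coordinate axes.
start = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

# Rotate the first basis vector from axis 0 towards axis 2 in four steps.
path = interpolate(start, 0, 2, math.pi / 2, 4)

for frame in path:
    print([[round(v, 3) for v in row] for row in frame])
```

Every intermediate frame stays orthonormal because each step is an exact rotation, so the whole frame is moved, not only the plane it spans.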
-
New and simplified manual controls for projection and slice tours, with application to exploring classification boundaries in high dimensions
Authors:
Ursula Laa,
Alex Aumann,
Dianne Cook,
German Valencia
Abstract:
This paper describes new user controls for examining high-dimensional data using low-dimensional linear projections and slices. A user can interactively change the contribution of a given variable to a low-dimensional projection, which is useful for exploring the sensitivity of structure to particular variables. The user can also interactively shift the center of a slice, for example, to explore how structure changes in local subspaces. The Mathematica package as well as example notebooks are provided, which contain functions enabling the user to experiment with these new manual controls, with one specifically for exploring regions and boundaries produced by classification models. The advantage of Mathematica is its linear algebra capabilities, and interactive cursor location controls. Some limited implementation has also been made available in the R package tourr.
Submitted 11 October, 2022;
originally announced October 2022.
-
cubble: An R Package for Organizing and Wrangling Multivariate Spatio-temporal Data
Authors:
H. Sherry Zhang,
Dianne Cook,
Ursula Laa,
Nicolas Langrené,
Patricia Menéndez
Abstract:
Multivariate spatio-temporal data refers to multiple measurements taken across space and time. For many analyses, spatial and time components can be separately studied: for example, to explore the temporal trend of one variable for a single spatial location, or to model the spatial distribution of one variable at a given time. However, for some studies, it is important to analyse different aspects of the spatio-temporal data simultaneously, for instance, temporal trends of multiple variables across locations. In order to facilitate the study of different portions or combinations of spatio-temporal data, we introduce a new data structure, cubble, with a suite of functions enabling easy slicing and dicing on the different spatio-temporal components. The proposed cubble structure ensures that all the components of the data are easy to access and manipulate while providing flexibility for data analysis. In addition, cubble facilitates visual and numerical explorations of the data while easing data wrangling and modelling. The cubble structure and the functions provided in the cubble R package equip users with the capability to handle hierarchical spatial and temporal structures. The cubble structure and the tools implemented in the package are illustrated with different examples of Australian climate data.
Submitted 10 January, 2024; v1 submitted 30 April, 2022;
originally announced May 2022.
-
A Review of the State-of-the-Art on Tours for Dynamic Visualization of High-dimensional Data
Authors:
Stuart Lee,
Dianne Cook,
Natalia da Silva,
Ursula Laa,
Earo Wang,
Nick Spyrison,
H. Sherry Zhang
Abstract:
This article discusses a high-dimensional visualization technique called the tour, which can be used to view data in more than three dimensions. We review the theory and history behind the technique, as well as modern software developments and applications of the tour that are being found across the sciences and machine learning.
Submitted 19 April, 2021; v1 submitted 16 April, 2021;
originally announced April 2021.
-
Visual Diagnostics for Constrained Optimisation with Application to Guided Tours
Authors:
H. Sherry Zhang,
Dianne Cook,
Ursula Laa,
Nicolas Langrené,
Patricia Menéndez
Abstract:
A guided tour helps to visualise high-dimensional data by showing low-dimensional projections along a projection pursuit optimisation path. Projection pursuit is a generalisation of principal component analysis, in the sense that different indexes are used to define the interestingness of the projected data. While much work has been done in developing new indexes in the literature, less has been done on understanding the optimisation. Index functions can be noisy, might have multiple local maxima as well as an optimal maximum, and are constrained to generate orthonormal projection frames, which complicates the optimisation. In addition, projection pursuit is primarily used for exploratory data analysis, and finding the local maxima is also useful. The guided tour is especially useful for exploration, because it conducts geodesic interpolation connecting steps in the optimisation and shows how the projected data changes as a maximum is approached. This work provides new visual diagnostics for examining a choice of optimisation procedure, based on the provision of a new data object which collects information throughout the optimisation. It has helped to diagnose and fix several problems with the projection pursuit guided tour. This work might be useful more broadly for diagnosing optimisers, and comparing their performance. The diagnostics are implemented in the R package, ferrn.
Submitted 7 April, 2021;
originally announced April 2021.
-
Pandemonium: a clustering tool to partition parameter space -- application to the B anomalies
Authors:
Ursula Laa,
German Valencia
Abstract:
We introduce the interactive tool pandemonium to cluster model predictions that depend on a set of parameters. The model predictions are used to define the coordinates in observable space which go into the clustering. The results of this partitioning are then visualized in both observable and parameter space to study correlations between them. The tool offers multiple choices for coordinates, distance functions and linkage methods within hierarchical clustering. It provides a set of diagnostic statistics and visualization methods to study the clustering results in order to interpret the outcome. The methods are most useful in an interactive environment that enables exploration, and we have implemented them with a graphical user interface in R. We demonstrate the concepts with an application to phenomenological studies in flavor physics in the context of the so-called B anomalies.
Submitted 20 December, 2021; v1 submitted 14 March, 2021;
originally announced March 2021.
-
Casting Multiple Shadows: High-Dimensional Interactive Data Visualisation with Tours and Embeddings
Authors:
Stuart Lee,
Ursula Laa,
Dianne Cook
Abstract:
Non-linear dimensionality reduction (NLDR) methods such as t-distributed stochastic neighbour embedding (t-SNE) are ubiquitous in the natural sciences. However, the appropriate use of these methods is difficult because of their complex parameterisations; analysts must make trade-offs in order to identify structure in the visualisation of an NLDR technique. We present visual diagnostics for the pragmatic usage of NLDR methods by combining them with a technique called the tour. A tour is a sequence of interpolated linear projections of multivariate data onto a lower dimensional space. The sequence is displayed as a dynamic visualisation, allowing a user to see the shadows the high-dimensional data casts in a lower dimensional view. By linking the tour to an NLDR view, we can preserve global structure and, through user interactions like linked brushing, observe where the NLDR view may be misleading. We display several case studies from both simulations and single cell transcriptomics that show our approach is useful for cluster orientation tasks.
Submitted 10 December, 2020;
originally announced December 2020.
-
Burning sage: Reversing the curse of dimensionality in the visualization of high-dimensional data
Authors:
Ursula Laa,
Dianne Cook,
Stuart Lee
Abstract:
In high-dimensional data analysis the curse of dimensionality reasons that points tend to be far away from the center of the distribution and on the edge of high-dimensional space. In contrast, projected data tends to clump at the center. This gives a sense that any structure near the center of the projection is obscured, whether this is true or not. This paper defines a transformation to reverse the curse, which uses radial transformations on the projected data. It is integrated seamlessly into the grand tour algorithm, and we have called it a burning sage tour, to indicate that it reverses the curse. The work is implemented in the tourr package in R. Several case studies are included that show how the sage visualizations enhance exploratory clustering and classification problems.
Submitted 23 September, 2020;
originally announced September 2020.
-
Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions
Authors:
Ursula Laa,
Dianne Cook,
Andreas Buja,
German Valencia
Abstract:
Multivariate data is often visualized using linear projections, produced by techniques such as principal component analysis, linear discriminant analysis, and projection pursuit. A problem with projections is that they obscure low and high density regions near the center of the distribution. Sections, or slices, can help to reveal them. This paper develops a section pursuit method, building on the extensive work in projection pursuit, to search for interesting slices of the data. Linear projections are used to define sections of the parameter space, and to calculate interestingness by comparing the distribution of observations inside and outside a section. By optimizing this index, it is possible to reveal features such as holes (low density) or grains (high density). The optimization is incorporated into a guided tour so that the search for structure can be dynamic. The approach can be useful for problems when data distributions depart from uniform or normal, as in visually exploring nonlinear manifolds, and functions in multivariate space. Two applications of section pursuit are shown: exploring decision boundaries from classification models, and exploring subspaces induced by complex inequality conditions from a multiple-parameter model. The new methods are available in R, in the tourr package.
Submitted 10 March, 2022; v1 submitted 28 April, 2020;
originally announced April 2020.
-
A slice tour for finding hollowness in high-dimensional data
Authors:
Ursula Laa,
Dianne Cook,
German Valencia
Abstract:
Taking projections of high-dimensional data is a common analytical and visualisation technique in statistics for working with high-dimensional problems. Sectioning, or slicing, through high dimensions is less common, but can be useful for visualising data with concavities, or non-linear structure. It is associated with conditional distributions in statistics, and also linked brushing between plots in interactive data visualisation. This short technical note describes a simple approach for slicing in the orthogonal space of projections obtained when running a tour, thus presenting the viewer with an interpolated sequence of sliced projections. The method has been implemented in R as an extension to the tourr package, and can be used to explore for concave and non-linear structures in multivariate distributions.
Submitted 23 October, 2019;
originally announced October 2019.
-
Using tours to visually investigate properties of new projection pursuit indexes with application to problems in physics
Authors:
Ursula Laa,
Dianne Cook
Abstract:
Projection pursuit is used to find interesting low-dimensional projections of high-dimensional data by optimizing an index over all possible projections. Most indexes have been developed to detect departure from known distributions, such as normality, or to find separations between known groups. Here, we are interested in finding projections revealing potentially complex bivariate patterns, using new indexes constructed from scagnostics and a maximum information coefficient, with a purpose to detect unusual relationships between model parameters describing physics phenomena. The performance of these indexes is examined with respect to ideal behaviour, using simulated data, and then applied to problems from gravitational wave astronomy. The implementation builds upon the projection pursuit tools available in the R package, tourr, with indexes constructed from code in the R packages, scagnostics, minerva and mbgraphic.
Submitted 13 January, 2020; v1 submitted 31 January, 2019;
originally announced February 2019.