Search | arXiv e-print repository

Accelerating hyperbolic t-SNE

Authors: Martin Skrodzki, Hunter van Geffen, Nicolas F. Chaves-de-Plaza, Thomas Höllt, Elmar Eisemann, Klaus Hildebrandt

Abstract: The need to understand the structure of hierarchical or high-dimensional data is present in a variety of fields. Hyperbolic spaces have proven to be an important tool for embedding computations and analysis tasks as their non-linear nature lends itself well to tree or graph data. Subsequently, they have also been used in the visualization of high-dimensional data, where they exhibit increased embedding performance. However, none of the existing dimensionality reduction methods for embedding into hyperbolic spaces scale well with the size of the input data. That is because the embeddings are computed via iterative optimization schemes and the computation cost of every iteration is quadratic in the size of the input. Furthermore, due to the non-linear nature of hyperbolic spaces, Euclidean acceleration structures cannot directly be translated to the hyperbolic setting. This paper introduces the first acceleration structure for hyperbolic embeddings, building upon a polar quadtree. We compare our approach with existing methods and demonstrate that it computes embeddings of similar quality in significantly less time. Implementation and scripts for the experiments can be found at https://graphics.tudelft.nl/accelerating-hyperbolic-tsne.

Submitted 23 January, 2024; originally announced January 2024.
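The two obstacles named in the hyperbolic t-SNE abstract above, non-Euclidean distances and the need for a tree that respects them, can be illustrated with a short Python sketch. It is not the authors' implementation: it computes distances in the Poincaré disk model and builds a simple polar quadtree whose cells are annular sectors in (radius, angle). The class name `PolarCell`, the bucket capacity, and the rule of splitting each cell at the Euclidean midpoints of its radial and angular ranges are assumptions made for this example. The paper's actual subdivision rule and its use of the tree to approximate gradients are not reproduced.

```python
import math
from dataclasses import dataclass, field

TWO_PI = 2.0 * math.pi


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit (Poincare) disk."""
    nu = u[0] ** 2 + u[1] ** 2
    nv = v[0] ** 2 + v[1] ** 2
    diff = (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


def to_polar(p):
    """Return (radius, angle in [0, 2*pi)) of a 2-D point."""
    return math.hypot(p[0], p[1]), math.atan2(p[1], p[0]) % TWO_PI


@dataclass
class PolarCell:
    """Annular sector r_min <= r < r_max, t_min <= theta < t_max (hypothetical layout)."""
    r_min: float
    r_max: float
    t_min: float
    t_max: float
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def contains(self, p):
        r, t = to_polar(p)
        return self.r_min <= r < self.r_max and self.t_min <= t < self.t_max

    def split(self):
        # Assumption: halve the radial and the angular range (Euclidean midpoints).
        rm = 0.5 * (self.r_min + self.r_max)
        tm = 0.5 * (self.t_min + self.t_max)
        self.children = [
            PolarCell(r0, r1, t0, t1)
            for r0, r1 in ((self.r_min, rm), (rm, self.r_max))
            for t0, t1 in ((self.t_min, tm), (tm, self.t_max))
        ]

    def insert(self, p, capacity=1, depth=0, max_depth=20):
        # Descend into the child sector that contains p, if any.
        for child in self.children:
            if child.contains(p):
                child.insert(p, capacity, depth + 1, max_depth)
                return
        # Leaf (or floating-point boundary case): store the point here.
        self.points.append(p)
        if not self.children and len(self.points) > capacity and depth < max_depth:
            self.split()
            pending, self.points = self.points, []
            for q in pending:
                self.insert(q, capacity, depth, max_depth)

    def count(self):
        """Total number of points stored in this cell and its descendants."""
        return len(self.points) + sum(c.count() for c in self.children)
```

In a Barnes-Hut style scheme, a cell that is far enough from a query point would be summarized by a single representative instead of visiting each of its points, which is what removes the quadratic per-iteration cost; that summarization step is omitted here. Note also that Euclidean-looking distances are misleading near the disk boundary: `poincare_distance((0.9, 0.0), (-0.9, 0.0))` is far larger than the Euclidean 1.8.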

arXiv:2308.15513 [pdf, other]

Tuning the perplexity for and computing sampling-based t-SNE embeddings

Authors: Martin Skrodzki, Nicolas Chaves-de-Plaza, Klaus Hildebrandt, Thomas Höllt, Elmar Eisemann

Abstract: Widely used pipelines for the analysis of high-dimensional data utilize two-dimensional visualizations. These are created, e.g., via t-distributed stochastic neighbor embedding (t-SNE). When it comes to large data sets, applying these visualization techniques creates suboptimal embeddings, as the hyperparameters are not suitable for large data. Cranking up these parameters usually does not work as the computations become too expensive for practical workflows. In this paper, we argue that a sampling-based embedding approach can circumvent these problems. We show that hyperparameters must be chosen carefully, depending on the sampling rate and the intended final embedding. Further, we show how this approach speeds up the computation and increases the quality of the embeddings.

Submitted 29 August, 2023; originally announced August 2023.
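Because perplexity is the hyperparameter at issue in the sampling-based t-SNE abstract above, it may help to recall how standard t-SNE turns a perplexity value into per-point Gaussian bandwidths: a binary search on the precision beta_i = 1/(2 sigma_i^2) until the entropy of the conditional distribution p_{j|i} equals log(perplexity). The NumPy sketch below shows only that standard calibration, applied to a uniformly random subsample. The helper names `subsample`, `calibrate_point`, and `conditional_probabilities` are hypothetical, and the paper's rule for adapting the perplexity to the sampling rate is not reproduced.

```python
import numpy as np


def subsample(X, rate, rng):
    """Keep a uniformly random fraction `rate` of the rows of X (hypothetical helper)."""
    n_keep = max(2, int(round(rate * len(X))))
    return X[rng.choice(len(X), size=n_keep, replace=False)]


def calibrate_point(sq_dists, perplexity, tol=1e-5, max_iter=200):
    """Binary search for beta so that exp(entropy of p_{j|i}) matches the perplexity.

    sq_dists: squared distances from point i to every other point (self excluded).
    """
    target = np.log(perplexity)          # entropy target in nats
    shifted = sq_dists - sq_dists.min()  # stability shift; cancels after normalisation
    beta, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        p = np.exp(-beta * shifted)
        p /= p.sum()
        entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
        if abs(entropy - target) < tol:
            break
        if entropy > target:   # distribution too flat: increase beta (narrower Gaussian)
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else 0.5 * (beta + hi)
        else:                  # distribution too peaked: decrease beta
            hi = beta
            beta = 0.5 * (beta + lo)
    return beta, p


def conditional_probabilities(X, perplexity):
    """Row-stochastic matrix of p_{j|i} for all points of X (zero diagonal)."""
    n = len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        _, P[i, others] = calibrate_point(sq[i, others], perplexity)
    return P
```

Since perplexity acts roughly as an effective number of neighbours, a value tuned for the full data set covers a much larger fraction of a small sample; this is one plausible intuition for why the abstract ties the choice of hyperparameters to the sampling rate.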

arXiv:2308.01751 [pdf, other]

doi 10.1109/TVCG.2023.3326582

ManiVault: A Flexible and Extensible Visual Analytics Framework for High-Dimensional Data

Authors: Alexander Vieth, Thomas Kroes, Julian Thijssen, Baldur van Lew, Jeroen Eggermont, Soumyadeep Basu, Elmar Eisemann, Anna Vilanova, Thomas Höllt, Boudewijn Lelieveldt

Abstract: Exploration and analysis of high-dimensional data are important tasks in many fields that produce large and complex data, like the financial sector, systems biology, or cultural heritage. Tailor-made visual analytics software is developed for each specific application, limiting its applicability in other fields. However, as diverse as these fields are, their characteristics and requirements for data analysis are conceptually similar. Many applications share abstract tasks and data types and are often constructed with similar building blocks. Developing such applications, even when based mostly on existing building blocks, requires significant engineering efforts. We developed ManiVault, a flexible and extensible open-source visual analytics framework for analyzing high-dimensional data. The primary objective of ManiVault is to facilitate rapid prototyping of visual analytics workflows for visualization software developers and practitioners alike. ManiVault is built using a plugin-based architecture that offers easy extensibility. While our architecture deliberately keeps plugins self-contained, to guarantee maximum flexibility and re-usability, we have designed and implemented a messaging API for tight integration and linking of modules to support common visual analytics design patterns. We provide several visualization and analytics plugins, and ManiVault's API makes the integration of new plugins easy for developers.
ManiVault facilitates the distribution of visualization and analysis pipelines and results for practitioners through saving and reproducing complete application states. As such, ManiVault can be used as a communication tool among researchers to discuss workflows and results. A copy of this paper and all supplemental material is available at https://osf.io/9k6jw and source code at https://github.com/ManiVaultStudio.

Submitted 7 November, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: 11 pages paper (incl. 2 pages references and acknowledgements), 2 pages supplement

Journal ref: IEEE Transactions on Visualization and Computer Graphics (Proceedings of IEEE VIS 2023), 30(2), 2024
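The combination described in the ManiVault abstract above, self-contained plugins that are nevertheless linked through a messaging API, can be illustrated with a generic publish/subscribe sketch in Python. This is not ManiVault's API: the classes `MessageBus`, `Plugin`, and `SelectionView` and the topic name `"selection_changed"` are hypothetical. They only show how two views can share a selection without holding references to each other.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Optional

Handler = Callable[[Any, Optional[str]], None]


class MessageBus:
    """Routes messages by topic; plugins never hold references to each other."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any, sender: Optional[str] = None) -> None:
        # Copy the list so handlers may subscribe/unsubscribe while being notified.
        for handler in list(self._subscribers[topic]):
            handler(payload, sender)


class Plugin:
    """Self-contained module that communicates only through the bus."""

    def __init__(self, name: str, bus: MessageBus) -> None:
        self.name = name
        self.bus = bus


class SelectionView(Plugin):
    """A view (e.g. a scatterplot) whose selection is linked to other views."""

    def __init__(self, name: str, bus: MessageBus) -> None:
        super().__init__(name, bus)
        self.selection: frozenset = frozenset()
        bus.subscribe("selection_changed", self._on_selection_changed)

    def select(self, point_ids) -> None:
        self.selection = frozenset(point_ids)
        self.bus.publish("selection_changed", self.selection, sender=self.name)

    def _on_selection_changed(self, selection, sender) -> None:
        if sender != self.name:  # ignore our own broadcast
            self.selection = selection
```

Keeping the bus as the only shared object is what lets a new view be added without modifying existing ones; a real framework would additionally need typed messages, threading rules, and state serialization.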

arXiv:2202.09179 [pdf, other]

doi 10.1109/PacificVis53943.2022.00010

Incorporating Texture Information into Dimensionality Reduction for High-Dimensional Images

Authors: Alexander Vieth, Anna Vilanova, Boudewijn Lelieveldt, Elmar Eisemann, Thomas Höllt

Abstract: High-dimensional imaging is becoming increasingly relevant in many fields from astronomy and cultural heritage to systems biology. Visual exploration of such high-dimensional data is commonly facilitated by dimensionality reduction. However, common dimensionality reduction methods do not include spatial information present in images, such as local texture features, into the construction of low-dimensional embeddings. Consequently, exploration of such data is typically split into a step focusing on the attribute space followed by a step focusing on spatial information, or vice versa. In this paper, we present a method for incorporating spatial neighborhood information into distance-based dimensionality reduction methods, such as t-Distributed Stochastic Neighbor Embedding (t-SNE). We achieve this by modifying the distance measure between high-dimensional attribute vectors associated with each pixel such that it takes the pixel's spatial neighborhood into account. Based on a classification of different methods for comparing image patches, we explore a number of different approaches. We compare these approaches from a theoretical and experimental point of view. Finally, we illustrate the value of the proposed methods by qualitative and quantitative evaluation on synthetic data and two real-world use cases.

Submitted 2 March, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: 10 pages main paper, 8 pages supplemental material. To appear at IEEE 15th Pacific Visualization Symposium 2022
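One simple way to make a pixel-wise distance aware of the spatial neighborhood, as motivated in the texture abstract above, is to compare whole patches of attribute vectors instead of single vectors. The NumPy sketch below builds such patch descriptors with edge padding and compares them with a plain Euclidean norm. It is only one illustrative member of the family of patch-comparison methods the abstract refers to; the function names and the square-window choice are assumptions, not the paper's specific distance measures.

```python
import numpy as np


def extract_patches(image, radius=1):
    """Stack the (2r+1)x(2r+1) neighbourhood of every pixel into one vector.

    image: array of shape (H, W, C) holding a C-dimensional attribute vector per pixel.
    Returns an array of shape (H, W, (2r+1)^2 * C); borders use edge padding.
    """
    H, W, C = image.shape
    k = 2 * radius + 1
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    patches = np.empty((H, W, k * k * C), dtype=float)
    idx = 0
    for dy in range(k):
        for dx in range(k):
            # Shifted copy of the image = neighbour at offset (dy - r, dx - r).
            patches[:, :, idx * C:(idx + 1) * C] = padded[dy:dy + H, dx:dx + W, :]
            idx += 1
    return patches


def patch_distance(patches, p, q):
    """Euclidean distance between the patch descriptors of pixels p=(y, x) and q=(y, x)."""
    return float(np.linalg.norm(patches[p] - patches[q]))
```

Two pixels with identical attribute vectors but different surroundings now get a non-zero distance. The descriptors can be flattened with `patches.reshape(H * W, -1)` and passed to any distance-based method such as t-SNE, at the cost of a (2r+1)^2 times larger feature dimension.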

arXiv:2007.11512 [pdf, other]

InCorr: Interactive Data-Driven Correlation Panels for Digital Outcrop Analysis

Authors: Thomas Ortner, Andreas Walch, Rebecca Nowak, Robert Barnes, Thomas Höllt, Eduard Gröller

Abstract: Geological analysis of 3D Digital Outcrop Models (DOMs) for reconstruction of ancient habitable environments is a key aspect of the upcoming ESA ExoMars 2022 Rosalind Franklin Rover and the NASA 2020 Rover Perseverance missions in seeking signs of past life on Mars. Geologists measure and interpret 3D DOMs, create sedimentary logs and combine them in 'correlation panels' to map the extents of key geological horizons, and build a stratigraphic model to understand their position in the ancient landscape. Currently, the creation of correlation panels is completely manual and therefore time-consuming and inflexible. With InCorr we present a visualization solution that encompasses a 3D logging tool and an interactive data-driven correlation panel that evolves with the stratigraphic analysis. For the creation of InCorr we closely cooperated with leading planetary geologists in the form of a design study. We verify our results by recreating an existing correlation analysis with InCorr and validate our correlation panel against a manually created illustration. Further, we conducted a user study with a wider circle of geologists. Our evaluation shows that InCorr efficiently supports the domain experts in tackling their research questions and that it has the potential to significantly impact how geologists work with digital outcrop representations in general.

Submitted 8 November, 2020; v1 submitted 22 July, 2020; originally announced July 2020.

arXiv:2006.05175 [pdf, other]

doi 10.1109/TVCG.2020.3030336

Visual cohort comparison for spatial single-cell omics-data

Authors: Antonios Somarakis, Marieke E. Ijsselsteijn, Sietse J. Luk, Boyd Kenkhuis, Noel F. C. C. de Miranda, Boudewijn P. F. Lelieveldt, Thomas Höllt

Abstract: Spatially-resolved omics-data enable researchers to precisely distinguish cell types in tissue and explore their spatial interactions, enabling deep understanding of tissue functionality. To understand what causes or deteriorates a disease and identify related biomarkers, clinical researchers regularly perform large-scale cohort studies, requiring the comparison of such data at cellular level. In such studies, with little a-priori knowledge of what to expect in the data, explorative data analysis is a necessity. Here, we present an interactive visual analysis workflow for the comparison of cohorts of spatially-resolved omics-data. Our workflow allows the comparative analysis of two cohorts based on multiple levels-of-detail, from simple abundance of contained cell types over complex co-localization patterns to individual comparison of complete tissue images. As a result, the workflow enables the identification of cohort-differentiating features, as well as outlier samples at any stage of the workflow. During the development of the workflow, we continuously consulted with domain experts. To show the effectiveness of the workflow we conducted multiple case studies with domain experts from different application areas and with different data modalities.

Submitted 30 July, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

Comments: 11 pages, 10 figures, 2 tables. Revised based on IEEE VIS 2020 reviewers' comments. ACM 2012 CCS - Human-centered computing, Visualization, Visualization application domains, Visual analytics. A binary of the presented tool is available in our repository: https://doi.org/10.5281/zenodo.3885814

ACM Class: H.5.0

Journal ref: Presented in IEEE Vis 2020. Published in IEEE Transactions on Visualization and Computer Graphics (TVCG)
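The coarsest level of detail in the cohort-comparison abstract above, the abundance of contained cell types, can be sketched in a few lines of Python: each sample becomes a vector of cell-type proportions, and two cohorts are compared by the difference of their mean proportions. The sketch assumes each sample is given as a flat list of per-cell type labels; the function names are hypothetical, and the workflow's co-localization and image-level comparisons are not covered.

```python
from collections import Counter
from typing import Dict, List, Sequence


def abundance(cell_labels: Sequence[str]) -> Dict[str, float]:
    """Proportion of each cell type within one sample."""
    counts = Counter(cell_labels)
    n = len(cell_labels)
    return {cell_type: c / n for cell_type, c in counts.items()}


def cohort_mean(samples: List[Sequence[str]], types: List[str]) -> Dict[str, float]:
    """Average per-sample proportion of each cell type (absent types count as 0)."""
    profiles = [abundance(s) for s in samples]
    return {t: sum(p.get(t, 0.0) for p in profiles) / len(profiles) for t in types}


def compare_cohorts(cohort_a: List[Sequence[str]],
                    cohort_b: List[Sequence[str]]) -> Dict[str, float]:
    """Mean proportion in cohort A minus mean proportion in cohort B, per cell type."""
    types = sorted({t for sample in cohort_a + cohort_b for t in sample})
    mean_a = cohort_mean(cohort_a, types)
    mean_b = cohort_mean(cohort_b, types)
    return {t: mean_a[t] - mean_b[t] for t in types}
```

Averaging per-sample proportions, rather than pooling all cells, gives every sample equal weight regardless of how many cells its tissue image contains; which weighting is appropriate depends on the study design.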

arXiv:1805.10817 [pdf, other]

GPGPU Linear Complexity t-SNE Optimization

Authors: Nicola Pezzotti, Julian Thijssen, Alexander Mordvintsev, Thomas Höllt, Baldur van Lew, Boudewijn P. F. Lelieveldt, Elmar Eisemann, Anna Vilanova

Abstract: The t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has become, in recent years, one of the most widely used and insightful techniques for the exploratory analysis of high-dimensional data. tSNE reveals clusters of high-dimensional data points at different scales while requiring only minimal tuning of its parameters. Despite these advantages, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of tSNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the tSNE embedding for large datasets. In this work, we present a novel approach to the minimization of the tSNE objective function that heavily relies on modern graphics hardware and has linear computational complexity. Our technique not only beats the state of the art, but can even be executed on the client side in a browser. We propose to approximate the repulsion forces between data points using adaptive-resolution textures that are drawn at every iteration with WebGL. This approximation allows us to reformulate the tSNE minimization problem as a series of tensor operations that are computed with TensorFlow.js, a JavaScript library for scalable tensor computations.

Submitted 8 August, 2019; v1 submitted 28 May, 2018; originally announced May 2018.
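The idea of approximating repulsion with fields drawn into textures, from the GPGPU tSNE abstract above, can be sketched on the CPU. In tSNE the repulsive part of the gradient for point y_i is proportional to (1/Z) sum_j (1 + |y_i - y_j|^2)^(-2) (y_i - y_j), with normalization Z = sum_{i != j} (1 + |y_i - y_j|^2)^(-1). Instead of summing over all pairs, the NumPy sketch below evaluates a scalar field S and a vector field V on a fixed regular grid and reads both back at each point by bilinear interpolation. This is a simplified, fixed-resolution stand-in written for illustration: it does not use WebGL, TensorFlow.js, or adaptive-resolution textures, and the grid size, padding, and function names are assumptions.

```python
import numpy as np


def repulsion_fields(Y, grid_size=256, pad=1.0):
    """Evaluate S(p) = sum_j w_j(p) and V(p) = sum_j w_j(p)^2 (p - y_j) on a grid,
    where w_j(p) = 1 / (1 + |p - y_j|^2). Each point 'splats' its kernel onto the grid."""
    lo = Y.min(axis=0) - pad
    hi = Y.max(axis=0) + pad
    xs = np.linspace(lo[0], hi[0], grid_size)
    ys = np.linspace(lo[1], hi[1], grid_size)
    nodes = np.stack(np.meshgrid(xs, ys, indexing="ij"), axis=-1)  # (G, G, 2)
    S = np.zeros((grid_size, grid_size))
    V = np.zeros((grid_size, grid_size, 2))
    for y in Y:  # cost O(G^2 * n): linear in n for a fixed grid
        diff = nodes - y
        w = 1.0 / (1.0 + np.sum(diff * diff, axis=-1))
        S += w
        V += (w * w)[..., None] * diff
    return xs, ys, S, V


def bilinear(xs, ys, F, pts):
    """Bilinearly interpolate grid values F of shape (G, G) or (G, G, k) at 2-D points."""
    fx = (pts[:, 0] - xs[0]) / (xs[1] - xs[0])
    fy = (pts[:, 1] - ys[0]) / (ys[1] - ys[0])
    i = np.clip(np.floor(fx).astype(int), 0, len(xs) - 2)
    j = np.clip(np.floor(fy).astype(int), 0, len(ys) - 2)
    tx, ty = fx - i, fy - j
    if F.ndim == 3:
        tx, ty = tx[:, None], ty[:, None]
    return ((1 - tx) * (1 - ty) * F[i, j] + tx * (1 - ty) * F[i + 1, j]
            + (1 - tx) * ty * F[i, j + 1] + tx * ty * F[i + 1, j + 1])


def approx_repulsion(Y, grid_size=256):
    """Repulsive forces (up to tSNE's constant factor 4) from interpolated fields."""
    xs, ys, S, V = repulsion_fields(Y, grid_size)
    Z = np.sum(bilinear(xs, ys, S, Y) - 1.0)  # drop each point's self-term w = 1
    return bilinear(xs, ys, V, Y) / Z


def exact_repulsion(Y):
    """O(n^2) reference: (1/Z) * sum_j w_ij^2 (y_i - y_j)."""
    diff = Y[:, None, :] - Y[None, :, :]
    w = 1.0 / (1.0 + np.sum(diff * diff, axis=-1))
    np.fill_diagonal(w, 0.0)
    return np.sum((w * w)[..., None] * diff, axis=1) / w.sum()
```

The self-term needs no special handling in V because (p - y_i) vanishes at p = y_i, whereas in S it contributes exactly 1 per point and is subtracted. Comparing `approx_repulsion(Y)` against `exact_repulsion(Y)` on a few hundred points is a quick way to check the grid resolution.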

arXiv:1512.01655 [pdf, ps, other]

Approximated and User Steerable tSNE for Progressive Visual Analytics

Authors: Nicola Pezzotti, Boudewijn P. F. Lelieveldt, Laurens van der Maaten, Thomas Höllt, Elmar Eisemann, Anna Vilanova

Abstract: Progressive Visual Analytics aims at improving the interactivity in existing analytics techniques by means of visualization as well as interaction with intermediate results. One key method for data analysis is dimensionality reduction, for example, to produce 2D embeddings that can be visualized and analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a well-suited technique for the visualization of various kinds of high-dimensional data. tSNE can create meaningful intermediate results but suffers from a slow initialization that constrains its application in Progressive Visual Analytics. We introduce a controllable tSNE approximation (A-tSNE), which trades off speed and accuracy, to enable interactive data exploration. We offer real-time visualization techniques, including a density-based solution and a Magic Lens to inspect the degree of approximation. With this feedback, the user can decide on local refinements and steer the approximation level during the analysis. We demonstrate our technique with several datasets, in a real-world research scenario and for the real-time analysis of high-dimensional streams to illustrate its effectiveness for interactive data analysis.

Submitted 16 June, 2016; v1 submitted 5 December, 2015; originally announced December 2015.
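The slow initialization mentioned in the A-tSNE abstract above is dominated by finding each point's nearest neighbours in the high-dimensional space. The NumPy sketch below does not reproduce the paper's approximation. Instead it uses a deliberately simple, hypothetical stand-in, scanning only a random fraction of candidate points per query, to show the two ingredients the abstract names: a single knob that trades accuracy for speed, and a local refinement that recomputes exact neighbours only for points a user selects. All function names are assumptions.

```python
import numpy as np


def knn_exact(X, idx, k):
    """Exact k nearest neighbours (by squared Euclidean distance) for the rows in idx."""
    idx = np.asarray(idx)
    d = np.sum((X[idx][:, None, :] - X[None, :, :]) ** 2, axis=-1)
    d[np.arange(len(idx)), idx] = np.inf  # a point is not its own neighbour
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def knn_approx(X, k, fraction, rng):
    """Approximate kNN: each query only scans a random `fraction` of all points."""
    n = len(X)
    m = max(k + 1, int(round(fraction * n)))
    result = np.empty((n, k), dtype=int)
    for i in range(n):
        cand = rng.choice(n, size=m, replace=False)
        cand = cand[cand != i]
        d = np.sum((X[cand] - X[i]) ** 2, axis=1)
        result[i] = cand[np.argsort(d, kind="stable")[:k]]
    return result


def refine(X, neighbors, idx, k):
    """Local refinement: replace the neighbour lists of the selected points by exact ones."""
    out = neighbors.copy()
    out[np.asarray(idx)] = knn_exact(X, idx, k)
    return out


def recall(approx, exact):
    """Fraction of true neighbours that the approximation recovered."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    return hits / exact.size
```

With `fraction=1.0` the approximation reproduces the exact result; smaller fractions reduce the number of distance evaluations proportionally at the cost of recall. Unlike tree-based approximate search, this stand-in does not change the asymptotic cost; it only makes the trade-off visible.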

Showing 1–8 of 8 results for author: Höllt, T