License: CC BY-NC-ND 4.0
arXiv:2403.05296v1 [cs.GR] 08 Mar 2024

Cyclic Polygon Plots

Maksim Schreck, Peter Albers, Filip Sadlo
Heidelberg University, Germany
Abstract

In this paper, we introduce the cyclic polygon plot, a representation based on a novel projection concept for multi-dimensional values. Cyclic polygon plots combine the typically competing requirements of quantitativeness, image-space efficiency, and readability. Our approach is complemented with a placement strategy based on its intrinsic features, resulting in a dimensionality reduction strategy that is consistent with our overall concept. As a result, our approach combines advantages from dimensionality reduction techniques and quantitative plots, supporting a wide range of tasks in multi-dimensional data analysis. We examine and discuss the overall properties of our approach, and demonstrate its utility with a user study and selected examples.

keywords:
Visualization techniques, information visualization
Teaser figure: Synthetic dataset, featuring a pulse (blue), step-down (red), linearly ascending (orange), linearly descending (cyan), and double peak (green) sequence. Parallel coordinates plot (a) and radar chart (b), for comparison with our cyclic polygon plot with \abbc (c) and \abcd (d) cyclic pair selection. Notice “start arrows” and darker color indicating multiple instances in (c) and (d).

Introduction

There is hardly a real-world question that could be answered by considering a single quantity. In fact, many considerations require mutual analysis of a large number of attributes, necessitating effective means for multi-dimensional data analysis.

A wide range of visualization techniques has been proposed for analyzing such multi-dimensional data, and ultimately, all of them need to perform some kind of projection from the multi-dimensional data domain to 2D image space. For example, projections employed in dimensionality reduction techniques typically involve continuous distortion that, on the one hand, aims to reduce clutter by decoupling visual density from data dimension, and on the other hand, tries to preserve original properties, such as distance metrics. Although very successful in various fields, the involved projection and distortion of the data cause loss of information and loss of quantitativeness, i.e., the original data cannot be determined from dimensionality reduction results. Other techniques, such as the parallel coordinates plot (PCP) and radar chart (RC), avoid such losses by employing discrete projections, which project the axes of the multi-dimensional data domain to 2D image space, and use these projected axes to represent the data. However, despite being quantitative and information-preserving, these techniques typically do not scale well with higher data dimensions, due to inferior image-space utilization and the associated readability issues.

Overall, combining the typically competing requirements of quantitativeness, image-space efficiency, and readability is a main challenge in multi-dimensional data visualization. With the cyclic polygon plot, we present an approach that combines these requirements. It shares the quantitative readability of PCPs and RCs, and the comparably high image-space efficiency of scatterplots (SP). We achieve this by splitting the data domain into two-dimensional subspaces and projecting these subspaces to image space with superposition. Thus, a multi-dimensional value represents a point in each of these 2D subspaces, and accordingly a set of points in their superposition in image space, which we connect to a polygon to preserve the correspondence to the original dimensions. In that regard, cyclic polygon plots represent a generalization of scatterplots to multi-dimensional data, generalizing points to polygons. Due to symmetry considerations, we choose the subspaces from the data domain in a cyclic manner, motivating the name of the resulting approach. Two variants of this subspace choice proved useful, exhibiting slightly different advantages; we denote them the \abbc and \abcd schemes. We demonstrate that our polygons also serve well as glyphs, in particular when placed according to their intrinsic properties, providing a consistent dimensionality reduction approach.

The contributions of this work include:

  • cyclic polygon plots,

  • placement of cyclic polygon plots, and

  • their detailed discussion and evaluation including a user study.

1 Related Work

Spatialization (positioning of values in a parameterized space) of data-vector values for creating polygons is widely discussed, most prominently in the context of star glyphs [17] and RCs [12]. Whereas the glyph placement is often not intrinsically defined (as is the case with star glyphs), and often performed with a grid layout or with first and second data-vector attributes as spatialization dimensions [49], the placement of glyphs is also discussed in a geographical context [38]. Fuchs et al. [20] discuss the viability of different glyph designs while leveraging the small multiples principle, reinforced by advantageous glyph placement. In our method, we provide an intrinsic mapping of our glyphs to spatial position, and discuss additional, geometrically motivated placement strategies. Radial axes layouts, as present in, for example, star glyphs, RCs, RadViz [23], or star coordinates [32], feature a compact and intuitive way to represent data [7, 45]. However, they tend to complicate analysis, since they are harder and less efficient to interpret [47, 21]. With our approach, we provide a line- and polygon-based visualization, which does not rely on radial axes but is embedded in a 2D space, which is more familiar to interpret [8].

The PCP [25] is a widely used and expanded multivariate visualization technique. Its extension to 3D has been discussed in various contexts [15, 43] and configurations [29, 50]. Different axis layouts exist, most notably the use of a common attribute across all dimensions [15] and a bipartite layout of axes [30]. Often, an extension to 3D is employed to reduce cluttering in high-density areas [2], but it almost always requires user interactivity to benefit from the 3D visualization layout. When interpreting our cyclic polygon approach as a projection of a 3D-PCP (see below), we use a different and more intuitive axis layout which will be discussed in more detail in Section 2. Fanea et al. [16] present a different method to extend the PCP to 3D by integrating it with star glyphs. Similar to our approach, this approach also supports a frontal and lateral projection, which in this case results in a star glyph and PCP representation, respectively. Zhou et al. [52] introduce an indexed point representation of generalized $p$-dimensional flat surfaces from $n$-variate data. The resulting indexed points are represented in a lower-variate PCP. Analogous to our cyclic polygons, this approach also performs a mapping of $n$-variate data to, in this case, 3D subspaces. Claessen et al. [9] provide a framework for interactive design of a representation consisting of multiple, arbitrarily placed coordinate axes with PCPs or scatterplots displayed between them. While the interactive design promotes data exploration, it also requires domain knowledge to fully leverage its flexibility. With our approach, we try to limit this prior knowledge and provide a self-sufficient data representation. Blaas et al. [5] implement a GPU-based processing pipeline to effectively display large datasets using PCPs. One of the main pipeline tasks identified by them is normalization.

Nam and Mueller [36] use a trip metaphor to introduce an iterative and interactive visualization approach. This results in an overview map consisting of glyphs where the user can control parameters to navigate and enlarge the visualization. In contrast to our approach, where one glyph represents all subspaces of the data, each of their glyphs represents a single subspace of the data. The subspace voyager [48] is an extension to the previous approach. Here, explicit mapping to 3D subspaces is combined with an integrated navigation interface which aims to improve usability concerning manual exploration.

The representations of scatterplot matrix (SPLOM) [10] and generalized plot matrix [24] can also be understood as 2D subspace mappings of $n$-variate input data, whereas the parallel scatterplot matrix [46] additionally provides a detail view of selected dimension pairs. Nevertheless, none of these approaches combine the subspaces into a single common 2D space.

2 Method

There are two variants that our approach naturally leads to, the \abbc and the \abcd scheme. We first motivate the overall approach (Section 2.1), followed by a description of the \abbc scheme (Sections 2.2 and 2.3). Subsequently, we describe the minor modification that gives rise to the \abcd scheme (Section 2.4), and provide a detailed discussion of the properties of both schemes (Section 2.5). Finally, as a complementing approach, we investigate the suitability of our polygons as glyphs in placement strategies derived from the polygons themselves and their properties (Section 2.6).

2.1 Motivation

Our design is motivated by the aim of combining the multi-dimensional quantitativeness of, e.g., parallel coordinates plots with the image-space efficiency of scatterplots and dimensionality reduction techniques, while maintaining readability. We realize these requirements in a novel approach that utilizes subspace mapping to represent $n$D data in a single 2D image space, while maintaining correspondence to the data dimensions by representing them as a polygon. Alternatively, it can be interpreted as a generalization of scatterplots from 2-dimensional (bivariate) to $n$-dimensional (multivariate) data, while keeping the representation two-dimensional. Overall, we ultimately need to map the $n$-dimensional data domain to two-dimensional image space in a quantitative manner.

2.2 Cyclic Pair Selection

We start with decomposing the $n$-dimensional data domain into a sequence of $k$ two-dimensional subspaces. That is, each $n$-dimensional value

$$\mathbf{d} \coloneqq (\delta_0, \dots, \delta_{n-1}) \in \mathbb{R}^n \tag{1}$$

is transformed into a sequence of $k$ 2D vertex (subspace) coordinates

$$\mathbf{v}_j \coloneqq (x_j, y_j) \in \mathbb{R}^2\,, \quad 0 \leq j \leq k-1\,. \tag{2}$$

We achieve this by what we call cyclic pair selection. We denote the most fundamental variant of such selection the \abbc scheme:

$$\mathbf{d} = (\delta_0, \dots, \delta_{n-1}) \mapsto (\delta_j, \delta_{j+1\ (\mathrm{mod}\ n)})_{j=0,\dots,n-1}\,. \tag{3}$$

Here, we iterate through the $n$-dimensional value and sequentially pick adjacent pairs of components as one vertex coordinate. In other words,

$$\mathbf{d} = (\delta_0, \dots, \delta_{n-1}) \mapsto (\delta_0, \delta_1), (\delta_1, \delta_2), \dots, (\delta_{n-1}, \delta_0)\,. \tag{4}$$

We regard this scheme fundamental, because, due to its sequential overlap and cyclic closure, it does not cause unnecessary loss of generality, i.e., it is invariant to cyclic permutation of $\mathbf{d}$. That is,

$$\tilde{\mathbf{d}} \coloneqq (\delta_{l\ (\mathrm{mod}\ n)}, \dots, \delta_{n-1+l\ (\mathrm{mod}\ n)}) \tag{5}$$

produces the same sequence of 2D vertices, simply shifted by $l$. The reason for this invariance is that the \abbc selection scheme is order-preserving and maps each $\delta_i$ equally to the $x$- and $y$-coordinate of the 2D vertices, i.e., it does not induce bias. For the \abbc scheme, $k = n$, i.e., it decomposes the $n$-dimensional data domain into a sequence of $n$ two-dimensional subspaces.
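The \abbc selection of Equation 3 and its invariance to cyclic permutation (Equation 5) can be illustrated with a minimal Python sketch. The function names `abbc_vertices` and `rotate` are hypothetical and chosen for illustration only; the example value is the linearly ascending sequence discussed in Section 2.5.2.

```python
def abbc_vertices(d):
    """Cyclic pair selection, abbc scheme (Equation 3): (d[j], d[(j+1) mod n])."""
    n = len(d)
    return [(d[j], d[(j + 1) % n]) for j in range(n)]


def rotate(seq, l):
    """Cyclically shift a non-empty sequence by l positions (Equation 5)."""
    l %= len(seq)
    return list(seq[l:]) + list(seq[:l])


d = [5, 6, 7, 8, 9, 10]  # six-dimensional example value
v = abbc_vertices(d)
print(v)

# Shifting d by l only shifts the vertex sequence by l,
# so the set of vertices and edges of the polygon is unchanged.
for l in range(len(d)):
    assert abbc_vertices(rotate(d, l)) == rotate(v, l)
```

The number of vertices equals the number of components, in line with $k = n$ for this scheme.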

To not exceed the scope of our work, we assume the order of attributes in the $n$-dimensional value to be invariant. However, optimization of this ordering is possible and has been widely discussed in the context of PCPs [35, 34] and generally in multi-dimensional visualization [39, 4, 22, 31, 11]. For space reasons, we include noteworthy details and relations about these permutations and resulting geometric variation in the supplemental material.

2.3 Mapping

The obtained 2D subspaces satisfy our requirement of being quantitative. Therefore, the second step for our transformation from the $n$D data domain to 2D image space is to map and integrate these 2D subspaces into a single image space.

This is where image-space utilization comes into play. One design strategy could be to place the 2D subspaces in matrix arrangement in image space, which would directly lead to the superdiagonal of the scatterplot matrix. Scatterplot matrices, however, tend to waste image space with redundant display of subspaces including their axes. In addition, these issues also affect readability. Parallel coordinates plots, as well as radar charts, share some of these shortcomings regarding waste of image space and readability (see also Section 3.3). Beyond that, their axes and ticks tend to clutter with the polyline content. Furthermore, due to their point–line duality with scatterplots, they tend to suffer from additional clutter, because points in the original 2D subspaces are mapped to entire line segments. Due to our cyclic pair selection, however, our technique resides in the point domain of the point–line duality (as discussed below), and thus tends to reduce such clutter.

Overall, these observations lead to the following requirements:

  • avoid side-by-side placement of the 2D subspaces,

  • avoid visual representation of more than two axes, and

  • keep the axes and their ticks outside of the content area.

Figure 1: (a) Cyclic polygon plot (CPP) with \abbc (orange) and \abcd (cyan) scheme, including data components $\delta_i$; the vertices are labeled $(\delta_0,\delta_1)$, $(\delta_1,\delta_2)$, $(\delta_2,\delta_3)$, $(\delta_3,\delta_4)$, $(\delta_4,\delta_5)$, and $(\delta_5,\delta_0)$, and the start arrow is marked (i). (b) CPP (orange) as projection of modified 3D parallel coordinates plot [30] (red).

These requirements motivate us to superimpose all 2D subspaces using an identity transformation, i.e., to share the same origin, abscissa, and ordinate. This merged representation is mapped to image space for display, generally using linear, or, if beneficial, logarithmic scaling (Section 3.1.4). As a consequence, all vertices $\mathbf{v}_j$ are mapped to dots in image space, together with a depiction of the abscissa and the ordinate. This transformation, being an identity transformation, does not introduce any distortion to the representation of the displayed data. We remove the individual axes of the subspaces in favor of a single axis pair, a sacrifice necessary to achieve a screen-space efficient layout. Due to our subspace generation, this does not pose a threat to the quantitativeness of our representation. It does, however, require a final step to convey the correspondence between the dots in image space and the components $\delta_i$ of the multi-dimensional value $\mathbf{d}$. We achieve this by connecting cyclically adjacent vertices of the sequence, i.e., we draw edges $\overline{\mathbf{v}_j\mathbf{v}_m}$, with $m = j+1\ (\mathrm{mod}\ k)$ and $j = 0, \dots, k-1$. This results in a polygon (Figure 1a, orange), whose connectivity represents the vertex sequence from the cyclic pair selection (Equation 3). For absolute readability of the resulting representation, one needs to additionally be able to identify the first vertex and the order of the vertex sequence, i.e., the orientation of the polygon. We achieve this by placing an arrow symbol at the first vertex $\mathbf{v}_0$ in direction of the second vertex $\mathbf{v}_1$ ((i) in Figure 1a). Notice that we set the size of this symbol in the order of the size of the vertex dot to avoid interference in case of dense data. We call the resulting visualization cyclic polygon plot (CPP), and, compared to scatterplots, it draws a polygon for each multi-dimensional data value $\mathbf{d}$, instead of a single dot. Since the polygons may overlap themselves, as well as other polygons (as discussed below), we use blending in the rendering step to convey such cases (see, e.g., the dark blue dot in the teaser figure).
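The edge construction and the start arrow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `polygon_edges` and `start_arrow` are hypothetical helper names, rendering and blending are omitted, and returning `None` for coinciding first and second vertices is an assumed fallback that the text does not specify.

```python
import math


def polygon_edges(vertices):
    """Edges (v_j, v_m) with m = (j + 1) mod k, which closes the polygon."""
    k = len(vertices)
    return [(vertices[j], vertices[(j + 1) % k]) for j in range(k)]


def start_arrow(vertices):
    """Position and unit direction of the start arrow: at v_0, pointing to v_1.

    Requires at least two vertices. The direction is None if v_0 and v_1
    coincide (assumed fallback, not taken from the paper).
    """
    (x0, y0), (x1, y1) = vertices[0], vertices[1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0.0:
        return (x0, y0), None
    return (x0, y0), ((x1 - x0) / length, (y1 - y0) / length)


vertices = [(5, 6), (6, 7), (7, 8), (8, 9), (9, 10), (10, 5)]
edges = polygon_edges(vertices)
position, direction = start_arrow(vertices)
print(edges[-1])            # closing edge back to the first vertex
print(position, direction)
```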

2.4 Alternative Cyclic Pair Selection Scheme: \abcd

Before we come to the discussion of the properties of the CPP, let us have a quick look at an alternative scheme for cyclic pair selection that proved beneficial for reducing visual clutter. We denote this the \abcd scheme, and define it as (cf. Equation 3):

$$\mathbf{d} = (\delta_0, \cdots, \delta_{n-1}) \mapsto (\delta_{2j}, \delta_{2j+1\ (\mathrm{mod}\ n)})_{j=0,\cdots,p-1}\,, \tag{6}$$

with $p \coloneqq \lceil n/2 \rceil$. In other words, for even $n$,

$$\mathbf{d} = (\delta_0, \cdots, \delta_{n-1}) \mapsto (\delta_0, \delta_1), (\delta_2, \delta_3), \cdots, (\delta_{n-2}, \delta_{n-1})\,, \tag{7}$$

and for odd $n$,

$$\mathbf{d} = (\delta_0, \cdots, \delta_{n-1}) \mapsto (\delta_0, \delta_1), (\delta_2, \delta_3), \cdots, (\delta_{n-1}, \delta_0)\,. \tag{8}$$

That is, in case of odd dimension of the multi-dimensional value $\mathbf{d}$, its first component $\delta_0$ is repeated for the last vertex. Overall, for the \abcd scheme, $k = p$, i.e., it decomposes the $n$-dimensional data domain into a sequence of $p$ two-dimensional subspaces. Regarding the mapping (Section 2.3), the only difference to the \abbc scheme is that the vertices from Equation 6 are used instead of those from Equation 3.
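A minimal Python sketch of the \abcd selection (Equation 6) shows the even and odd cases of Equations 7 and 8. The function name `abcd_vertices` and the example values are hypothetical and serve illustration only.

```python
import math


def abcd_vertices(d):
    """abcd scheme (Equation 6): (d[2j], d[(2j+1) mod n]) for j = 0, ..., p-1."""
    n = len(d)
    p = math.ceil(n / 2)
    return [(d[2 * j], d[(2 * j + 1) % n]) for j in range(p)]


even = abcd_vertices([5, 6, 7, 8, 9, 10])  # even n, Equation 7
odd = abcd_vertices([5, 6, 7, 8, 9])       # odd n, Equation 8
print(even)
print(odd)  # the last vertex repeats the first component
```

In the odd case, the modulo in the second index is what wraps around to the first component.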

We observe that the \abcd scheme is contained in the \abbc scheme, i.e., the \abcd scheme consists of every second vertex of the \abbc scheme (see Figure 1a and the two cyclic polygon plots in the teaser figure). Nevertheless, due to the sequential overlap of the \abbc scheme, this subsampling does not cause loss of information; it simply discards the redundancy contained in the \abbc scheme. This reduction has the advantage of reducing the complexity of the visual representation, and thus reducing visual clutter in large datasets, which is our motivation for this scheme. As is evident from Equation 8, the repetition of the first component $\delta_0$ in the odd dimension case can introduce bias to the visualization toward this first dimension. However, our study results and application to real datasets show that it has only a negligible effect on the interpretability in practice, and confirm this repetition to be a feasible solution.
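Both statements, that the \abcd vertices are every second \abbc vertex and that no information is lost, can be checked with a short Python sketch. All function names are hypothetical; `from_abcd` is an illustrative inverse that assumes the dimension $n$ is known.

```python
def abbc_vertices(d):
    """abbc scheme (Equation 3)."""
    n = len(d)
    return [(d[j], d[(j + 1) % n]) for j in range(n)]


def abcd_vertices(d):
    """abcd scheme (Equation 6), with p = ceil(n / 2)."""
    n = len(d)
    p = (n + 1) // 2
    return [(d[2 * j], d[(2 * j + 1) % n]) for j in range(p)]


def from_abcd(vertices, n):
    """Recover d from abcd vertices: flatten the pairs and, for odd n,
    drop the repeated first component at the end."""
    flat = [c for v in vertices for c in v]
    return flat[:n]


for d in ([5, 6, 7, 8, 9, 10], [5, 6, 7, 8, 9]):
    # abcd is the subsampling of abbc by a factor of two ...
    assert abcd_vertices(d) == abbc_vertices(d)[::2]
    # ... and the original value can still be reconstructed.
    assert from_abcd(abcd_vertices(d), len(d)) == d
```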

2.5 Properties

Let us now investigate some properties of cyclic polygon plots.

2.5.1 Relation to Previous Work

As indicated above, our approach represents a generalization of the scatterplot. For $n = 2$, both the \abbc and \abcd schemes result in the traditional scatterplot. For $n > 2$, the first vertex of each polygon (of either scheme) is still identical to the scatterplot of $(\delta_0, \delta_1)$.

Our approach can also be interpreted as a projection of a modification of 3D parallel coordinates [30], as illustrated in Figure 1b. In their work, Johansson et al. replace each axis of the traditional PCP by a 2D space spanned by two axes (black arrows in Figure 1b). As a consequence, a multi-dimensional value leads to a point in each of their 2D spaces (red dots), which are connected to a polyline (red) in analogy to the traditional PCP. If, in their concept, one replaces their 2D spaces by our 2D subspaces (Equations 3 and 6), employs orthographic projection (dashed) of the resulting polylines along the “third” axis, and closes the resulting polylines (dotted orange), we obtain our CPP (orange) in a common 2D space (green).

2.5.2 Basic Reading

A basic task in CPP-based analysis is to determine the original multi-dimensional value $\mathbf{d}$ from a respective polygon. For this, the arrow symbol needs to be identified (Figure 1a). Its coordinates on the abscissa and ordinate (which we also denote $x$- and $y$-coordinates) give us $\delta_0$ and $\delta_1$. The direction of the arrow symbol guides us then to the next vertex, whose coordinates are for the \abbc scheme $\delta_1$ and $\delta_2$, and for the \abcd scheme $\delta_2$ and $\delta_3$. Assuming that all vertices of a cyclic polygon are distinct (i.e., $\mathbf{v}_j \neq \mathbf{v}_m\,, \forall j \neq m$), then each dot is shared by exactly two edges. This lets us unambiguously follow the edge to the next vertex, and so on, until we reach the first one, which indicates that we read the full multi-dimensional value $\mathbf{d}$. This reading appears difficult at first sight, but the user study shows that it competes well with reading of PCPs and RCs.
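The reading procedure can be mimicked programmatically. The following Python sketch is an illustration, not part of the original method: it takes the undirected edges of one \abbc polygon in arbitrary order, starts at the arrow vertex, and follows the edges as described above. The function name and its parameters are hypothetical, and it assumes at least three mutually distinct vertices.

```python
def read_abbc_polygon(edges, first, second):
    """Follow the polygon from the arrow vertex and return the data value d.

    edges  : undirected edges (pairs of vertices), in any order
    first  : vertex carrying the arrow symbol
    second : vertex the arrow points to
    Assumes at least three vertices, all distinct, so that every vertex
    is shared by exactly two edges.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    order = [first]
    prev, cur = first, second
    while cur != first:
        order.append(cur)
        # Continue along the edge that does not lead back to where we came from.
        nxt = [v for v in neighbors[cur] if v != prev][0]
        prev, cur = cur, nxt

    # For the abbc scheme, the x-coordinate of vertex j is component j.
    return [x for x, _ in order]


edges = [((7, 8), (8, 9)), ((5, 6), (6, 7)), ((10, 5), (5, 6)),
         ((9, 10), (10, 5)), ((6, 7), (7, 8)), ((8, 9), (9, 10))]
d = read_abbc_polygon(edges, first=(5, 6), second=(6, 7))
print(d)
```

With repeated vertices the neighbor lookup becomes ambiguous, which mirrors the reading difficulty discussed next.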

However, the advantage that CPPs are often more image-space efficient and readable than PCPs and RCs, also because they do not need to draw more than two axes and because they keep these axes away from the content, comes at the cost of a weakness: reading becomes more difficult if vertices appear more than once in a sequence, i.e., if $\exists j \neq m$ such that $\mathbf{v}_j = \mathbf{v}_m$.

Let us start with the orange polygon in the teaser figure, an example employing the \abbc scheme without multiple vertices. We identify the arrow at coordinates $(5,6)$, followed by $(6,7)$, $(7,8)$, $(8,9)$, $(9,10)$, and $(10,5)$ at the lower right corner of the polygon. This is its last vertex, since the next edge brings us back to the arrow symbol. We identified all six mutually different dots of the six-dimensional value, and we also see that all dots have the same saturation (none is darker due to blending of multiple dots).

For the red polygon in the teaser figure, we identify the arrow symbol, and follow via the bottom right dot to the bottom left dot, which is the first dot depicted in darker red, because the blending of its multiple instances resulted in a darker color. Since using brightness as visual variable for ordinal data does not perform well, it would be hard to determine from the color that this dot appears in fact three times. Nevertheless, since it is the only dot that is darker in this polygon, and since there are four distinct dots for a six-dimensional value, one could derive that its multiplicity has to be three. Indeed, almost all configurations with identical dots that we investigated could be determined by graph theory considerations, even if darker color was only used as an indicator that there was more than one dot at the respective location. However, such considerations would be cumbersome and in most applications impede full quantitative reading.

Overall, we draw two conclusions with respect to readability and identical vertices. Firstly, the issue with identical sequence vertices cannot arise if for each multi-dimensional value $\mathbf{d}$, all its components $\delta_i$ are distinct. This is no strong requirement in generic cases, since the dimension $n$ of the value $\mathbf{d}$ is often low. Furthermore, identical values in entire datasets are often considered degenerate, and removed using perturbation or simulation of simplicity [14], following the motivation that natural data do not exhibit identical values. Secondly, as we show in our study and demonstrate in our results, CPPs are particularly powerful for qualitative analysis, and lose only a minor part of their quantitativeness if identical vertices are present. Finally, as is illustrated by the two cyclic polygon plots in the teaser figure, the \abcd scheme tends to reduce vertex multiplicity (here, from 4 to 2 for the blue polygon and from 3 to 2 for the red one) due to its overall vertex reduction property.
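The effect of the scheme on vertex multiplicity can be reproduced with a small Python sketch. The two sequences below are hypothetical: the actual values of the pulse and step-down sequences are not given in the text, so they were chosen only so that the multiplicities match those stated above. The function names are likewise illustrative.

```python
from collections import Counter


def abbc_vertices(d):
    """abbc scheme (Equation 3)."""
    n = len(d)
    return [(d[j], d[(j + 1) % n]) for j in range(n)]


def abcd_vertices(d):
    """abcd scheme (Equation 6), with p = ceil(n / 2)."""
    n = len(d)
    return [(d[2 * j], d[(2 * j + 1) % n]) for j in range((n + 1) // 2)]


def max_multiplicity(vertices):
    """Largest number of times any vertex occurs in the sequence."""
    return max(Counter(vertices).values())


pulse = [1, 1, 9, 1, 1, 1]      # hypothetical pulse sequence
step_down = [9, 9, 2, 2, 2, 2]  # hypothetical step-down sequence

for name, d in (("pulse", pulse), ("step-down", step_down)):
    print(name,
          max_multiplicity(abbc_vertices(d)),
          max_multiplicity(abcd_vertices(d)))
```

For these values, the step-down polygon has four distinct dots under the \abbc scheme, one of which occurs three times, matching the reading example above.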

2.5.3 Point–Line Duality

The well-known point–line duality between SPs and PCPs relates a point in the SP to a line segment in the subdomain of the PCP spanned by the respective axis pair. The duality also holds the other way around, i.e., a point in the PCP relates to a line in the SP [25].

Our approach maps, similar to scatterplots, pairs of values to vertices. Therefore, for the \abbc scheme, a vertex of the CPP corresponds to a line segment in the PCP. Beyond that, two consecutive vertices of the \abbc CPP consist of three consecutive data values, and as such correspond to two consecutive line segments in the PCP. For the \abcd scheme, a vertex in the CPP also relates to an edge in the PCP. Two consecutive vertices of the CPP, however, consist of four consecutive data values, and thus correspond to three consecutive line segments in the PCP.

2.5.4 Slopes and Offsets

Beyond these straightforward relations, we observe interesting and useful relations regarding slopes and offsets. Assuming that all axes in a PCP have the same scaling and offset (such as in the PCP of the teaser figure) and that the distance between the axes is 1, the slope of a line segment in such a PCP equals the distance of the corresponding vertex in the CPP to the diagonal of the CPP, with slope unit $\sqrt{2}/2$ (in case of equal scaling on both axes). The slope in the PCP is positive if the vertex is above the diagonal in the CPP, and negative if below. For example, all five collinear orange points in the CPP of the teaser figure have distance $\sqrt{2}/2$ from the diagonal and are located above it; therefore, the corresponding segments of the PCP have slope $1$, as one can see from the orange polyline in the PCP. Analogously, one can see that the cyan line has slope $-1$. Notice that the five intervals of the PCP map to the five collinear vertices in the CPP, and that the sixth vertex in the CPP corresponds to the interval wrapping around from the last PCP axis to its first axis, exhibiting slope $5$ for the cyan polygon. Beyond that, notice that the cyan polygon is shifted along the CPP diagonal toward the origin by one, which corresponds to the values being one unit smaller (see the teaser figure). The above observations hold for both the \abbc and the \abcd scheme.
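A minimal numerical check of this slope relation, assuming unit axis spacing, equal scaling, and \abbc vertices of the form $(\delta_j, \delta_{j+1 \bmod n})$. The sample value is hypothetical and only loosely modeled on the descending cyan sequence; the function names are not from the paper.

```python
import math

def pcp_slopes(d):
    """Slopes of the PCP segments for unit axis spacing, including the wrap-around."""
    n = len(d)
    return [d[(j + 1) % n] - d[j] for j in range(n)]

def signed_diagonal_distance(vertex):
    """Signed distance of a CPP vertex (x, y) to the line y = x; positive above it."""
    x, y = vertex
    return (y - x) / math.sqrt(2)

# Hypothetical linearly descending six-dimensional value.
d = (5, 4, 3, 2, 1, 0)
vertices = [(d[j], d[(j + 1) % len(d)]) for j in range(len(d))]  # assumed abbc pairs

for slope, vertex in zip(pcp_slopes(d), vertices):
    # Dividing by the slope unit sqrt(2)/2 recovers the PCP slope (up to rounding).
    print(slope, signed_diagonal_distance(vertex) / (math.sqrt(2) / 2))
```

The first five vertices lie below the diagonal at the same distance, while the wrap-around vertex lies far above it, mirroring the behavior described for the cyan polygon.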

There is also a converse relation regarding slopes with the \abbc scheme. An edge in the CPP connects two vertices there, and thus relates two consecutive line segments in the PCP, centered at the PCP axis that is shared by the two line segments. Thereby, the slope of the edge in the CPP represents the factor by which the slopes of the two line segments in the PCP differ, i.e., the slope in the CPP is equal to the factor with which one needs to multiply the slope of the left line segment in the PCP to obtain the slope of the line segment to its right.
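This relation can be verified in one line, assuming (as the description suggests, although the vertex definition lies outside this section) that the \abbc vertices are $\mathbf{v}_j = (x_j, y_j) = (\delta_j, \delta_{j+1})$ with indices taken modulo $n$, and that the PCP axes are one unit apart, so that the PCP slope between axes $j$ and $j+1$ is $s_j \coloneqq \delta_{j+1} - \delta_j$. The symbol $s_j$ is introduced here for illustration only.

```latex
\[
  \frac{y_{j+1} - y_j}{x_{j+1} - x_j}
  = \frac{\delta_{j+2} - \delta_{j+1}}{\delta_{j+1} - \delta_j}
  = \frac{s_{j+1}}{s_j},
  \qquad s_j \neq 0 .
\]
```

For instance, a CPP edge of slope $1$ indicates two PCP segments of equal slope, and a negative edge slope indicates a sign change of the PCP slope at the shared axis.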

2.6 Placement

Similar to the PCP and RC, the CPP works well for datasets of medium size, but tends to suffer from overdrawing when applied to larger datasets. This is a drawback inherent to line-based visualization approaches [37, 40]. To alleviate this problem, we reinterpret our CPP polygons as glyphs, scale them down by a factor of $0.05$ (if not stated otherwise), and employ placement, enabling small multiples [20]. That is, the position of a CPP polygon is no longer determined by the value of its vertices, but by properties derived from the polygon and mapped to the coordinates of its centroid. This represents a dimensionality reduction technique, which is quantitative and consistent with the glyph it positions. We derive and evaluate (Section 3.2) four different placement strategies.

2.6.1 Intrinsic Placement

We denote this strategy intrinsic, because for the \abcd scheme, it does not move the individual polygons. Instead, each polygon is simply downscaled, while fixing the position of its centroid. The centroids of the polygons of the \abbc scheme are, however, all located on the diagonal of the CPP, due to the discussed properties of Equation 3. Therefore, we translate each polygon from the \abbc scheme to the centroid of the corresponding \abcd polygon prior to downscaling. Figure 4c shows intrinsic placement for the glyph of the \abcd scheme at the example of the Iris dataset.
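The following sketch illustrates intrinsic placement under two assumptions that are not confirmed by this section: the centroid is taken as the mean of the polygon vertices, and the vertex pairings follow the description of the two schemes. The default factor of 0.05 is the one stated above; the function names and the sample value are hypothetical.

```python
def centroid(vertices):
    """Vertex mean of a polygon (assumed centroid definition)."""
    n = len(vertices)
    return (sum(x for x, _ in vertices) / n, sum(y for _, y in vertices) / n)

def place_glyph(vertices, target=None, factor=0.05):
    """Scale a polygon about its centroid and optionally move the centroid to `target`."""
    cx, cy = centroid(vertices)
    tx, ty = (cx, cy) if target is None else target
    return [(tx + factor * (x - cx), ty + factor * (y - cy)) for x, y in vertices]

# Hypothetical six-dimensional value.
d = (1, 2, 3, 4, 5, 6)
abcd = [(d[0], d[1]), (d[2], d[3]), (d[4], d[5])]      # assumed abcd pairs
abbc = [(d[j], d[(j + 1) % 6]) for j in range(6)]      # assumed abbc pairs

abcd_glyph = place_glyph(abcd)                         # centroid stays where it was
abbc_glyph = place_glyph(abbc, target=centroid(abcd))  # moved off the diagonal first
```

With the vertex-mean centroid, the \abbc centroid has equal coordinates (each component contributes once to $x$ and once to $y$), which is consistent with the statement that these centroids all lie on the diagonal.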

2.6.2 Geometric Placement

Our experience with the CPP, as well as its intrinsic glyph placement, indicated that the polygons are an effective means for qualitative multi-dimensional visualization. This motivated us to derive a placement strategy based on the shape of the polygons, i.e., to derive from a CPP polygon two quantities that could define the new $x$- and $y$-coordinate of its centroid. We chose the quite straightforward measures area and circumference, respectively, which performed surprisingly well.

While the computation of the circumference of a polygon is unambiguous and straightforward, different approaches exist to define the area of possibly self-intersecting polygons [42]. Firstly, it can be interpreted as the “footprint” of the polygon, representing the entire area encased by the polygon, and disregarding inner edges. Secondly, it can be interpreted as the difference between front-facing and back-facing segments of the polygon, this time respecting the intersection of polygon edges. We chose the latter variant, because it more significantly captures the geometry of the polygon and is more continuous w.r.t. its variation. It can be calculated using the Gaussian area formula [6], which, for the \abbc scheme, can be abbreviated and calculated directly from the high-dimensional value $\mathbf{d}$ according to

$$\mu_{\abbc} = \frac{1}{2}\left|\sum_{j=0}^{n-1}\left(\delta_{j}\,\delta_{j+1\ (\mathrm{mod}\ n)} - \delta_{j}^{2}\right)\right|\,. \tag{9}$$

The area for the \abcd scheme can be obtained efficiently by

$$\mu_{\abcd} = \frac{1}{2}\left|\sum_{j=0}^{p-1}\left(x_{j}\,y_{j+1\ (\mathrm{mod}\ p)} - y_{j}\,x_{j+1\ (\mathrm{mod}\ p)}\right)\right|\,, \tag{10}$$

with $p \coloneqq \lceil n/2 \rceil$ (as above), and $\mathbf{v}_j = (x_j, y_j)$ (Equation 6). Figure 4d shows an example of geometric placement for the \abcd glyph of the Iris dataset. Notice that because this dataset is four-dimensional, the \abcd polygons are line segments, which do not possess area. Nevertheless, the placement still yields a valid representation in this example (Figure 4e).
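As an illustration of the two measures used for geometric placement, the sketch below applies the generic Gaussian (shoelace) area formula and the circumference to an explicit vertex list. It does not use the abbreviated form of Equation 9, and the function names are hypothetical.

```python
import math

def signed_area(vertices):
    """Gaussian (shoelace) area; loops of opposite orientation cancel each other."""
    n = len(vertices)
    total = 0.0
    for j in range(n):
        x0, y0 = vertices[j]
        x1, y1 = vertices[(j + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0

def circumference(vertices):
    """Length of the closed polyline through all vertices."""
    n = len(vertices)
    return sum(math.dist(vertices[j], vertices[(j + 1) % n]) for j in range(n))

def geometric_placement(vertices):
    """Map a polygon to (area, circumference), used as (x, y) of its centroid."""
    return abs(signed_area(vertices)), circumference(vertices)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
bow_tie = [(0, 0), (1, 1), (1, 0), (0, 1)]   # self-intersecting: both lobes cancel
segment = [(1, 2), (3, 4)]                   # two-vertex polygon, as for 4D data with abcd
print(geometric_placement(square))
print(geometric_placement(bow_tie))
print(geometric_placement(segment))
```

The two-vertex case shows why line-type polygons still receive a position: their area vanishes, but their circumference (twice the segment length) remains informative.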

2.6.3 Angular Placement

As a complementary approach to geometric placement, which considers the vertex positions, we propose angular placement, which is derived from the signed angles at the polygon vertices. More precisely, the sum of the counter-clockwise angles is mapped to the $x$-coordinate of a polygon’s centroid, whereas the sum of the clockwise angles is mapped to its $y$-coordinate. We employ the \abcd scheme for glyph placement due to its clutter-reducing property. Consequently, we do not show angular placement for the Iris dataset, because the \abcd scheme results in line-type polygons (Section 2.6.2) for its four-dimensional data. Therefore, we refer here to the Billiard dataset for an example of angular placement (Figure 6e).
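One plausible reading of this definition is sketched below. It interprets the "signed angles" as turning (exterior) angles between consecutive edges, which is an assumption; the paper may instead use interior angles or a different normalization. All names are hypothetical.

```python
import math

def signed_turning_angles(vertices):
    """Signed turning angle at each vertex; positive means a counter-clockwise turn."""
    n = len(vertices)
    angles = []
    for j in range(n):
        ax, ay = vertices[j - 1]
        bx, by = vertices[j]
        cx, cy = vertices[(j + 1) % n]
        ux, uy = bx - ax, by - ay        # incoming edge
        vx, vy = cx - bx, cy - by        # outgoing edge
        cross = ux * vy - uy * vx
        dot = ux * vx + uy * vy
        angles.append(math.atan2(cross, dot))
    return angles

def angular_placement(vertices):
    """Return (sum of counter-clockwise angles, sum of clockwise angles)."""
    angles = signed_turning_angles(vertices)
    ccw = sum(a for a in angles if a > 0)
    cw = sum(-a for a in angles if a < 0)
    return ccw, cw

square_ccw = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(angular_placement(square_ccw))
print(angular_placement(square_ccw[::-1]))
```

Under this reading, a convex polygon traversed counter-clockwise accumulates a full turn in the first coordinate and nothing in the second, and reversing the traversal swaps the two.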

2.6.4 Statistical Placement

Finally, and mainly for comparison, we consider a fourth placement, which sets the $x$-coordinate of the centroid to the mean of all $\delta_i$ of data value $\mathbf{d}$, and its $y$-coordinate to their standard deviation. Again for the Iris dataset, Figure 4f shows a respective example.
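A minimal sketch of statistical placement follows. Whether the population or the sample standard deviation is meant is not stated here; the population variant is assumed, and the function name is hypothetical.

```python
import statistics

def statistical_placement(d):
    """Map a data value to (mean, population standard deviation) of its components."""
    return statistics.fmean(d), statistics.pstdev(d)

# Hypothetical four-dimensional value and a reordering of it.
d = (1.0, 2.0, 3.0, 4.0)
print(statistical_placement(d))
print(statistical_placement(d[::-1]))
```

Since both quantities ignore the order of the components, the result does not depend on how the components are paired into vertices.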

3 Results

Our evaluation is organized into three parts to adequately cover the features of our approach. We compare to existing techniques and assess advantages and drawbacks with respect to quantitative analysis, readability, feature extraction, and cluttering. First, we evaluate basic properties regarding information extraction using selected datasets (Section 3.1). We then move on to evaluating placement (Section 3.2), where we discuss and evaluate our strategies in the context of dimensionality reduction techniques, and provide a quantitative assessment of our clustering. Finally, we confirm these previously discussed results with a user study covering frequent information visualization tasks (Section 3.3).

3.1 Plots

In order to provide a representative overview of the visual properties of our approach, we examine applications of the CPP ranging from smaller datasets, where features of distinct values can be extracted (Sections 3.1.1, 3.1.2 and 3.1.3), to larger datasets (Section 3.1.4), where the exploration of the general structure of the dataset is desirable. In the following, Figures 2 and 3 provide a comparison of the quantitative data analysis of the PCP, RC, and our CPP with the \abbc and \abcd scheme. If not explicitly mentioned, linear scaling for all axes is employed.

3.1.1 Synthetic Dataset

Due to its simplicity, the dataset in the teaser figure lends itself to a brief discussion of interesting geometric properties of CPPs. Especially noteworthy is the cyan polygon in the CPP, which constitutes a mirrored and translated variant of the orange polygon. This reflects reversal of the component order in $\mathbf{d}$ (which accounts for the mirroring along the main diagonal) and the addition of one to its entries (translating the polygon along the main diagonal).

Figure 2: 5D QCM10 dataset. PCP (a), RC (b), CPP with \abbc (c), and \abcd (d) scheme. Observe linear trend in (c), (d).

3.1.2 QCM10 Dataset

With the QCM10 dataset [1] (Figure 2) from the UCI machine learning repository [13], we give an example of a typical application in the field of sensory measurement analysis. The dataset contains data of five gaseous alcohols (the five classes) with varying air-to-gas concentrations. Due to the experimental design employed in the creation of this dataset, it contains an interesting property, in that all its members feature decreasing numeric values for increasing gas ratios (the five dimensions from first to last).

In the CPPs (Figure 2c–d), this trend is clearly visible, indicated by the majority of polygon vertices being located consistently below the main CPP diagonal, with edges connecting in counterclockwise order. This polygon shape signifies a negative correlation between adjacent value components (cf. Section 2.5.4) and exemplifies the viability of our technique for trend identification inside (position of edges of a single polygon in image space) and across values (polygon edges in relation to each other). In the PCP and RC, however, due to their independent normalization of each dimension, this correlation is hard to see, with some polylines featuring positive and some negative slopes.

3.1.3 Iris Dataset

This dataset [18] consists of 150 four-dimensional values containing physiological measurements about the iris flower, clustered into three classes (the three subspecies). It is a well-researched dataset that is often considered in multi-dimensional visualization. This popularity makes its CPP representation (Figure 3c–d) especially interesting, and promotes the comparability of our approach.

Apparent features standing out in the CPP \abcd scheme (Figure 3d) are the two-vertex polygons (i.e., lines). They signify an especially compact representation, where an entire four-dimensional value is tangible with a single line, which is generally not possible for the PCP and RC (Figures 3a and 3b). More importantly, the \abcd scheme achieves full separation of the green cluster for this dataset, whereas the PCP and RC exhibit clutter similar to the \abbc CPP. This emphasizes the ability of our CPP with creation scheme \abcd to produce a compact representation that aids in cluster identification due to its clutter-reducing property, significantly more so than the corresponding PCP or RC representations.

3.1.4 Wine Dataset

The Wine dataset [19] from the UCI machine learning repository [13] features 178 13-dimensional values containing the chemical composition of wines, again clustered into three classes. Next to its clustering difficulty and general complexity for quantitative display, this dataset is especially worthy of consideration due to the differing number ranges between its dimensions, a property characteristic of its chemical content analysis.

Utilization of regular, linear scales (Figures 3e, 3f, 3g and 3h) leads to crowded representations for the PCP, RC, \abbc CPP, and \abcd CPP. In particular, the CPPs are dominated by the large components in the data. However, employing logarithmic scaling on the plots (Figures 3i, 3j, 3k and 3l) drastically increases readability and image-space utilization in the CPPs. Where a number of vertices in the linearly scaled CPPs lie close to the origin, they now contribute significantly to the resulting polygon shape, and uncover the principal relation of the components of the underlying $n$-dimensional value. Due to individually scaled dimensions, logarithmic scales have a noticeably lower impact with both the PCP and RC. This signifies an advantage of the single 2D space used by our technique, since logarithmic scaling not only uncovers additional structure but also simplifies interpretation of the plot due to the single pair of axes.

Figure 3: Iris dataset (4D, linear scale (a)–(d)) and 13-dimensional Wine dataset (linear scale (e)–(h) and logarithmic scale (i)–(l)). Column arrangement analogous to Figure 2.

3.2 Placement

(a) $\theta = \mathbf{0.95}$, $\tau = 0.63$
(b) $\theta = 0.93$, $\tau = 0.64$
(c) $\theta = 0.86$, $\tau = 0.45$
(d) $\theta = 0.92$, $\tau = 0.57$
(e) $\theta = 0.69$, $\tau = 0.21$
(f) $\theta = 0.85$, $\tau = 0.43$
Figure 4: Iris dataset (4D). Comparison of t-SNE (a) and UMAP (b) with our intrinsic (c), geometric (\abcd) (d), geometric (\abbc) (e), and statistical (f) placement. Subcaptions refer to the corresponding Jaccard index ($\theta$) and silhouette coefficient ($\tau$). Bold values denote best performance (see also Table 1).

Before discussing the qualitative comparison of our placement strategies, we derive a quantitative measure to evaluate placements. To do this effectively, we evaluate the k-means partitioning of a placement versus the true classification labels of the corresponding dataset, by calculating the Jaccard index [27] ($\theta \in [0, 1]$, higher is better) as a measure of similarity between two classifications and the silhouette coefficient [41] ($\tau \in [-1, 1]$, higher is better) as a measure of value–cluster proximity.
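The two measures can be sketched in plain Python as shown below. The exact Jaccard variant is not specified in this section, so the pair-counting form (which does not require matching cluster labels to class labels) is an assumption. The silhouette coefficient uses Euclidean distances in the 2D placement, and the k-means step itself is omitted. All function names are hypothetical.

```python
import math
from itertools import combinations

def pair_jaccard(labels_true, labels_pred):
    """Pair-counting Jaccard index between two labelings of the same items."""
    both = only_true = only_pred = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            both += 1
        elif same_true:
            only_true += 1
        elif same_pred:
            only_pred += 1
    denom = both + only_true + only_pred
    return both / denom if denom else 1.0

def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [k for k in clusters[lab] if k != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(math.dist(points[i], points[k]) for k in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[k]) for k in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical placement with two well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
truth = [0, 0, 1, 1]
print(pair_jaccard(truth, [1, 1, 0, 0]), silhouette_coefficient(points, truth))
```

The pair-counting form is invariant to renaming the clusters, which is convenient when comparing k-means output against class labels.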

We compare (Table 1) all four variants of our placement (Section 2.6) using the \abbc and \abcd scheme to t-SNE (perplexity = 5, 30, 80) and UMAP (nNeighbors = 5, 15, 50) (intrinsic placement is not calculated for the \abbc scheme, statistical placement is invariant to both schemes). This serves as a baseline for the following discussion, where we present some detailed examples of our placement. If not explicitly mentioned, the following Figures 4, 5 and 6 depict t-SNE and UMAP (in their best configuration according to Table 1), and our four placements using the \abcd scheme.

3.2.1 Iris Dataset

Whereas the CPP with the \abcd scheme proved especially strong in separating the green cluster (Figure 3d), the geometric placement performance of \abcd in Figure 4d ($\theta = 0.92$, $\tau = 0.574$) is competitive with the best clustering result (Figure 4a, $\theta = 0.945$, $\tau = 0.632$).

Moreover, our geometric \abbc scheme placement (Figure 4e, $\theta = 0.708$, $\tau = 0.21$), which we visualize instead of the angular placement that is insignificant in the two-vertex polygon case, profits from our scaled-down polygons (compared to simple points), since it enables the separation of the orange and cyan cluster by comparing the different wedge shapes of the polygons of each cluster.

Interesting to note is the similar performance of the statistical placement (Figure 4f, $\theta = 0.853$, $\tau = 0.429$) to t-SNE and UMAP (Figures 4a and 4b), as this relation also carries significance for the following considered datasets (Sections 3.2.2 and 3.2.3).

3.2.2 Wine Dataset

Here, our intrinsic placement (Figure 5c, $\theta = 0.803$, $\tau = 0.296$) constitutes the best cluster separation compared to t-SNE (Figure 5a, $\theta = 0.725$, $\tau = 0.259$) and UMAP (Figure 5b, $\theta = 0.724$, $\tau = 0.25$). Additionally, our small polygons feature distinct shapes, which improves individual value identification in the placement.

Furthermore, notice that the relatively worse performance of the statistical placement (Figure 5f, $\theta = 0.702$, $\tau = 0.201$) is again analogous to UMAP in terms of $\theta$ and $\tau$, and also w.r.t. qualitative expressiveness. This suggests a correlation between measures directly calculated from the $n$-dimensional value and dimensionality reduction results, confirming that the utilization of geometric polygon properties for placement is justified and beneficial.

(a) $\theta = 0.73$, $\tau = 0.26$
(b) $\theta = 0.72$, $\tau = 0.25$
(c) $\theta = \mathbf{0.80}$, $\tau = 0.30$
(d) $\theta = 0.58$, $\tau = 0.10$
(e) $\theta = 0.70$, $\tau = 0.08$
(f) $\theta = 0.70$, $\tau = 0.20$
Figure 5: Wine dataset (13-dimensional). Comparison of t-SNE (a), UMAP (b), and our intrinsic (c), geometric (d), angular (e), and statistical (f) placement.

3.2.3 Billiard Dataset

Generally, phase spaces of billiard dynamics [44] are visualized in a 2D plot, where two orthogonal axes are used to display its two components, i.e., angle and arclength. This works well for single trajectories, but is unsuited for comparing sets of trajectories.

To test the suitability of our approach for this type of analysis, we apply it to the Billiard dataset, containing the phase space of 60 2D elliptical billiard dynamics [44] trajectories with 50 reflections each. The starting position and direction of each trajectory are seeded three degrees apart from its predecessor, and the trajectories are split into three clusters by varying the parameter $A$ of the elliptical border $x^2/A^2 + y^2/B^2 = 1$ by $0.1$ each.

Comparing clustering performance alone, the geometric placement (Figure 6d, $\theta = 1$, $\tau = 0.977$) delivers a nearly perfect result, with complete separation of all three clusters. However, the angular placement (Figure 6e, $\theta = 1$, $\tau = 0.657$) shows a wider distribution of the values in image space, encouraging leveraging of this additional information in qualitative analysis. Compared to the previous datasets, the characteristic of billiard dynamics phase space seems to suit the angular placement especially well, possibly because a significant amount of inherent information is related to angles.

Crucially, the statistical placement (Figure 6f, $\theta = 0.35$, $\tau = -0.033$) fails to provide a clearly separated embedding of the three clusters, visually in line with both t-SNE (Figure 6a, $\theta = 0.785$, $\tau = 0.324$) and UMAP (Figure 6b, $\theta = 0.74$, $\tau = 0.337$) results, further reinforcing our theory of their analogy in terms of clustering performance discussed in Section 3.2.1.

(a) $\theta = 0.79$, $\tau = 0.32$
(b) $\theta = 0.72$, $\tau = 0.25$
(c) $\theta = 0.70$, $\tau = 0.38$
(d) $\theta = \mathbf{1.00}$, $\tau = 0.98$
(e) $\theta = \mathbf{1.00}$, $\tau = 0.66$
(f) $\theta = 0.35$, $\tau = -0.03$
Figure 6: Billiard dataset (100-dimensional). Comparison of t-SNE (a) and UMAP (b) with our intrinsic (c), geometric (d), angular (e), and statistical (f) placement.
Table 1: Clustering performance comparing CPP placement strategies with the 2D embedding of t-SNE and UMAP, in terms of Jaccard index $\theta$ and silhouette coefficient $\tau$. The t-SNE and UMAP results refer to the average result after ten runs each, to compensate for possible variance between single runs. Bold values denote best performance per column for the Jaccard index. Best performance of the silhouette coefficient is not emphasized, since interpretation of this value holds little significance without a corresponding Jaccard similarity value.

| Dataset | Measure | CPP \abcd stat | CPP \abcd int | CPP \abcd geo | CPP \abcd ang | CPP \abbc geo | CPP \abbc ang | t-SNE perplexity 5 | t-SNE perplexity 30 | t-SNE perplexity 80 | UMAP nNeighbors 5 | UMAP nNeighbors 15 | UMAP nNeighbors 50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Iris | $\tau$ | 0.429 | 0.448 | 0.574 | – | 0.213 | 0.649 | 0.562 | 0.632 | 0.589 | 0.597 | 0.649 | 0.638 |
| Iris | $\theta$ | 0.853 | 0.860 | 0.920 | – | 0.693 | 0.633 | 0.943 | **0.945** | 0.927 | 0.926 | 0.905 | 0.934 |
| Billiard | $\tau$ | -0.033 | 0.379 | 0.977 | 0.657 | -0.015 | -0.113 | 0.324 | 0.312 | -0.049 | 0.177 | 0.337 | 0.253 |
| Billiard | $\theta$ | 0.350 | 0.700 | **1.000** | **1.000** | 0.450 | 0.367 | 0.785 | 0.735 | 0.420 | 0.567 | 0.740 | 0.740 |
| Wine | $\tau$ | 0.201 | 0.296 | 0.104 | 0.076 | 0.210 | 0.053 | 0.258 | 0.258 | 0.259 | 0.187 | 0.187 | 0.250 |
| Wine | $\theta$ | 0.702 | **0.803** | 0.584 | 0.697 | 0.708 | 0.517 | 0.724 | 0.724 | 0.725 | 0.653 | 0.653 | 0.724 |

3.3 User Study

In order to provide a quantitative assessment of the properties of our approach, we conducted a user study comparing the CPP to the PCP and RC. To achieve quantitative and comparable results, the user study was focused on the CPP without placement, since placement was already evaluated above. Additionally, the CPP was employed without start arrows and vertex circles, to limit the scope of the study to tasks that do not rely on this information (Section 3.3.1) and improve comparability to the other techniques. A representative sample of images used in the study for all tasks is available in the supplemental material together with further detail.

3.3.1 Tasks

We focus on measuring user performance [51] by comparing the approaches head to head [33, 26] in three main analytic tasks, which are widely employed in multi-dimensional data analysis [3]:

  • outlier detection (OD),

  • value retrieval (VR), and

  • value comparison (VC).

Formulation of these tasks motivated the following hypotheses.

3.3.2 Hypotheses

H1.

We assume that the CPP will see less of a degradation in task accuracy when moving from five- to ten-dimensional data than the PCP, due to PCP’s innate axis-cluttering when displaying higher-dimensional data.

H2.

We assume that the value retrieval task will perform better with CPPs than with RCs because, even though both are polygon-based visualizations, the cyclic polygon plot benefits from a simple 2D space. This should especially hold true for the outlier detection task in comparison to both other approaches (PCP and RC).

H3.

When comparing the two creation schemes, we assume that \abbc will perform better in terms of task accuracy and completion time for lower-dimensional data (more structure) and \abcd better for higher-dimensional data (less overdrawing).

We expect H1 to hold true to at least a similar extent when comparing to the RC instead of the PCP, due to its inferiority in this regard to linear layouts [47, 21].

3.3.3 Datasets

To effectively measure the performance of the three tasks, we created specific datasets for each of the tasks. All datasets were created with up to ten ten-dimensional values. We chose the maximum number of ten members per dataset to be able to support ten uniquely colored polylines/polygons. For this, we used a 10-color-paired color scheme with light and dark shades of five different color hues.

The first dataset was created by filling all components of all $n$-dimensional values with random uniform noise in the interval $[0, 0.8]$, and then inserting a single random number in the interval $[0.8, 1]$. This dataset was used for the outlier detection task.
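The construction of this first dataset can be sketched as follows. The function name, its parameters, and the use of a seed are hypothetical; the sketch also always inserts the outlier, whereas the study may have included stimuli without one.

```python
import random

def make_outlier_dataset(members=10, n=10, seed=None):
    """Fill `members` n-dimensional values with noise in [0, 0.8], then insert one outlier.

    Returns the dataset and the (member, component) index of the inserted value.
    """
    rng = random.Random(seed)
    data = [[rng.uniform(0.0, 0.8) for _ in range(n)] for _ in range(members)]
    i, j = rng.randrange(members), rng.randrange(n)
    data[i][j] = rng.uniform(0.8, 1.0)   # the single value from [0.8, 1]
    return data, (i, j)

data, (member, component) = make_outlier_dataset(seed=1)
largest = max(v for row in data for v in row)
print(member, component, data[member][component] == largest)
```

By construction, the inserted value is the largest one in the dataset, which is what both outlier detection questions rely on.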

The second dataset was created by first inserting random uniform noise in the interval $[0, 1]$ in all components of the $n$-dimensional value. Then, a single manually defined numeric value was inserted as one of the components in a random dataset member. This dataset was used for the outlier detection and value retrieval tasks.

The third dataset was created analogously to the second dataset, except that two numeric values instead of one were inserted in the same fashion. This dataset was used for the value comparison task. In order to also account for datasets with different number ranges per dimension, which is frequently the case for datasets containing sensory measurements, random scaling factors for each of the dimensions were employed for dataset types two and three, to scale all numeric values accordingly. These factors were used for half of the visualizations displayed in the study; the other half used datasets with values in the $[0, 1]$ range.

3.3.4 Questions

Derived from the tasks in Section 3.3.1, the following questions were designed to accompany the visualizations of the previously discussed datasets in the study. For the outlier detection task, two questions were designed, which were used alternately:

OD
  • Is there an attribute value greater than 0.8 present in the dataset?

  • Select the color of the polyline / radar glyph / cyclic polygon representing the data-vector with the largest attribute value in the dataset.

VR
  • Does the displayed dataset contain an attribute value of exactly $X$?

Varying phrasing of the value comparison question was necessary to suit the three different techniques:

VC
  • PCP: Is the attribute value represented by the indicated axis-intercept of polyline A or polyline B larger?

  • RC: Is the attribute value represented by the indicated axis-intercept of radar glyph A or radar glyph B larger?

  • CPP: Is the attribute value represented by the $x$/$y$-coordinate of vertex A or the $x$/$y$-coordinate of vertex B larger?

Figure 7: Results of the user study. The first row (a)–(f) shows task accuracy in percent for outlier detection (OD), value retrieval (VR), and value comparison (VC) for the five- (5D) and ten-dimensional (10D) data, in the panel order OD 5D, OD 10D, VR 5D, VR 10D, VC 5D, VC 10D. The second row (g)–(l) analogously shows task completion time. Whiskers in the bar plot show the confidence interval of one standard deviation. They are capped at 100% where their value would exceed 100%. Whiskers of the box plot represent the 10–90 percentile interval around the median; the box itself represents the interquartile range (Q1–Q3). CPP$_A$ denotes the cyclic polygon plot with \abcd creation type, CPP$_B$ the cyclic polygon plot with \abbc creation type.

3.3.5 Design

Due to the COVID-19 pandemic, we held the user study online with 24 participants. We recruited them from the university environment; they were aged 22 to 55 years. All participants were given a detailed live introduction and presentation of 30–40 minutes via video call on multi-dimensional data analysis and, more specifically, the three employed techniques. The introductory videos presented to the participants are provided in the supplemental material. Three example questions (one for each technique) were solved together, with feedback provided.

We used a Likert scale to have participants rate their experience in multi-dimensional data analysis from 1 (no experience) to 5 (expert). The mean value over all participants was 2.5 with three people selecting one and only one person selecting five as their experience level. Additionally, participants were presented the color scheme and asked to confirm their ability to discern all the displayed colors.

The study itself consisted of 54 questions, categorized as follows. For all three tasks, five- and ten-dimensional datasets were used with three questions per technique, per task, and per dataset dimensionality. Versions A and B of the study were created, where version A contained the CPP with the \abcd scheme and version B contained the CPP with the \abbc scheme. All other questions remained exactly the same. The questions were shown in the order PCP, RC, CPP while the order of questions per task was randomized to minimize learning effects across the study duration. One half of participants were shown study A, the other half was shown study B. A representative sample of visualizations used for the study is provided in the supplemental material.

Table 2: ANOVA results for the completion time of the user study for outlier detection (OD), value retrieval (VR), and value comparison (VC). 5D/10D denote the five- and ten-dimensional dataset.
Task       F-value   p-value
OD, 5D     1.737     $1.68 \times 10^{-1}$
OD, 10D    13.606    $4.76 \times 10^{-7}$
VR, 5D     4.231     $8 \times 10^{-3}$
VR, 10D    5.957     $1 \times 10^{-3}$
VC, 5D     2.208     $9.5 \times 10^{-2}$
VC, 10D    4.772     $4 \times 10^{-3}$

3.3.6 Study Results

We first determined the statistical significance of our completion time results with ANOVA ($\alpha = 0.05$). The critical F-value for our study setup for all tasks is $F(3,68) = 2.740$. The five-dimensional outlier detection and value comparison tasks have an F-value below this critical threshold, so their differences are not statistically significant. For all other tasks, a statistically significant difference in mean completion time between the visualizations exists for our chosen $\alpha$ (Table 2).
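The significance decision can be retraced from the reported numbers alone. The following sketch copies the F- and p-values from Table 2 and checks each task against both the critical F-value and $\alpha$; the names `TABLE_2` and `significant_tasks` are illustrative and not part of the study's analysis code.

```python
# F-values and p-values copied from Table 2 (completion time).
TABLE_2 = {
    ("OD", "5D"): (1.737, 1.68e-1),
    ("OD", "10D"): (13.606, 4.76e-7),
    ("VR", "5D"): (4.231, 8e-3),
    ("VR", "10D"): (5.957, 1e-3),
    ("VC", "5D"): (2.208, 9.5e-2),
    ("VC", "10D"): (4.772, 4e-3),
}

ALPHA = 0.05        # significance level used in the study
F_CRITICAL = 2.740  # critical value F(3, 68) reported in the text


def significant_tasks(table, f_critical=F_CRITICAL, alpha=ALPHA):
    """Return the tasks whose F-value exceeds the critical threshold."""
    result = []
    for task, (f_value, p_value) in table.items():
        by_f = f_value > f_critical
        by_p = p_value < alpha
        # Both criteria are equivalent, so they must agree for every row.
        assert by_f == by_p, f"inconsistent row for {task}"
        if by_f:
            result.append(task)
    return result


for task, dims in significant_tasks(TABLE_2):
    print(f"{task}, {dims}: significant at alpha = {ALPHA}")
```

Comparing the F-value to the critical value and comparing the p-value to $\alpha$ are two views of the same test, which is why the sketch asserts that they agree for every row.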

No ANOVA was performed on the task accuracy results: due to the design of our study (categorical, single-choice answers), the assumption of normality does not hold and ANOVA would not yield meaningful results [28]. Additionally, we argue that, given the narrow scope of the study, the task accuracy detailed in Figures 7a–7f is sufficiently expressive regarding the performance of the approaches.

The results of our study (Figure 7) show competitive properties of the CPP, especially promising in the ten-dimensional setting, as well as a sizeable advantage regarding the completion time across all settings of the outlier detection and value retrieval tasks. These results are detailed by the following discussion per task.

3.3.7 Outlier Detection

The five-dimensional setting (Figures 7a and 7g) was managed well by all four tested approaches. CPPs with the \abcd scheme performed slightly worse than their \abbc counterpart, which could be attributed to the redundancy present in the \abbc configuration. While scanning the visualization for outliers in the \abbc configuration, it is sufficient to focus either on the horizontal or the vertical axis.
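The redundancy argument can be illustrated with a small sketch. It assumes that the \abbc scheme forms overlapping consecutive pairs (a,b), (b,c), … and that the \abcd scheme forms disjoint pairs (a,b), (c,d), …; this reading of the scheme names, the wrap-around handling, and the helper `cyclic_pairs` are assumptions made for illustration and are not confirmed by this section.

```python
def cyclic_pairs(values, stride):
    """Pair each selected value with its cyclic successor.

    stride=1 gives overlapping pairs (assumed reading of the abbc scheme),
    stride=2 gives disjoint pairs (assumed reading of the abcd scheme).
    For odd lengths with stride=2 the last pair wraps around; this is a
    simplification of this sketch, not a statement about the paper.
    """
    n = len(values)
    return [(values[i], values[(i + 1) % n]) for i in range(0, n, stride)]


record = ["a", "b", "c", "d"]  # placeholder attribute names

overlapping = cyclic_pairs(record, stride=1)
disjoint = cyclic_pairs(record, stride=2)

# With overlapping pairs, every attribute occurs once as an x-coordinate
# and once as a y-coordinate, so scanning a single axis covers all values.
xs = {x for x, _ in overlapping}
ys = {y for _, y in overlapping}
print(overlapping, xs == ys == set(record))
print(disjoint)
```

Under this assumption, the overlapping scheme shows each attribute on both axes, whereas the disjoint scheme shows each attribute on exactly one axis, so an outlier may only be visible along one direction.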

In the ten-dimensional context, we see a more evident difference in accuracy between the approaches (Figure 7b). Both CPPs exhibit higher accuracy than the PCP and RC visualizations, with the \abbc scheme again performing even slightly better (H1). Additionally, completion time in the ten-dimensional setting is especially noteworthy, since here, CPPs performed on average almost two times faster than RCs and more than two times faster than PCPs (Figure 7h).

3.3.8 Value Retrieval

In the five-dimensional context, the PCP clearly performed best in both accuracy and completion time (Figures 7c and 7i). This can be attributed to the manageability of five dimensions by the PCP, which makes comparison between axes easy and fast. RCs exhibit the worst completion time in both the five- and ten-dimensional settings, while their accuracy in the five-dimensional setting is comparable to that of the CPP approaches.

Again, the move to ten-dimensional data shows a notable increase in performance for our CPPs (Figures 7d and 7j). Whereas PCP (lower accuracy and higher completion time) and RC (lower accuracy with equal completion time) show a decline in their performance compared to the five-dimensional setting, the CPP visualizations actually improve on both task accuracy and completion time. This confirms our previously discussed hypothesis (H1), that the CPP can adapt better to higher-dimensional data, and additionally shows that the lack of individual axes is not critical to the interpretability of our plot. In this setting, the CPP with the \abcd scheme shows the most significant increase in accuracy compared to the other approaches, confirming H2.

3.3.9 Value Comparison

Both RC and PCP performed comparably well in both accuracy and completion time (Figures 7e and 7k). Since in this configuration, values to compare were already highlighted in the visualization, the viewer could focus on the significant visualization parts of the RC and PCP, which reduced their complexity in the ten-dimensional setting.

While performing equally well in the five-dimensional context, the CPP showed a deterioration of accuracy of about 10% (\abbc) and 20% (\abcd) (Figure 7f). More crucially, task completion time for the CPPs was especially long in the value comparison task (Figure 7l). This uncovers a drawback of the CPP: to refer unambiguously to a single attribute value, one must refer to the $x$- and $y$-components of the vertices, which can be attributed to the necessary lack of individual axis labels in favor of screen-space efficiency. This complicates the task description and required additional time for the participants to interpret the question, which is evident from the longer completion time. Additionally, in the ten-dimensional context, this manifests itself in a lower task accuracy, which could be attributed to the complexity of the plot, which already exhibited some amount of overdrawing.

H3 only held true in the context of value retrieval. For both other tasks, the \abbc scheme had a slight accuracy advantage in all configurations.

4 Discussion and Limitations

We have investigated the CPP with two cyclic selection schemes and four variants of its placement. Our results have shown that the \abcd scheme provides a good baseline for all discussed datasets, resulting in a valid and expressive visualization. In lower-dimensional datasets especially, we suggest the \abbc scheme to uncover additional dataset structure, which our results confirm.

Regarding placement of the polygons, the intrinsic placement strategy proved to be a solid baseline, showing competitive performance for all datasets discussed in our results. Use of the angular placement is mostly limited to niche applications, but can be especially advantageous with suitable datasets, e.g., the Billiard dataset. Statistical placement shows little significance beyond the fact that it confirms the validity of our other placements, which are motivated directly by the polygon geometry and generally show better performance. Motivated by these results, we recommend intrinsic placement as the default placement strategy, as it provides very competitive results for all datasets, and suggest our other placement types as supplemental and application-specific. Additionally, we recommend our placement over optimization-based approaches like t-SNE and UMAP, since it preserves a strong correspondence to our polygons, which, when viewed as small glyphs, convey additional information about the underlying data.

The difficulty in representing identical vertices of a polygon, which we addressed with compositing in the rendering step, is still an innate drawback, but, as our results and user study show, has little impact in practice when applying the CPP to real datasets.

The comparatively worse performance of the CPP in the value comparison task of the user study emphasizes another innate drawback of the CPP in its complexity of referring to explicit components of a polygon vertex. Whereas specific value components in techniques featuring separate coordinate axes per dimension can straightforwardly be referred to, the CPP necessitates closer inspection of a polygon with its starting arrow. While this circumstance can in part be attributed to the lack of individual axis labels, necessary for our screen-space efficient design, the user study and the application of our technique to real datasets again show that it is nevertheless competitively performant for key visualization tasks.

Finally, as is common to line-based approaches, cluttering and overdrawing remain present in CPPs of higher-dimensional data. We have shown, however, that, depending on the displayed data, this problem can be alleviated in the CPP by using logarithmic scaling on the axes, which is especially effective in decompressing previously crowded areas in our plot, as we discussed in Section 3.1.4.
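The decompression effect of logarithmic scaling can be seen in a small numeric sketch. The helper `normalize` and the sample values are hypothetical and only illustrate the general principle; they do not reproduce the axis mapping used in our implementation.

```python
import math


def normalize(values, log=False):
    """Map positive values to [0, 1], optionally on a logarithmic scale."""
    if log:
        values = [math.log10(v) for v in values]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute values: four small values and one large value.
values = [1.0, 2.0, 5.0, 10.0, 1000.0]

linear = normalize(values)
logarithmic = normalize(values, log=True)

# On a linear axis the four small values share less than 1% of the axis;
# on a logarithmic axis they spread over roughly a third of it.
for v, lin, lg in zip(values, linear, logarithmic):
    print(f"{v:8.1f}  linear={lin:.4f}  log={lg:.4f}")
```

Whether this helps depends on the data: the mapping requires strictly positive values and mainly benefits attributes whose values span several orders of magnitude.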

5 Conclusion

We introduced the cyclic polygon plot, a novel approach to visualize $n$-dimensional discrete data, based on decomposition of the original $n$D value into 2D subspaces, whose 2D points are projected to image space. A polygon representation preserves correspondence to the original data dimensions. We conducted a detailed evaluation and discussion of its properties, backed up by a user study. Additionally, we derived glyphs from our approach, and presented novel strategies to place these glyphs based on their intrinsic properties, resulting in an approach that we compare to existing dimensionality reduction techniques. Although our approach outperforms existing techniques in some cases, it also exhibits limitations, including difficulties with identical values. Future work could investigate alternative representations of such multiple values.

Acknowledgements.
This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 281071066 – TRR 191 (Transregional Collaborative Research Center SFB / TRR 191) and Germany’s Excellence Strategy EXC2181/1 - 390900948 (the Heidelberg STRUCTURES Excellence Cluster).

References

  • [1] M. F. Adak, P. Lieberzeit, P. Jarujamrus, and N. Yumusak. Classification of alcohols obtained by QCM sensors with different characteristics using ABC based neural network. Engineering Science and Technology, an International Journal, 23(3):463–469, June 2020.
  • [2] A. Ahmed, T. Dwyer, M. Forster, X. Fu, J. Ho, S.-H. Hong, D. Koschützki, C. Murray, N. S. Nikolov, R. Taib, A. Tarassov, and K. Xu. GEOMI: GEOmetry for Maximum Insight. In P. Healy and N. S. Nikolov, editors, Graph Drawing, Lecture Notes in Computer Science, pages 468–479, Berlin, Heidelberg, 2006. Springer.
  • [3] R. Amar, J. Eagan, and J. Stasko. Low-level components of analytic activity in information visualization. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005., pages 111–117, Oct. 2005.
  • [4] A. Artero, M. de Oliveira, and H. Levkowitz. Enhanced High Dimensional Data Visualization through Dimension Reduction and Attribute Arrangement. In Tenth International Conference on Information Visualisation (IV’06), pages 707–712, July 2006.
  • [5] J. Blaas, C. Botha, and F. Post. Extensions of Parallel Coordinates for Interactive Exploration of Large Multi-Timepoint Data Sets. IEEE Transactions on Visualization and Computer Graphics, 14(6):1436–1451, Nov. 2008.
  • [6] R. P. Boland and J. Urrutia. Polygon Area Problems, 2000.
  • [7] M. Burch and D. Weiskopf. On the benefits and drawbacks of radial diagrams. In Handbook of Human Centric Visualization, pages 429–451. Springer, Jan. 2014.
  • [8] W. Chan. A Survey on Multivariate Data Visualization. Technical report, Department of Computer Science and Engineering. Hong Kong University of Science and Technology, 2006.
  • [9] J. H. Claessen and J. J. Van Wijk. Flexible linked axes for multivariate data visualization. IEEE Transactions on Visualization and Computer Graphics, 17(12):2310–2316, 2011.
  • [10] W. C. Cleveland and M. E. McGill. Dynamic Graphics for Statistics. CRC Press, Inc., 1988.
  • [11] A. Dasgupta and R. Kosara. Pargnostics: Screen-Space Metrics for Parallel Coordinates. IEEE Transactions on Visualization and Computer Graphics, 16(6):1017–1026, Nov. 2010.
  • [12] G. M. Draper, Y. Livnat, and R. F. Riesenfeld. A Survey of Radial Methods for Information Visualization. IEEE Transactions on Visualization and Computer Graphics, 15(5):759–776, Sept. 2009.
  • [13] D. Dua and C. Graff. UCI machine learning repository, 2017.
  • [14] H. Edelsbrunner, J. Harer, and A. K. Patel. Reeb spaces of piecewise linear mappings. In Proceedings of the Twenty-Fourth Annual Symposium on Computational Geometry, SCG ’08, pages 242–250, New York, NY, USA, June 2008. Association for Computing Machinery.
  • [15] G. Falkman. Information visualisation in clinical Odontology: Multidimensional analysis and interactive data exploration. Artificial Intelligence in Medicine, 22(2):133–158, May 2001.
  • [16] E. Fanea, S. Carpendale, and T. Isenberg. An interactive 3D integration of parallel coordinates and star glyphs. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005., pages 149–156, Oct. 2005.
  • [17] S. E. Fienberg. Graphical Methods in Statistics. The American Statistician, 33(4):165–178, Nov. 1979.
  • [18] R. A. Fisher. The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2):179–188, 1936.
  • [19] M. Forina, S. Lanteri, C. Armanino, et al. PARVUS: An extendible package for data exploration, classification and correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy (1988), 1991.
  • [20] J. Fuchs, F. Fischer, F. Mansmann, E. Bertini, and P. Isenberg. Evaluation of alternative glyph designs for time series data in a small multiple setting. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 3237–3246. Association for Computing Machinery, New York, NY, USA, Apr. 2013.
  • [21] J. Goldberg and J. Helfman. Eye tracking for visualization evaluation: Reading values on linear versus radial graphs. Information Visualization, 10(3):182–195, July 2011.
  • [22] M. Hahsler, K. Hornik, and C. Buchta. Getting Things in Order: An Introduction to the R Package seriation. Journal of Statistical Software, 25(3):1–34, Mar. 2008.
  • [23] P. Hoffman, G. Grinstein, K. Marx, I. Grosse, and E. Stanley. DNA visual and analytic data mining. In Proceedings. Visualization ’97 (Cat. No. 97CB36155), pages 437–441, Oct. 1997.
  • [24] J.-F. Im, M. J. McGuffin, and R. Leung. GPLOM: The Generalized Plot Matrix for Visualizing Multidimensional Multivariate Data. IEEE Transactions on Visualization and Computer Graphics, 19(12):2606–2614, Dec. 2013.
  • [25] A. Inselberg. The plane with parallel coordinates. The Visual Computer, 1(2):69–91, Aug. 1985.
  • [26] T. Isenberg, P. Isenberg, J. Chen, M. Sedlmair, and T. Möller. A Systematic Review on the Practice of Evaluating Visualization. IEEE Transactions on Visualization and Computer Graphics, 19(12):2818–2827, Dec. 2013.
  • [27] P. Jaccard. The Distribution of the Flora in the Alpine Zone.1. New Phytologist, 11(2):37–50, 1912.
  • [28] T. F. Jaeger. Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4):434–446, Nov. 2008.
  • [29] J. Johansson, M. Cooper, and M. Jern. 3-dimensional display for clustered multi-relational parallel coordinates. In Ninth International Conference on Information Visualisation (IV’05), pages 188–193, July 2005.
  • [30] J. Johansson, C. Forsell, and M. Cooper. On the usability of three-dimensional display in parallel coordinates: Evaluating the efficiency of identifying two-dimensional relationships. Information Visualization, 13(1):29–41, Jan. 2014.
  • [31] S. Johansson and J. Johansson. Interactive Dimensionality Reduction Through User-defined Combinations of Quality Metrics. IEEE Transactions on Visualization and Computer Graphics, 15(6):993–1000, Nov. 2009.
  • [32] E. Kandogan. Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of Dimensions. In In Proceedings of the IEEE Information Visualization Symposium, Late Breaking Hot Topics, pages 9–12, 2000.
  • [33] H. Lam, E. Bertini, P. Isenberg, C. Plaisant, and S. Carpendale. Empirical Studies in Information Visualization: Seven Scenarios. IEEE Transactions on Visualization and Computer Graphics, 18(9):1520–1536, Sept. 2012.
  • [34] L. Lu, W. Wang, and Z. Tan. Double-Arc Parallel Coordinates and its Axes re-Ordering Methods. Mobile Networks and Applications, 25(4):1376–1391, Aug. 2020.
  • [35] L. F. Lu, M. L. Huang, and J. Zhang. Two axes re-ordering methods in parallel coordinates plots. Journal of Visual Languages & Computing, 33:3–12, Apr. 2016.
  • [36] J. E. Nam and K. Mueller. TripAdvisor$^{N\text{-}D}$: A Tourism-Inspired High-Dimensional Space Exploration Framework with Overview and Detail. IEEE Transactions on Visualization and Computer Graphics, 19(2):291–305, Feb. 2013.
  • [37] H. Nguyen and P. Rosen. DSPCP: A Data Scalable Approach for Identifying Relationships in Parallel Coordinates. IEEE Transactions on Visualization and Computer Graphics, 24(3):1301–1315, Mar. 2018.
  • [38] T. Opach, S. Popelka, J. Dolezalova, and J. K. Rød. Star and polyline glyphs in a grid plot and on a map display: Which perform better? Cartography and Geographic Information Science, 45(5):400–419, Sept. 2018.
  • [39] W. Peng, M. Ward, and E. Rundensteiner. Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension Reordering. In IEEE Symposium on Information Visualization, pages 89–96, Oct. 2004.
  • [40] R. Rosenbaum, J. Zhi, and B. Hamann. Progressive parallel coordinates. In 2012 IEEE Pacific Visualization Symposium, pages 25–32, Feb. 2012.
  • [41] P. J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, Nov. 1987.
  • [42] P. W. Shor and C. J. Van Wyk. Detecting and decomposing self-overlapping curves. Computational Geometry, 2(1):31–50, Aug. 1992.
  • [43] M. Streit, R. C. Ecker, K. Österreicher, G. E. Steiner, H. Bischof, C. Bangert, T. Kopp, and R. Rogojanu. 3D parallel coordinate systems—A new data visualization method in the context of microscopy-based multicolor tissue cytometry. Cytometry Part A, 69A(7):601–611, 2006.
  • [44] S. Tabachnikov. Geometry and Billiards. American Mathematical Soc., 2005.
  • [45] C. Tominski, J. Abello, and H. Schumann. Axes-based visualizations with radial layouts. In Proceedings of the 2004 ACM Symposium on Applied Computing, SAC ’04, pages 1242–1247, New York, NY, USA, Mar. 2004. Association for Computing Machinery.
  • [46] C. Viau, M. J. McGuffin, Y. Chiricota, and I. Jurisica. The FlowVizMenu and Parallel Scatterplot Matrix: Hybrid Multidimensional Visualizations for Network Exploration. IEEE Transactions on Visualization and Computer Graphics, 16(6):1100–1108, Nov. 2010.
  • [47] M. Waldner, A. Diehl, D. Gračanin, R. Splechtna, C. Delrieux, and K. Matković. A Comparison of Radial and Linear Charts for Visualizing Daily Patterns. IEEE Transactions on Visualization and Computer Graphics, 26(1):1033–1042, Jan. 2020.
  • [48] B. Wang and K. Mueller. The Subspace Voyager: Exploring High-Dimensional Data along a Continuum of Salient 3D Subspaces. IEEE Transactions on Visualization and Computer Graphics, 24(2):1204–1222, Feb. 2018.
  • [49] M. O. Ward. A Taxonomy of Glyph Placement Strategies for Multidimensional Data Visualization. Information Visualization, 1(3-4):194–210, Dec. 2002.
  • [50] R. Wegenkittl, H. Loffelmann, and E. Groller. Visualizing the behaviour of higher dimensional dynamical systems. In Proceedings. Visualization ’97 (Cat. No. 97CB36155), pages 119–125, Oct. 1997.
  • [51] S. Wehrend and C. Lewis. A problem-oriented classification of visualization techniques. In Proceedings of the First IEEE Conference on Visualization: Visualization ’90, pages 139–143, Oct. 1990.
  • [52] L. Zhou and D. Weiskopf. Indexed-Points Parallel Coordinates Visualization of Multivariate Correlations. IEEE Transactions on Visualization and Computer Graphics, 24(6):1997–2010, June 2018.