
License: CC BY 4.0
arXiv:2403.12343v1 [cs.HC] 19 Mar 2024

Glanceable Data Visualizations for Older Adults: Establishing Thresholds and Examining Disparities Between Age Groups

Zack While (ORCID 0000-0002-9114-3984), University of Massachusetts Amherst, Amherst, Massachusetts, USA 01002
Tanja Blascheck (ORCID 0000-0003-4002-4499), University of Stuttgart, Stuttgart, Germany
Yujie Gong (ORCID 0009-0003-8629-3472), Smith College, Northampton, Massachusetts, USA 01063
Petra Isenberg (ORCID 0000-0002-2948-6417), Université Paris-Saclay, CNRS, Inria, LISN, Orsay, France
Ali Sarvghad (ORCID 0000-0003-3718-7043), University of Massachusetts Amherst, Amherst, Massachusetts, USA 01002
(2024)
Abstract.

We present results of a replication study on smartwatch visualizations with adults aged 65 and older. The older adult population is rising globally, coinciding with their increasing interest in using small wearable devices, such as smartwatches, to track and view data. Yet smartwatches pose challenges for this population: fonts and visualizations are often small and meant to be seen at a glance. How concise design on smartwatches interacts with aging-related changes in perception and cognition, however, is not well understood. We replicate a study that investigated how visualization type and number of data points affect glanceable perception. We observe strong evidence of differences for participants aged 75 and older, sparking interesting questions regarding the study of visualization and older adults. We discuss first steps toward better understanding and supporting an older population of smartwatch wearers and reflect on our experiences working with this population. Supplementary materials are available at https://osf.io/7x4hq/.

Keywords: glanceable visualization, older adults, mobile visualization
Journal year: 2024. Copyright: rights retained. Conference: Proceedings of the CHI Conference on Human Factors in Computing Systems; May 11–16, 2024; Honolulu, HI, USA. Book title: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24), May 11–16, 2024, Honolulu, HI, USA. DOI: 10.1145/3613904.3642776. ISBN: 979-8-4007-0330-0/24/05. CCS concepts: Human-centered computing → Empirical studies in visualization; Human-centered computing → Visualization design and evaluation methods.

1. Introduction

Glanceable visualizations are concise graphical representations of data primarily designed and used for enabling quick insight discovery at a glance without extensive exploration or analysis (Pizza et al., 2016; Isenberg, 2021; Blascheck et al., 2021). These visualizations are standard on devices with small displays, such as smartwatches and activity trackers. Most interactions with such devices are less-than-5-second peeks aimed at fast retrieval of key information (Pizza et al., 2016). This brevity aligns well with the design goals, characteristics, and affordances of glanceable visualizations. Moreover, the constrained display size of these devices limits the utility of complex and information-dense visualizations (Neshati et al., 2019a), making glanceable formats a suitable fit.

Prior research (e. g., (Blascheck et al., 2018, 2021; Blascheck and Isenberg, 2021; Blascheck et al., 2023)) has investigated various aspects of glanceable visualizations’ design and utility. However, a crucial gap remains in our understanding of how these findings apply to older adults.¹ This knowledge gap becomes even more critical in light of the fast-growing population of older adults and the rapid emergence of visualization tools and technologies to assist older adults with data-driven self-care and decision-making (Backonja et al., 2016).

¹ In line with the World Health Organization, we use the term “older adults” to refer to individuals aged 65 years and over.

As individuals age, they undergo gradual changes in perception, cognition, and physical abilities that can impact their capacity to use visualizations effectively. For example, a recent survey by Fan et al. (Fan et al., 2023) found that older adults struggled to effectively use and gain insights from COVID-19 visualizations due to factors such as indistinguishable colors for aging vision (e. g., blue-green), low contrast ratios between graphical elements, and small font sizes. The onset and progression of age-related changes can vary greatly among individuals due to genetics, lifestyle, and overall health. Nevertheless, the physiological process of aging affects everyone. For instance, age-related farsightedness (presbyopia) is nearly ubiquitous in adults 65 years and older (Johnson and Finn, 2017), and many people experience a decline in visual acuity above age 50 (Mitzner et al., 2015). Certain cognitive functions such as memory, attention, and processing speed may also decline with age, which can influence information processing and decision-making (Unsworth and Engle, 2007). The compounded effects of aging may affect older adults’ use of glanceable visualizations on small screens and, therefore, the usability of smartwatches. However, the intersection of glanceable visualization and aging is a notably under-explored area of research. Our understanding of the performance, preferences, and requirements of older adults, as well as the factors that may influence and shape them, remains limited.

This work seeks to narrow our current knowledge gap at the intersection of aging and visualization. We, therefore, conducted a replication of the perceptual study by Blascheck et al. (Blascheck et al., 2018), in which the authors examined and established time thresholds across the combination of three glanceable visualization designs (Bar, Donut, and Radial) and three data sizes (7, 12, and 24 data points) on a smartwatch. In perceptual studies, the term “threshold” commonly refers to a performance boundary (or limit), such as the minimal time needed for individuals to detect or distinguish information within a visualization. The authors considered a single comparison task, in which participants were required to identify two target elements (e. g., two bars) in the visualization and select the one representing a larger value. In our replication study, we followed the same design and procedure outlined in the original work, except for the age group of the participants. While Blascheck et al.’s participants were predominantly younger individuals (19-64 years old), we specifically recruited participants from an older age group (65 and older) for our study. Our work primarily aimed to (1) examine the potential variations and similarities in performance and preferences between older adults and younger individuals, (2) investigate the extent to which aging affects performance and preference within the “older adult” group, and (3) establish empirical thresholds for the speed at which older adults can execute a simple data comparison task that involves reading a glanceable visualization.

We contrast the results of our study with those of Blascheck et al., allowing for a comprehensive understanding of the distinctions between the two study age groups. In our threshold assessments, older adults exhibited the fastest overall performance with Donut (312 ms), followed by Bar (485 ms) and Radial (2211 ms), a ranking that aligns with Blascheck et al.’s study with younger participants. Despite this similarity, older adults were consistently slower across all nine experimental conditions. Further analysis of confidence intervals’ overlap showed strong (5/9 conditions) to weak (2/9) evidence of differences between younger and older adults. The performance gap widened with increasing data size (7 → 12 → 24) for all visualization types, hinting at a steeper performance decline in older adults as visual complexity grew. To gain a more nuanced understanding of the relationship between the progression of age and glanceable visualization performance, we compared the performance of the younger adults from the prior study by Blascheck et al. (n = 18) with two older age segments: “young-old” (age 65-74, n = 12) and “old-old” (age ≥ 75, n = 12).²

² The terms young-old and old-old are used in gerontology to distinguish between different segments of the older population (e. g., Baltes and Smith (Baltes and Smith, 2003)). We followed conventional gerontological practice by setting the cut-off age for the old-old category at 75 (e. g., (Abdel-Ghany and Sharpe, 1997; Sinoff and Ore, 1997; Baltes and Smith, 2003)).

We observed minimal performance differences between younger adults and the young-old in 8 out of 9 conditions. However, the performance discrepancies increased with age, with evidence of several differences between the old-old and the two other age groups. These findings suggest that the impact of aging on visualization performance may accelerate and intensify with advanced age.

This work makes several contributions to the visualization field. First, we fill a gap by quantitatively establishing time thresholds for older adults in the context of glanceable visualizations, enriching the current understanding of how age impacts visual information processing. Second, we offer evidence that underscores the influence of age on graphical perception. Our findings demonstrate that age matters and should be incorporated into both research and practice in visualization. Lastly, reflecting on the lessons learned conducting this work, we outline considerations for designing and conducting human-centered studies with older adults. These contributions collectively set the stage for further investigations into the complex interplay between aging and visualization.

2. Motivation for Focusing on Older Adults and Replication

This section addresses three key considerations: (1) the rationale for focusing on older adults, (2) the reasons for conducting a replication study, and (3) the choice of Blascheck et al.’s (Blascheck et al., 2018) study.

2.1. Why Older Adults?

The knowledge gap at the intersection of data visualization and aging is significant, and the area remains under-researched (Backonja et al., 2016; Le et al., 2016; Brandt et al., 2014). This issue is gaining urgency in light of the growing global population of older adults, whose share of the world population is expected to rise to 16% by 2050 and 24% by 2100; in the U.S., older adults will outnumber children by 2060 (Vespa et al., 2018). Visualizations play a growing role in enhancing older adults’ lives, for example in health-monitoring wearables, and most interventions in this space include visualizations (Cajamarca et al., 2020). However, challenges in design and usability hinder adoption among older adults (Chung et al., 2023). Addressing this research area is timely and ethically necessary, responding to calls for more diverse, inclusive, and equitable research in visualization (Lee et al., 2020; Angerbauer and Sedlmair, 2022; Marriott et al., 2021).

2.2. Why Replication?

Our research is a conceptual replication (Brandt et al., 2014; Crandall and Sherman, 2016; Stroebe and Strack, 2014), a type of replication study in which the same research questions or hypotheses as in the original study are examined, but with controlled changes in methods, participants, or settings. Conceptual replication aims to test and extend the generalizability of prior findings. Crandall and Sherman (Crandall and Sherman, 2016) argue for its importance in scientific research, emphasizing its role in testing theories across various settings and conditions and thereby contributing to a broader understanding of the subject. Unlike direct replications, which primarily seek to confirm original findings, conceptual replications challenge and expand these theories; they are more efficient at advancing the field and at identifying theories’ limitations or boundary conditions. We acknowledge the ongoing debate in the scientific community about direct versus conceptual replications; for example, Stroebe and Strack (Stroebe and Strack, 2014) suggest that conceptual replications, by testing a theory’s validity across different settings and populations, offer more robust evidence than direct replications. Throughout this paper, replication specifically refers to conceptual replication.

2.3. Why Blascheck et al.’s Study?

Previous studies on smartwatch perception mainly involved young participants (e. g., (Blascheck et al., 2018, 2023; Neshati et al., 2019b)). Given the smaller displays of smartwatches, age-related vision changes, and the growing interest of older adults in smartwatches, it is crucial to determine if these findings apply to them. Our replication of Blascheck et al.’s study (Blascheck et al., 2018) was motivated by four factors: its focus on fundamental aspects of smartwatch perception, the inclusion of basic comparison tasks relevant to many visualization tasks, the use of various visualization types and data sizes, and the prior study’s participant age range (< 65), enabling direct performance comparisons between older and younger adults. Additionally, its previous replication with young participants on larger displays confirmed the original results’ robustness (Blascheck and Isenberg, 2021). Lastly, access to detailed information from the original researchers facilitated a comprehensive replication.

3. Related Work

While prior work on glanceable visualization and aging is remarkably sparse (Cajamarca et al., 2023), the existing literature provides a starting point that informs and motivates this study. In this section, we discuss relevant current work on glanceable visualization (Section 3.1), visualizations for older adults (Section 3.2), as well as smartwatches and older adults (Section 3.3).

3.1. Glanceable Visualizations

Glanceable visualizations depict information that can be gleaned at a glance, focusing on design choices that communicate data as concisely as possible, although different areas of research define the length of a “glance” differently (Blascheck et al., 2021). Consolvo et al. (Consolvo et al., 2008) presented one of the earliest works on glanceable displays, demonstrating the positive impacts of a mobile fitness tracker on wearers’ motivation. Blascheck et al. (Blascheck et al., 2023) observed that, on watch faces with a digital time representation and multiple proportion visualizations, bar and radial bar charts led to better performance than text depicting progress toward a target value. Bar charts led to higher accuracy, whereas participants preferred radial bar charts for their aesthetics. Adding distracting elements, such as an analog time representation, did not significantly affect performance. Horak et al. (Horak et al., 2018) investigated the interplay of smartwatches and large interactive displays in data analysis scenarios, using smartwatches to store and show sets of data points or visualization configuration options as well as to control aspects of the large display. However, that work treated the smartwatch as a supplement to large displays rather than designing visualizations for small screens. Islam et al. (Islam et al., 2022) examined the design of sleep visualizations on smartwatches and fitness bands, observing a greater preference for visualizations over text depictions and similar chart preferences across both devices.

Neshati et al. (Neshati et al., 2019b) incorporated glanceable sparkline designs on smartwatches. Condensed line graphs along the x-axis improved accuracy in judging line heights, reducing the number of flick operations needed to view entire graphs and maximizing screen real estate. Similar efforts in efficient use of screen space include the simplified designs of Space-Filling Line Graphs (Neshati et al., 2021) as well as the work of Chen (Chen, 2017), which proposed utilizing smartwatch borders for interactive broad views of large time-series data. Our study focuses on older adults as a target population, while these targeted a more general audience and included significantly younger participants.

Understanding the data viewed on smartwatches is crucial, with previous studies exploring common and health-related preferences (Islam et al., 2020; Chung et al., 2023). Islam et al. (Islam et al., 2020) found health and fitness data to be the most frequently displayed, with an average of five items on participants’ watch faces at any time. Chung et al. (Chung et al., 2023) noted strong interest in step count, walking distance, and heart rate but lower interest in oxygen levels, stairs climbed, and sleep. Glanceable visualization has been studied in application-focused contexts such as qualitative exercise feedback (Gouveia et al., 2016; Schiewe et al., 2020), while other works observed preferences for visualizing data rather than presenting it as a notification (Schiewe et al., 2020) or a number (Amini et al., 2017). In contrast to this past work, our perceptual study is data- and application-agnostic.

3.2. Visualization and Older Adults

Research on visualization often overlooks older adults, a gap highlighted by Cajamarca et al. (Cajamarca et al., 2020). One visualization area considered for older adults is health data monitoring and decision-making. This field explores the impact of visualization systems on life decisions, particularly for older adults. Galesic et al. (Galesic et al., 2009) found that icon arrays aided risk comprehension in individuals with low numeracy, including older adults. Pham et al. (Pham et al., 2012) observed that health visualizations assisted older adults in progressing toward health goals, while Vargemidis et al. (Vargemidis et al., 2023) reported increased motivation for physical activity in older adults through visualizations that emphasized enjoyment. Additionally, Price et al. (Price et al., 2016) explored how visualization affects working memory and decision-making in older adults. They found that color-based visualizations were more effective than tabular data for medical decision-making, although this advantage decreased in more complex situations. Collectively, these studies underscore the potential of visualization in enhancing the decision-making and well-being of older adults, including applications with smartwatches.

Other work has focused on overall visualization design with older adults in mind. Le et al. (Le et al., 2012) evaluated health visualization designs, including stacked bar charts, wellness polygons, and partitioned donut charts. They found that while bar charts were familiar, they demanded high cognitive load; polygons were complex for trend analysis; and partitioned donut charts, though adequate for trend depiction, also required high mental effort. Subsequent studies by Le et al. (Le et al., 2016, 2014) showed that older adults preferred separate line graphs over stacked ones and took longer than younger adults to compare values. Le et al. (Le et al., 2015) further discovered that multiple visual cues in health visualizations could impair older adults’ information processing, with participants using a combination of prior knowledge and visualization elements when interacting with visual displays. Fan et al. (Fan et al., 2023) identified challenges for older adults in understanding interactive COVID-19 data visualizations, such as insufficient information clarity. Additionally, Cajamarca et al. (Cajamarca et al., 2023) observed that older Chilean adults interpreted health data on smartwatches more accurately without progress indicators, which otherwise distracted them from essential information. They also found that older adults with higher technology proficiency had better accuracy and speed in data interpretation. These studies highlight how aging affects visualization perception, a challenge that may be compounded when dealing with small-scale smartwatch displays.

The existing but limited body of work on visualization for older adults highlights critical aspects of design and graphical perception specific to this demographic. However, only one previous work was performed using smartwatches or glanceable visualizations (Cajamarca et al., 2023). This gap in the current literature underscores the pressing need for further exploration to determine whether the established understanding of graphical perception remains applicable in the face of significant shifts in screen size (small) and design objectives (rapid information retrieval).

3.3. Smartwatches and Older Adults

Rosales et al. (Rosales et al., 2017) assessed the evolving attitudes and behaviors of older adults using smartwatches over time. Participants with a preexisting interest in technology showed greater enthusiasm for learning new technology, while others envisioned the smartwatch potentially replacing some smartphone functions. Li et al. (Li et al., 2020) found that prolonged wearable device usage positively impacted older adults’ health. Consistent daily usage formed long-term habits, corresponding with improved health outcomes. These findings highlight the potential of smartwatches to enhance older adults’ lives, encouraging behaviors that boost their well-being and cultivate interest in other technologies.

Khakurel et al. (Khakurel et al., 2018) investigated smartwatch usability issues for older adults, identifying persistent challenges such as screen size, typography, tap detection, and interaction techniques. Notably, screen size and font size emerged as critical factors influencing smartwatch usage. Chung et al. (Chung et al., 2023) found positive attitudes among older adults toward wearable devices for activity tracking but highlighted potential barriers like screen design, size, and complexity. Cristescu et al. (Cristescu et al., 2022) proposed a model exploring factors shaping older adults’ behavioral intentions toward wearable technology, observing that design aesthetics, performance expectancy, effort expectancy, and facilitating conditions played significant roles. The role of facilitating conditions underscores the impact of external support on older adults’ behavioral intentions. Collectively, these works emphasize the need for nuanced design considerations to enhance smartwatch usability and appeal for older wearers, motivating our replication study as a foundational step toward more usable smartwatch visualizations.

4. Study Design and Execution

This study is a conceptual replication of the Evaluating Random Differences study by Blascheck et al. (Blascheck et al., 2018) detailed in Section 6 of the original paper, with the notable difference that all participants in the current study were aged 65 or older. Unless otherwise stated, all details regarding the study design and execution were replicated according to the original research to the best of our ability.

4.1. Participants

After preregistration, approval of the study by our institution’s IRB, and requesting permission from local older adult community groups, we posted a PDF advertisement flyer to several email lists and a few in-person advertisement boards. We recruited 24 participants (19 female, 4 male, and one who preferred not to answer), with ages ranging from 65 to 96 years (Avg. = 73, SD = 8). Participants’ education levels ranged from a high school diploma or less, through associate’s, bachelor’s, and master’s degrees, to a doctorate, with almost all participants (22) holding at least a bachelor’s degree. All participants had normal or corrected-to-normal vision (self-reported); all but one wore glasses, and no participant reported a color vision deficiency.

Of the 24 participants, 13 reported having no prior familiarity with visualizations; participants’ years of visualization experience ranged from 0 to 70 (Avg. = 18, SD = 26). Participants rated their familiarity with each chart type on a Likert scale (Likert, 1932) from 1 (not at all) to 5 (very familiar): Bar (Avg. = 4.1, SD = 1.1), Donut (Avg. = 3.2, SD = 1.6), and Radial (Avg. = 1.5, SD = 1.0). New to this study, and to investigate its possible effect on performance, we also asked participants to rate their tech-savviness on a Likert scale from 1 (strongly disagree to consider myself a tech-savvy person) to 5 (strongly agree) (Avg. = 2.6, SD = 1.1). All participants finished the study, which typically took between 45 and 75 minutes, and then received a $25 Amazon gift card. Comparisons of this study’s participant demographic information to those of the original research are provided in Table 1.

Table 1. (a) Comparison of age, visualization experience (in years), and familiarity (1-5) with Bar, Donut, and Radial charts between younger (age < 65) and older adults (age ≥ 65). (b) Comparison of education levels between younger and older adults.
Group     Age, mean (SD)   Vis. exp. in years, mean (SD)   Bar famil., mean (SD)   Donut famil., mean (SD)   Radial famil., mean (SD)
Younger   35.0 (13.0)      4.5 (1.6)                       4.6 (1.0)               4.3 (1.1)                 2.3 (1.7)
Older     73.3 (7.8)       17.9 (26.1)                     4.3 (1.1)               3.3 (1.6)                 1.6 (1.0)
(a)

Group     High school   Associate’s   Bachelor’s   Master’s   PhD
Younger   18%           0%            12%          71%        0%
Older     4%            4%            29%          38%        25%
(b)

4.2. Apparatus

We used a Sony SmartWatch 3 to enable comparison with the original study. The watch’s screen resolution is 320 × 320 pixels, measuring 1.6 inches on each side (Sony, ). We reconstructed the watch stand (shown in Figure 1) using the original authors’ designs, approximating a typical viewing angle. As per the original study design (Blascheck et al., 2018), the stand was placed 28 cm horizontally from the end of the table and 20 cm vertically from the table’s surface. After observing that some pilot participants initially had difficulty seeing the watch, we allowed participants to move the stand horizontally as needed (Avg. = 25.2 cm, SD = 4.9 cm, range = 10.2-28 cm) but kept the vertical height constant.

Figure 1. Study setup and apparatus: (a) An example participant during the study. (b) A participant’s point of view. (c) The keyboard used at the start of the study, arrow keys emphasized with a red outline (top), and the two separate buttons (bottom) added after P6.

4.3. Task and Stimuli

This study followed a two-alternative forced choice (2AFC) approach (Macmillan and Creelman, 2004), in which participants were presented with a visualization (stimulus) of one of three chart types (Bar, Donut, or Radial) showing one of three data sizes (7, 12, or 24 data points). On each stimulus, two black dots indicated two specific elements (e. g., two bars of a bar chart). After the visualization was shown, we asked participants to choose which dot (left or right) marked the larger-valued element: the taller bar in Bar, the larger region in Donut, or the more complete circle in Radial. The original study’s authors provided us with the same software and set of stimuli as in the original study (Blascheck et al., 2018). A total of 396 images were provided, half (198) with the larger value on the left and half with the larger value on the right.

As per the original study, the stimulus display time was adjusted in real time as a function of the participant’s responses using a weighted 3-down, 1-up staircase procedure that increased exposure time by 100 ms (Δ⁺) after an incorrect response and decreased it by 300 ms (Δ⁻) after three consecutive correct responses. This results in an expected accuracy of approximately 63% (García-Pérez, 1998); note that the original study incorrectly calculated this value as approximately 91% (Blascheck et al., 2018). For each of the nine conditions (3 chart types × 3 data sizes), participants completed a set of trials, which we refer to as a staircase. Figure 2 shows an example staircase from our study. As is typical for staircase procedures (García-Pérez, 1998), a staircase continued until the 15th reversal (i. e., a change from increasing to decreasing exposure time or vice versa) or the end of the stimuli (the 198th trial) was reached.
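The staircase rule above can be made concrete with a short Python sketch. It is illustrative only and is not the software the original authors supplied: the 2000 ms starting exposure, the convention of logging a reversal at the exposure time of the trial on which the direction flips, and all names (`Staircase`, `equilibrium_accuracy`, the constants) are our assumptions. The helper `equilibrium_accuracy` also spells out where the ≈63% figure comes from. A downward step happens only when three correct responses arrive before an error (probability p³), and any error produces an upward step, so the staircase stops drifting where p³Δ⁻ = (1 − p³)Δ⁺, i.e., p = (Δ⁺ / (Δ⁺ + Δ⁻))^(1/3).

```python
from dataclasses import dataclass, field
from typing import List

STEP_UP_MS = 100       # Δ⁺: added after any incorrect response
STEP_DOWN_MS = 300     # Δ⁻: subtracted after three consecutive correct responses
N_DOWN = 3             # consecutive correct responses required to step down
MIN_EXPOSURE_MS = 100  # floor on exposure time (Figure 2 reports a 100 ms minimum)
MAX_REVERSALS = 15     # stop at the 15th reversal...
MAX_TRIALS = 198       # ...or when the stimuli run out


def equilibrium_accuracy(step_up: float, step_down: float, n_down: int = N_DOWN) -> float:
    """Per-trial accuracy at which a weighted n-down/1-up staircase stops drifting.

    Each move is downward with probability p**n_down and upward otherwise, so the
    expected displacement is zero when p**n_down * step_down == (1 - p**n_down) * step_up.
    """
    return (step_up / (step_up + step_down)) ** (1 / n_down)


@dataclass
class Staircase:
    exposure_ms: float = 2000.0   # hypothetical starting exposure time
    correct_streak: int = 0
    last_direction: int = 0       # +1 = last move was up, -1 = down, 0 = no move yet
    n_trials: int = 0
    reversal_times: List[float] = field(default_factory=list)

    def done(self) -> bool:
        return len(self.reversal_times) >= MAX_REVERSALS or self.n_trials >= MAX_TRIALS

    def record(self, correct: bool) -> None:
        """Update the staircase after one trial shown at self.exposure_ms."""
        self.n_trials += 1
        if correct:
            self.correct_streak += 1
            if self.correct_streak < N_DOWN:
                return                  # fewer than three in a row: no change
            self.correct_streak = 0
            direction = -1              # harder: shorten exposure
        else:
            self.correct_streak = 0
            direction = +1              # easier: lengthen exposure
        if self.last_direction != 0 and direction != self.last_direction:
            # Direction flipped: log this trial's exposure time as a reversal.
            self.reversal_times.append(self.exposure_ms)
        self.last_direction = direction
        if direction > 0:
            self.exposure_ms += STEP_UP_MS
        else:
            self.exposure_ms = max(MIN_EXPOSURE_MS, self.exposure_ms - STEP_DOWN_MS)


print(round(equilibrium_accuracy(STEP_UP_MS, STEP_DOWN_MS), 2))  # 0.63
print(round(equilibrium_accuracy(STEP_DOWN_MS, STEP_UP_MS), 2))  # 0.91 (steps swapped)
```

With Δ⁺ = 100 ms and Δ⁻ = 300 ms the balance gives p³ = 0.25 and p ≈ 0.63; swapping the two step sizes gives p³ = 0.75 and p ≈ 0.91, which matches the value reported in the original study.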

Figure 2. An example of the staircase method, showing P24’s performance in the Donut 12 condition. Each circle or triangle glyph is a trial the participant performed. Orange-red marks errors, while triangles mark reversals, i. e., trials on which a participant switched from going up the staircase (most recently increased exposure time) to going down (decreasing exposure time), or vice versa. Reversal points R1-R15 are labeled. To compute the time threshold, the exposure times at the reversal points are averaged, starting with the third reversal (R3). The computed threshold in this example is 238 milliseconds. The minimum exposure time was 100 milliseconds.
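The threshold rule in the Figure 2 caption (average the exposure times at the reversals, starting with R3) can be sketched as follows. The function name and the example reversal values are hypothetical and are not P24’s data. Discarding the first reversals is a common choice because they are still influenced by the starting exposure time. How the original software handles a staircase that ends at the 198th trial with fewer than 15 reversals is not stated here, so this sketch simply averages whichever reversals from R3 onward exist.

```python
from statistics import mean
from typing import Sequence


def staircase_threshold(reversal_times_ms: Sequence[float], first_reversal: int = 3) -> float:
    """Mean exposure time over reversals R{first_reversal}, R{first_reversal + 1}, ...

    Reversals are numbered from 1, so the default skips R1 and R2.
    """
    used = list(reversal_times_ms)[first_reversal - 1:]
    if not used:
        raise ValueError(f"need at least {first_reversal} reversals")
    return float(mean(used))


# Hypothetical reversal exposure times (ms), R1 through R6.
reversals = [1700, 1100, 400, 300, 200, 300]
print(staircase_threshold(reversals))  # 300.0
```

With a full staircase, the same call averages the 13 reversals R3 through R15.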

4.4. Pilot Studies

Before the main study, we conducted four pilot studies involving volunteer older adults (ages 61-69). These pilots aimed to assess the study design and procedure, which were not tailored to older adults. During the pilots, we examined the visibility of dot markers on stimuli, the study duration, potential fatigue, eye strain, comprehensibility of the required comparison task and visualizations, and the placement and distance of the smartwatch stand.

We observed that some pilot participants had difficulties with Radial charts, mistakenly assuming they were comparing bar length rather than circle completeness. To address this, we added information and examples on interpreting these charts to the pre-study explanation of stimuli and tasks.

4.5. Study Procedure

Figure 3. Overall structure of the study. Each participant started and ended their session with a questionnaire. In this example, each chart type presents 7 data items for a staircase, then 12, and then 24; for each participant, this ordering was based on the Latin square design. Each staircase is preceded by 10 practice trials, and the number of trials in the staircase itself depends on performance. After each chart type, we asked participants to describe the strategies they used for that set.

Flowchart showing the set of steps in the study for a single participant. First is the pre-study questionnaire, then practice trials and staircases for the Bar 7 condition, Bar 12 condition, and then Bar 24 condition; this ordering of data sizes is just an example here, with the actual ordering chosen by the Latin Square design. This set of three staircases and a strategy question are repeated for the Donut and Radial charts. The last step is the post-study questionnaire. Example study stimuli for each condition (chart and data size combination) are also depicted.


The overall study procedure is detailed in Figure 3. Participants would fill out the pre-study questionnaire upon confirming their consent to participate. Then, an investigator would explain the study format, including how to read and compare values on each chart using a printed set of examples. We also told participants they could take short breaks anytime, especially between conditions.

Participants were assigned an ID number based on the order of appointments, which determined the specific ordering of chart types and data sizes; orderings were counterbalanced with a reduced Latin square design (Bradley, 1958). Because our study (n = 24) had more participants than the original study's Latin square design accounted for (n = 18), some type-size orderings were repeated. To allow the best comparison between the subgroups of young-old participants (age 65-74, n = 12) and old-old participants (age ≥ 75, n = 12), we ensured that no type-size ordering was repeated within either subgroup.
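Counterbalanced orderings of this kind are commonly built from a balanced (Williams-style) Latin square, in which each condition appears once in every position and each condition immediately follows every other condition equally often. The sketch below shows that standard construction for illustration only: the exact reduced design used here and in the original study is not reproduced in this section, and `ordering_for` is a hypothetical mapping from participant IDs to orders.

```python
def balanced_latin_square(n):
    """Williams-style balanced Latin square over conditions 0..n-1.

    Each row is one presentation order. The first row follows the pattern
    0, 1, n-1, 2, n-2, ...; later rows shift it by one. For odd n, mirrored
    rows are appended so that first-order carry-over stays balanced.
    """
    first, lo, hi, take_lo = [0], 1, n - 1, True
    while len(first) < n:
        if take_lo:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
        take_lo = not take_lo
    rows = [[(c + r) % n for c in first] for r in range(n)]
    if n % 2 == 1:
        rows += [row[::-1] for row in rows]
    return rows


CHARTS = ["Bar", "Donut", "Radial"]
SIZES = [7, 12, 24]


def ordering_for(participant_id):
    """Hypothetical mapping from a 1-based participant ID to (chart order, size order)."""
    chart_rows = balanced_latin_square(len(CHARTS))  # 6 rows for n = 3
    size_rows = balanced_latin_square(len(SIZES))
    k = participant_id - 1
    charts = [CHARTS[i] for i in chart_rows[k % len(chart_rows)]]
    sizes = [SIZES[i] for i in size_rows[(k // len(chart_rows)) % len(size_rows)]]
    return charts, sizes


print(ordering_for(2))  # (['Donut', 'Radial', 'Bar'], [7, 12, 24])
```

With 24 participants, this hypothetical mapping cycles through all 6 chart orders and the first 4 size orders; since the study itself used a reduced design with 18 orderings, it should not be read as the authors' actual assignment.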

A condition began with ten practice trials, with an initial exposure time between 1700 ms and 5100 ms depending on the condition; for comparability, we used the same starting times as the previous study (Blascheck et al., 2018). The individual trial procedure is shown in Figure 4. Each trial began with the visualization, after which four intervening images were shown for 20 ms each to prevent afterimages of the stimulus. The screen then prompted the participant to give their input (left or right), and, lastly, showed the participant for 1000 ms whether their answer was correct. After completing the three staircases for one chart type, we asked the participant what strategies (if any) they had used for these charts.
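The trial procedure above can be summarized as a fixed sequence of phases. The sketch below encodes only the timings stated in the text (a staircase-controlled stimulus, four 20 ms intervening images, a key-press response, and 1000 ms of feedback); the `Phase` structure and function names are hypothetical, and the sketch does not model display refresh timing or the actual study software.

```python
from dataclasses import dataclass
from typing import List, Optional

MASK_COUNT = 4       # intervening images shown after the stimulus
MASK_MS = 20         # duration of each intervening image
FEEDBACK_MS = 1000   # "correct"/"error" screen


@dataclass
class Phase:
    name: str
    duration_ms: Optional[int]  # None = wait for the participant's key press


def trial_phases(exposure_ms: int) -> List[Phase]:
    """Phases of one trial; exposure_ms comes from the current staircase."""
    phases = [Phase("stimulus", exposure_ms)]
    phases += [Phase(f"mask {k + 1}", MASK_MS) for k in range(MASK_COUNT)]
    phases.append(Phase("response (left/right key)", None))
    phases.append(Phase("feedback", FEEDBACK_MS))
    return phases


def fixed_duration_ms(phases: List[Phase]) -> int:
    """Total duration of all timed phases (the response wait is excluded)."""
    return sum(p.duration_ms for p in phases if p.duration_ms is not None)


print(fixed_duration_ms(trial_phases(238)))  # 1318
```

For a 238 ms exposure, the timed portion of a trial is 238 + 4 × 20 + 1000 = 1318 ms, plus however long the participant takes to respond.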

Figure 4. Steps involved in an individual trial. A participant is first shown the stimulus (a chart) for a duration determined by the current staircase. Then, they are shown four intervening images for 20 ms each. Next, the screen prompts the participant to press either the left or right arrow key, after which the participant is told whether their input was correct.

Flowchart depicting the set of events in an individual trial. First, the user is shown a stimulus (chart) for an amount of time depending on the staircase, then four intervening images (80 ms in total); the software then waits for the user to press the left or right arrow, and finally a screen tells the user whether they were correct, displaying the word “correct” or “error”.


Study Design Modifications

During the study, we made two minor adjustments to the design. The first change occurred after the second session, when participants P1 and P2 expressed concern about the difficulty of the Radial 24 condition. We therefore allowed subsequent participants to skip this condition if they found it infeasible. Seven of the 22 remaining participants quit the Radial 24 condition after attempting a few trials (see Figure 5 for details). Second, during the initial sessions of the main study, some participants reported discomfort and difficulty pressing the arrow keys on the full-sized keyboard used to record their responses. To address this, we replaced the keyboard with two separate mechanical keys, one on the left and one on the right (see Figure 1(c)). This modification seemed to improve the participants' ergonomic experience without changing the study format.

Figure 5. High-level depiction of the study process and progression. We started with four pilot studies and proceeded to the main study with 24 participants. Each box represents a participant and their age. Colors indicate whether the participant was in the young-old (age 65-74) or old-old (age ≥ 75) group. After the second session (P2), participants were explicitly told at the start that they could skip the Radial 24 condition if they experienced excessive fatigue, discomfort, or difficulty. Seven of the 22 remaining participants (marked with an icon) chose to skip that condition. After the sixth session (P6), we replaced the keyboard with two separate physical keys for greater ergonomic comfort during the study (shown in Figure 1(c)).

Visualization of the overall study progression regarding users and their ages. The four pilot studies are first (all young-old adults), and then the 24 participants of the main study are shown (a mix of young-old and old-old participants). After the second participant (P2), participants were told they could skip the Radial 24 condition. After the sixth participant (P6), the full keyboard was replaced by two separate physical keys. Participants P8, P9, P10, P16, P17, P21, and P23 chose to skip the Radial 24 condition.


Notably, none of these issues emerged during our pilots, which highlights the known heterogeneity of the older adult population: the manifestation and intensity of age-related changes vary widely among individuals. The discrepancies between the pilots and the main study underscore the importance of adapting the study design in response to real-time feedback and participant needs, to ensure a comfortable and meaningful experience for older participants.

4.6. Data Collection

We collected participants’ responses to the pre-study questionnaire, in which we asked for information about their age, gender, education level, primary occupation, dominant hand, vision deficits, vision corrections, and color vision deficiency. We also collected information about their level of familiarity with visualization, familiarity with Bar, Donut, and Radial charts, years of experience with visualization, and their self-reported tech-savviness (see supplementary material for the complete questionnaire).

For each trial within a staircase, we collected the following data: exposure time in milliseconds, chart type and data size of the stimulus, and the correctness of the participant’s answer to the comparison task. We also asked participants to “think aloud” and recorded and transcribed their thoughts and reactions during the study. Between sets of three staircases (i.e., the three data sizes for a chart), participants were asked what strategy (if any) they used to perform the comparison task.
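One way to store the per-trial data listed above is a flat record per trial, written to CSV. The field names, the `trial` index, and the CSV format below are assumptions for illustration; the study's actual logging format is not described in this section.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class TrialRecord:
    participant: str   # e.g., "P24"
    chart: str         # "Bar", "Donut", or "Radial"
    data_size: int     # 7, 12, or 24
    trial: int         # position within the staircase (hypothetical field)
    exposure_ms: int   # stimulus exposure time
    correct: bool      # answer to the comparison task


def write_trials(records, stream):
    """Write trial records as CSV rows with a header line."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(TrialRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


buf = io.StringIO()
write_trials([TrialRecord("P24", "Donut", 12, 1, 2000, True)], buf)
print(buf.getvalue())
```

Keeping one row per trial makes it straightforward to recompute staircase thresholds and per-staircase accuracy later from the same file.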

After completing the final staircase, each participant answered a post-study questionnaire. Participants were prompted to rank the three types of charts on a scale from 1 (best) to 3 (worst) based on their subjective preferences and confidence in accurately interpreting the data. We also inquired about their smartwatch ownership. For smartwatch owners, we asked about the watch’s make, what they use it for, how often they monitor their data, and any visualizations they encounter when using it. We further explored the data types they actively monitor, additional data they would like to track, and their preference for visualizations over numerical displays. For those without a smartwatch, we gauged their interest in owning one and inquired about any barriers preventing them. Finally, all participants were shown a list of commonly tracked smartwatch data types identified by Islam et al. (Islam et al., 2020) and asked to indicate which they would be interested in monitoring (see supplementary material for complete questionnaire).

5. Data Analysis and Results

This section is organized into five subsections: Time Thresholds (Section 5.1), Accuracy (Section 5.2), Strategies (Section 5.3), Preference and Confidence Ratings (Section 5.4), and Smartwatch Ownership and Data of Interest (Section 5.5).

5.1. Time Thresholds

To gain a comprehensive understanding of the relationship between chronological age and time thresholds, we divided our analysis into three parts: (1) a between-groups comparison of means between younger adults (original study (Blascheck et al., 2018)) and older adults (current study), (2) a between-groups comparison of means between young-old (age 65-74) and old-old (age ≥ 75) adults within our older adult group, and (3) the overall performance of older adults.

Following the original study, we computed 95% confidence intervals (CIs) of participant thresholds per condition using BCa bootstrapping (Efron, 1987), applying a Bonferroni correction when making multiple comparisons (see Figure 6(b)). All data on younger adults’ performance and preferences used in this section were originally collected and reported by Blascheck et al. (Blascheck et al., 2018).
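For readers unfamiliar with BCa bootstrapping, the following is a minimal, dependency-free Python sketch of a BCa interval for a mean: a bias correction z0 estimated from the bootstrap distribution and an acceleration a estimated from the jackknife, with a Bonferroni-adjusted alpha for three pairwise comparisons. It is an illustration under assumptions (number of resamples, fixed seed, nearest-rank percentile lookup, hypothetical data), not the analysis code used in this or the original study.

```python
import random
from statistics import NormalDist, fmean


def bca_ci(data, alpha=0.05, n_boot=5000, seed=0):
    """Two-sided BCa bootstrap CI (level 1 - alpha) for the mean of `data`."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    theta_hat = fmean(data)
    # Bootstrap distribution of the mean: resample with replacement, then sort.
    reps = sorted(fmean(rng.choices(data, k=n)) for _ in range(n_boot))

    norm = NormalDist()
    # Bias correction z0: share of bootstrap means below the point estimate.
    below = sum(r < theta_hat for r in reps) / n_boot
    below = min(max(below, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = norm.inv_cdf(below)

    # Acceleration a: skewness of the jackknife (leave-one-out) means.
    total = sum(data)
    jack = [(total - x) / (n - 1) for x in data]
    jbar = fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(q):
        # BCa-adjusted percentile for the nominal quantile q.
        z = z0 + norm.inv_cdf(q)
        return norm.cdf(z0 + z / (1 - a * z))

    def pick(q):
        # Nearest-rank lookup in the sorted bootstrap distribution.
        return reps[min(n_boot - 1, max(0, round(q * (n_boot - 1))))]

    return pick(adjusted(alpha / 2)), pick(adjusted(1 - alpha / 2))


# Hypothetical per-participant thresholds (ms) for one condition.
thresholds = [238, 310, 275, 402, 290, 351, 262, 330, 298, 281, 420, 265]
print(bca_ci(thresholds))                  # uncorrected 95% CI
print(bca_ci(thresholds, alpha=0.05 / 3))  # Bonferroni, 3 pairwise comparisons
```

Dividing alpha by the number of comparisons (here three) moves both adjusted percentiles outward, so each Bonferroni-corrected interval is at least as wide as the uncorrected one.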

To compare independent groups (younger vs. older adults, and young-old vs. old-old adults), we assessed the strength of evidence for time threshold differences in each condition by examining the overlap of the groups’ confidence intervals. In line with Blascheck et al. (Blascheck et al., 2018), we quantified this overlap with a metric we refer to as interval overlap percentage (IOP), calculating and interpreting IOP following the guidelines of Cumming (Cumming, 2014) and Besançon and Dragicevic (Besançon and Dragicevic, 2017); further details on this calculation can be found in the supplementary materials.
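An overlap measure of this kind can be sketched as the overlap (or gap) between two CIs expressed as a percentage of their average margin of error, in the spirit of Cumming's rules of thumb for independent 95% CIs, where an overlap of about half the average margin of error corresponds roughly to p ≈ .05 and intervals that just touch to p ≈ .01. This sketch rests on assumptions: the exact IOP computation is given in the supplementary materials, half-widths stand in for the margins of error of possibly asymmetric bootstrap CIs, and the cut-offs in the hypothetical `evidence_label` function are illustrative only.

```python
def interval_overlap_pct(ci_a, ci_b):
    """Overlap of two CIs as a percentage of their average margin of error.

    > 0: intervals overlap; 0: they just touch; < 0: there is a gap.
    Each margin of error is approximated by the interval's half-width.
    """
    (a_lo, a_hi), (b_lo, b_hi) = ci_a, ci_b
    overlap = min(a_hi, b_hi) - max(a_lo, b_lo)
    avg_moe = ((a_hi - a_lo) + (b_hi - b_lo)) / 4  # mean of the two half-widths
    return 100 * overlap / avg_moe


def evidence_label(iop):
    """Hypothetical cut-offs loosely following Cumming's rules of thumb."""
    if iop <= 0:
        return "strong"        # no overlap (roughly p <= .01 for 95% CIs)
    if iop <= 50:
        return "weak"          # overlap up to half the average MOE (roughly p <= .05)
    return "insufficient"


print(interval_overlap_pct((0, 10), (5, 15)))   # 100.0
print(interval_overlap_pct((0, 10), (12, 22)))  # -40.0
```

A negative value expresses a gap between the intervals in the same units, which keeps overlapping and non-overlapping pairs on a single scale.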

5.1.1. Older Adults Combined

This section analyzes the performance of older adults in depth. Figure 6(a) depicts the average time thresholds per visualization type (left) and their pair-wise comparisons (right). Older adults performed best overall with Donut, with Bar trailing closely behind, whereas the performance gap widens notably for Radial. In the pair-wise comparisons (Figure 6(a), right), the difference in performance between Donut and Bar is considerably smaller than the differences between Donut and Radial or between Bar and Radial. These findings align well with the observations of the original study by Blascheck et al. (Blascheck et al., 2018). Despite noticeable differences in absolute time thresholds between younger and older adults, the overall trend suggests that Donut is the most suitable of the three chart types for quick comparison tasks in both age groups. Figure 6(b) (left) breaks down older adults’ performance by the nine experimental conditions, and Figure 6(b) (right) shows their pair-wise comparisons. Consistent with the results reported below, performance declined as visualization complexity grew with data size, most notably for Radial. We also evaluated the impact of demographic factors (education level, technological familiarity, overall visualization familiarity, individual visualization familiarity, and smartwatch ownership) on participants’ performance, finding no consistent patterns of differences in our study population; more details can be found in the supplementary materials.
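Pair-wise comparisons between chart types can be sketched as within-participant differences: for each pair of chart types, each participant's threshold for one chart is subtracted from their threshold for the other, and a CI (for example, a bootstrap CI) is then computed on those differences with a Bonferroni-adjusted alpha. This sketch assumes that each participant has one aggregated threshold per chart type and that this is how the pair-wise differences were formed; the function name and data are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def pairwise_differences(thresholds):
    """Within-participant threshold differences for every pair of chart types.

    `thresholds` maps participant -> {chart: threshold_ms}. Returns a dict
    {(chart_a, chart_b): [a - b for each participant]}.
    """
    charts = sorted(next(iter(thresholds.values())))
    return {
        (a, b): [t[a] - t[b] for t in thresholds.values()]
        for a, b in combinations(charts, 2)
    }


# Hypothetical per-participant thresholds (ms), one value per chart type.
data = {
    "P1": {"Bar": 400, "Donut": 350, "Radial": 900},
    "P2": {"Bar": 500, "Donut": 480, "Radial": 1200},
}
diffs = pairwise_differences(data)
alpha_bonferroni = 0.05 / len(diffs)  # three comparisons
for pair, d in diffs.items():
    print(pair, fmean(d))
```

Working on per-participant differences, rather than comparing the two group means directly, accounts for each participant seeing all three chart types.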

5.1.2. Younger vs. Older Adults

Figure 7 shows the average time thresholds, CIs, and IOPs for younger and older adults, organized by the nine experimental conditions. At a high level, the average time thresholds of both younger and older adults increased with data size (7 → 12 → 24) for all chart types. Younger adults appeared to perform consistently better than older adults across all conditions. We used the confidence interval overlap method to assess the strength of evidence for the observed differences between the two groups. The results showed strong evidence of differences for Bar across all data sizes and for Radial with 12 and 24 data points, weak evidence of differences for Donut 12 and Radial 7, and insufficient evidence of a difference for Donut 7 and Donut 24.

(a) Set of visualizations of confidence intervals for mean threshold times across all data sizes and older adult participants; tabular version provided in supplemental materials. Confidence intervals for each of the three chart types are on the left, and pairwise differences between all three chart types are on the right (with added thinner bars representing Bonferroni correction). The X-axis ranges from 0 to 3000 ms for both left and right, with chart confidence intervals stacked vertically. On the left, the mean for Radial charts is noticeably higher than for the other two chart types, while Bar and Donut charts are close to each other. Also, the confidence interval width for Radial is larger than for the other two chart types. On the right, the pairwise difference between Bar and Donut is much smaller, with the other two pairwise differences close to each other. Bar and Donut’s interval width is much smaller than the other differences’ intervals.

(b) Set of confidence interval visualizations for mean threshold times for each of the data sizes across all older adult participants; tabular version provided in supplemental materials. Confidence intervals for each of the three chart types are on the left, and pairwise differences between all three chart types are on the right (with added thinner bars representing Bonferroni correction). The X-axis ranges from 0 to 6000 ms for both left and right, with chart confidence intervals stacked vertically. On the left, mean thresholds for the Radial condition are the highest for each data size, with the largest distance between chart types occurring for 24 data points. The confidence interval width for the Radial conditions is larger than for the other two for all three data sizes. On the right, the pairwise difference between Radial and Donut is the lowest, with the largest distance between pairwise differences occurring for 24 data points. The confidence interval width for Radial and Donut is much smaller than the other differences’ intervals for all three data sizes.

Figure 6. (a-left) Average time thresholds of older adults’ (age ≥ 65) performance per visualization type. (a-right) The pair-wise differences between the three visualization types. (b-left) Breakdown of older adults’ time thresholds by visualization type and data size; original CIs are thicker lines, and Bonferroni corrections are thinner. (b-right) Pair-wise performance comparisons, broken down by visualization type and data size.
Figure 7. Mean time thresholds for younger (age < 65) and older (age ≥ 65) participants. A marker shows the mean time thresholds. Note that, for Radial 24, only 17 of the older adults could complete the task. Additionally, the axis for Radial 24 ranges from 0 to 6000 ms, whereas the axes for all other charts range from 0 to 2000 ms.

Set of confidence interval visualizations for mean threshold times of each condition (chart and data size) for both younger adults (previous study) and older adults (current study); tabular version provided in supplemental materials. For all conditions except Radial 24, the X-axis ranges from 0 to 2000 ms. The older adults’ interval has a higher mean for each condition than younger adults. A thicker black line separates the Radial 24 condition since not all older adult participants completed that condition, and the X-axis ranges from 0 to 6000 ms.

Figure 8. Mean time thresholds for young-old (age 65-74) and old-old (age ≥ 75) participants. A marker shows the mean time thresholds. Note that, for Radial 24, only 10 young-old and 7 old-old participants completed the task. Additionally, the axis for Radial 24 ranges from 0 to 8000 ms, whereas the axes for all other charts range from 0 to 2000 ms.

Set of confidence interval visualizations for mean threshold times of each condition (chart and data size) for both young-old (65 to 74) and old-old (75 plus) adults in the current study; tabular version provided in supplemental materials. For all conditions except Radial 24, the X-axis ranges from 0 to 2000 ms. For each condition, the old-old participants’ confidence interval has a higher mean than that of the young-old participants. A thicker black line separates the Radial 24 condition since not all older adult participants completed that condition, and the X-axis ranges from 0 to 8000 ms.


5.1.3. Young-Old vs. Old-Old

The comparison between younger and older adults revealed a variety of disparities in performance. We therefore narrowed our focus to older adults, aiming to determine whether the performance differences would grow with advancing age. To this end, we compared the performance of young-old (age 65-74) and old-old (age ≥ 75) participants. Figure 8 presents the average time thresholds, CIs, and IOPs for the two groups, organized by the nine conditions. We found strong evidence of performance differences between the two groups for Bar across all data sizes, Donut 24, and Radial 24. There was weak evidence of differences for Donut with 7 and 12 data points, and insufficient evidence of differences for Radial with 7 and 12 data points.

These findings suggest that performance declines further with advancing age. Hence, we next compared the performance of young-old and old-old participants against that of the younger adults, as illustrated in Figure 9. While younger adults outperformed young-old adults, the discrepancies appeared relatively minor. In contrast, the gap between younger adults and old-old adults (age ≥ 75) was notably wider. Based on these observations, we calculated IOPs between the younger adults and each of the two older subgroups; Table 2 shows the results. We did not find strong evidence of a difference between younger and young-old adults in any condition: the evidence was weak for Radial 24 and insufficient for the remaining eight conditions. In contrast, the evidence of differences between younger and old-old adults was consistently strong, except for Radial 7 (weak) and Donut 24 (insufficient). Overall, these analyses suggest that time thresholds largely increase with age. However, the relationship may be non-linear, with the rate of performance decline accelerating with advancing age.

Figure 9. Mean time thresholds for younger (age < 65), young-old (age 65-74), and old-old (age ≥ 75) adults. The discrepancies between younger and young-old adults are much less pronounced than those between younger and old-old adults. While age affects time performance, the effects become more substantial as age advances.

Set of confidence interval visualizations for mean threshold times of each condition (chart and data size), for younger adults from the previous study as well as young-old (65 to 74) and old-old (75 plus) adults in the current study; tabular version provided in supplemental materials. For all conditions except Radial 24, the X-axis ranges from 0 to 2000 ms, while it ranges from 0 to 8000 ms for the Radial 24 condition. Overall, younger adults and young-old adults are much closer to each other (similar means, lower values), while old-old adults have higher values. This pattern is less apparent for the three Radial conditions.

Table 2. Results of the confidence interval overlap percentage (IOP) assessments between the younger adults (age < 65) and the young-old (age 65-74) (left), as well as between the younger adults and the old-old (age ≥ 75) (right). The impact of aging on graphical perception appears to increase with advancing age.
Table 3. Average accuracy for young-old (age 65-74) and old-old (age ≥ 75) participants for each condition (chart type and data size).

| Data Size | Bar, Young-Old | Bar, Old-Old | Donut, Young-Old | Donut, Old-Old | Radial, Young-Old | Radial, Old-Old |
|---|---|---|---|---|---|---|
| 7 | 77% | 75% | 79% | 77% | 68% | 68% |
| 12 | 71% | 72% | 77% | 74% | 64% | 63% |
| 24 | 69% | 68% | 74% | 70% | 63% | 56% |

5.2. Accuracy

We assessed participants’ performance by examining the accuracy of their responses, to determine whether the observed accuracy aligned with the anticipated accuracy of ~63% specified by the staircase design (see Section 4.3 for details). To this end, we calculated each participant’s accuracy for each of the nine staircases they completed by dividing the number of correct answers by the total number of trials in the staircase. Table 3 presents this analysis, segmented into the young-old and old-old cohorts. Except for old-old participants in the Radial 24 condition, both groups’ accuracy matched or exceeded the expected value of ~63%.
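The accuracy computation described above is a simple ratio per staircase, averaged over the participants in a group. A minimal sketch, assuming each staircase is stored as a list of booleans (True = correct); the function names and the example data are hypothetical.

```python
from statistics import fmean

EXPECTED_ACCURACY = 0.63  # approximate level targeted by the staircase


def staircase_accuracy(answers):
    """Share of correct answers in one staircase (True = correct)."""
    if not answers:
        raise ValueError("empty staircase")
    return sum(answers) / len(answers)


def group_accuracy(staircases):
    """Mean per-participant accuracy for one condition within one group."""
    return fmean(staircase_accuracy(s) for s in staircases)


# Hypothetical staircases from two participants in the same condition.
runs = [[True, True, False, True], [True, False]]
acc = group_accuracy(runs)
print(acc, acc >= EXPECTED_ACCURACY)
```

Averaging per-participant accuracies, rather than pooling all trials, keeps participants with longer staircases from dominating the group value.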

5.3. Strategies

After finishing three consecutive staircases for the same chart type (7, 12, and 24 data points), we asked participants to describe the strategy they used for the task. This section describes all strategies per chart that more than one participant reported. It is important to note that some participants mentioned multiple strategies.

5.3.1. Bar

One strategy (8/24) revolved around identifying only one target bar and estimating based on that bar’s size, for example, if the bar was short, then concluding that the other is likely larger. Another strategy (7/24) involved identifying both target bars and comparing their heights. Participants also mentioned a third strategy, in which they answered based on the overall shape of the data distribution (6/24) or the shapes of local distributions around the target bars (5/24).

5.3.2. Donut

Many participants (13/24) described a strategy of taking in a single view of the entire chart or focusing on its center, either approximating the distribution or quickly scanning peripherally for the larger patch of color containing a dot; they noted that this worked well except when the target elements were small or similar in size. As with Bar, some participants (8/24) focused on only one element and estimated based on its size. Other participants (6/24) tried to find both dots and compare the regions if given enough time, which they found especially helpful for small and similarly sized regions.

5.3.3. Radial

Participants most commonly (12/24) reported focusing only on the inner target bar and estimating its value relative to the other target. Some participants (4/24) compared clusters of bars that included the targets, while others (2/24) tried to check each dot and follow its arc.

5.4. Preference and Confidence Ratings

For each data size, participants ranked the three chart types in terms of their subjective preferences and confidence in accurately interpreting the data. Table 4 shows the responses of younger (prior study) and older adults normalized to percentages of their respective sample sizes. We also break down the older adults group into the young-old and old-old subsets, presenting their preference and confidence ratings in Table 5.

Across all data sizes, older adults had the greatest preference for and confidence with Donut, followed by Bar and then Radial. While the percentages of older adults who ranked Bar and Donut highest were similar for data size 7 (a difference of 25 percentage points (pp) for preference and 16 pp for confidence), the differences grew with data size, exceeding 50 pp for 12 and 24 data points.

For preference, older adults ranked Donut highest across all data sizes, while younger adults ranked Bar highest for data size 7 and Donut for data sizes 12 and 24. For data size 12, 66 pp more older adults ranked Donut first than ranked Bar first, compared with a difference of only 12 pp for younger adults. For data size 24, the two groups were closer, with a 66 pp difference between Donut and Bar for older adults and 34 pp for younger adults. For confidence, older and younger adults had similar rankings (Donut > Bar > Radial) for all data sizes.

The results for the young-old and old-old are similar for both preference and confidence. For preference, the largest difference is for Rank 1 of Bar with 12 and 24 data points (25 pp each). Similarly, the largest difference in confidence is 25 pp, for both Rank 1 and Rank 2 of Bar 12 and Donut 12.
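The percentages in Tables 4 and 5 normalize rank counts by each sample's size, and the percentage-point (pp) differences discussed above are differences between those rounded percentages. The sketch below assumes round-half-up rounding. The counts in the usage example (15 and 9 of 24 older adults ranking Donut and Bar first at data size 7) are back-calculated from the reported 63% and 38% and are not reported directly in the paper.

```python
import math


def pct(count, n):
    """Percentage of a sample of size n, rounded half up to a whole number."""
    return math.floor(100 * count / n + 0.5)


def pp_difference(count_a, count_b, n):
    """Difference, in percentage points, between two rounded percentages."""
    return pct(count_a, n) - pct(count_b, n)


# Older adults (n = 24), preference Rank 1 at data size 7:
# 15 chose Donut and 9 chose Bar (counts inferred from the reported percentages).
print(pct(15, 24), pct(9, 24), pp_difference(15, 9, 24))  # 63 38 25
```

Round-half-up matters here: Python's built-in `round` uses round-half-to-even and would turn 62.5 into 62 rather than the reported 63.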

Table 4. Preference and confidence ratings for younger (age < 65) (Y) and older (age ≥ 65) (O) adults, per chart and data size (DS). Because the samples differ in size (n_Y = 18, n_O = 24), percentages of each sample are presented instead.

Preference

| DS | Rank | Bar % Y | Bar % O | Donut % Y | Donut % O | Radial % Y | Radial % O |
|---|---|---|---|---|---|---|---|
| 7 | 1 | 56 | 38 | 44 | 63 | 0 | 0 |
| 7 | 2 | 33 | 58 | 56 | 38 | 11 | 4 |
| 7 | 3 | 11 | 4 | 0 | 0 | 89 | 96 |
| 12 | 1 | 44 | 13 | 56 | 79 | 0 | 8 |
| 12 | 2 | 44 | 79 | 44 | 21 | 11 | 0 |
| 12 | 3 | 11 | 8 | 0 | 0 | 89 | 92 |
| 24 | 1 | 33 | 13 | 67 | 79 | 0 | 8 |
| 24 | 2 | 61 | 79 | 33 | 21 | 6 | 0 |
| 24 | 3 | 6 | 8 | 0 | 0 | 94 | 92 |

Confidence

| DS | Rank | Bar % Y | Bar % O | Donut % Y | Donut % O | Radial % Y | Radial % O |
|---|---|---|---|---|---|---|---|
| 7 | 1 | 39 | 42 | 61 | 58 | 0 | 0 |
| 7 | 2 | 56 | 58 | 39 | 42 | 6 | 0 |
| 7 | 3 | 6 | 0 | 0 | 0 | 94 | 100 |
| 12 | 1 | 22 | 13 | 78 | 88 | 0 | 0 |
| 12 | 2 | 72 | 88 | 22 | 13 | 11 | 0 |
| 12 | 3 | 6 | 0 | 0 | 0 | 89 | 100 |
| 24 | 1 | 11 | 17 | 89 | 83 | 0 | 0 |
| 24 | 2 | 78 | 79 | 11 | 17 | 11 | 4 |
| 24 | 3 | 6 | 4 | 0 | 0 | 89 | 96 |

Table 5. Preference and confidence ratings for young-old (age 65-74) (YO) and old-old (age ≥ 75) (OO) adults, per chart and data size (DS).

Preference

| DS | Rank | Bar % YO | Bar % OO | Donut % YO | Donut % OO | Radial % YO | Radial % OO |
|---|---|---|---|---|---|---|---|
| 7 | 1 | 33 | 42 | 67 | 58 | 0 | 0 |
| 7 | 2 | 58 | 58 | 33 | 42 | 8 | 0 |
| 7 | 3 | 8 | 0 | 0 | 0 | 92 | 100 |
| 12 | 1 | 0 | 25 | 83 | 75 | 17 | 0 |
| 12 | 2 | 83 | 75 | 17 | 25 | 0 | 0 |
| 12 | 3 | 17 | 0 | 0 | 0 | 83 | 100 |
| 24 | 1 | 0 | 25 | 83 | 75 | 17 | 0 |
| 24 | 2 | 83 | 75 | 17 | 25 | 0 | 0 |
| 24 | 3 | 17 | 0 | 0 | 0 | 83 | 100 |

Confidence

| DS | Rank | Bar % YO | Bar % OO | Donut % YO | Donut % OO | Radial % YO | Radial % OO |
|---|---|---|---|---|---|---|---|
| 7 | 1 | 33 | 50 | 67 | 50 | 0 | 0 |
| 7 | 2 | 67 | 50 | 33 | 50 | 0 | 0 |
| 7 | 3 | 0 | 0 | 0 | 0 | 100 | 100 |
| 12 | 1 | 0 | 25 | 100 | 75 | 0 | 0 |
| 12 | 2 | 100 | 75 | 0 | 25 | 0 | 0 |
| 12 | 3 | 0 | 0 | 0 | 0 | 100 | 100 |
| 24 | 1 | 8 | 25 | 92 | 75 | 0 | 0 |
| 24 | 2 | 83 | 75 | 8 | 25 | 8 | 0 |
| 24 | 3 | 8 | 0 | 0 | 0 | 92 | 100 |

5.5. Smartwatch Ownership and Data of Interest

Six of the twenty-four participants reported owning a smartwatch. They primarily used their devices for health monitoring (heart rate, oxygen, fall detection), activity monitoring (step count), weather, and voice memos. Frequency of use ranged from several times a day (4/6) to very seldom (1/6), with one participant stating that their frequency varied. Five of the six owners preferred viewing their data as numbers; one preferred seeing it as a visualization. The remaining eighteen participants did not own a smartwatch. Their stated reasons included not needing one (11/18), not wanting to track personal data (7/18), assumptions of high cost (6/18), a lack of interest in technology (5/18), a lack of interest in wrist accessories (5/18), not knowing what smartwatches do (4/18), uncertainty about learning to use one (3/18), avoiding reliance on technology (2/18), and a preference for larger screens (2/18).

In our study, participants were asked to select the data types (following the categories of Islam et al. (Islam et al., 2020)) that they would be interested in tracking on a smartwatch. Of the three major categories, Health-Fitness drew the highest overall interest, followed by Device-Location and Weather & Planetary. Blood Pressure was the most frequently selected data type (17/24), while Humidity (4/24), Wind Info (4/24), Moon Phase (4/24), and None of the Above (1/24) were among the least selected (see Figure 10). Compared with Islam et al. (Islam et al., 2020), whose participants indicated the data they already see on their smartwatch, older adults in our study showed greater interest in Blood Pressure, Distance Travelled, and Phone Battery.

Figure 10. Smartwatch data of interest for older adults (age ≥ 65) in our study. Each bar represents the number of participants interested in that data type. Categories are Device-Location, Health-Fitness, and Weather & Planetary, as per a previous study by Islam et al. (Islam et al., 2020). One participant in our study chose None as an answer.

Vertical bar chart depicting the number of older adults (current study participants) who said they would be interested in viewing various data types on a smartwatch; tabular version provided in supplemental materials. Data types are put into three possible categories: Device, Health, and Weather. In the Weather category, four data types were selected by 4 or fewer participants (humidity, wind info, moon phase, and sunset/sunrise). In comparison, the other two Weather data types were chosen by 12 or more people (temperature and weather info). A fourth category of None is shown, with 1 participant selecting that.


6. Discussion

In this section, we discuss the findings of our work, highlighting differences and similarities between younger and older adults’ performance, preferences, and confidence using glanceable visualizations. We explore the implications for visualization design and use in the real world. Next, we share our experiences and lessons learned from conducting a study with older adults, offering insights for researchers interested in working with similar populations. Lastly, we offer preliminary guidance for visualization designers working on glanceable visualizations for older adults.

6.1. Aging Agility: Similar Trends With Slower Times

Across all experimental conditions, older adults’ performance consistently lagged behind that of the younger participants in the original study (Blascheck et al., 2018). The differences in average time thresholds ranged from 51 ms (Donut 7) to 1555 ms (Radial 24), and, with the exception of Donut 24, the performance gap widened as data size increased. The confidence interval overlap analysis showed strong (5/9) or weak (2/9) evidence of differences in seven of the nine conditions, as illustrated in Figure 7. Despite these disparities, the overall trends relating visualization type, data size, and time thresholds were comparable between younger and older adults. Both groups performed best with Donut, followed by Bar and then Radial. In addition, in almost all cases, participants in both studies subjectively ranked Donut higher than Bar and Radial.
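The evidence labels above come from comparing confidence intervals rather than relying on p-values, in the estimation spirit of the works listed in the references (Cumming, 2014; Efron, 1987). A minimal sketch of that idea follows, assuming per-participant thresholds for two independent groups. It uses a simple percentile bootstrap rather than the bias-corrected BCa interval described by Efron, and the rule mapping overlap to "strong", "weak", or "insufficient" evidence is an assumed version of Cumming's overlap rule of thumb, not necessarily the exact criterion used in this paper. All threshold values are hypothetical.

```python
import random
import statistics


def bootstrap_ci(samples, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean (simpler than BCa)."""
    rng = random.Random(seed)
    n = len(samples)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lo = means[int(alpha * n_boot)]
    hi = means[int((1 - alpha) * n_boot) - 1]
    return lo, hi


def overlap_evidence(ci_a, ci_b):
    """Classify evidence of a difference from two independent 95% CIs.

    Assumed rule of thumb: no overlap -> "strong"; overlap smaller than half
    the average margin of error -> "weak"; otherwise "insufficient".
    """
    (a_lo, a_hi), (b_lo, b_hi) = ci_a, ci_b
    overlap = min(a_hi, b_hi) - max(a_lo, b_lo)
    if overlap <= 0:
        return "strong"
    avg_margin = ((a_hi - a_lo) + (b_hi - b_lo)) / 4  # mean CI half-width
    return "weak" if overlap < avg_margin / 2 else "insufficient"


# Hypothetical per-participant thresholds (ms) for one condition.
younger = [380, 410, 395, 420, 400, 405]
older = [620, 700, 580, 760, 650, 690]
ci_y, ci_o = bootstrap_ci(younger), bootstrap_ci(older)
print(ci_y, ci_o, overlap_evidence(ci_y, ci_o))
```

Working with intervals keeps the size of each gap visible, which matters here because a 51 ms gap and a 1555 ms gap carry very different practical weight.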

Our findings suggest that some aspects of graphical perception remain unchanged with age (e.g., preferences) while others (e.g., speed) may decline. The similarities between younger and older adults described above might indicate that basic pattern recognition, which includes identifying basic geometric shapes, lines, orientations, and spatial relationships among visual elements, remains relatively stable with age (Murman, 2015). Older adults were, however, invariably slower than their younger counterparts. This may partially stem from the compounded effects of perceptual and cognitive changes that people may experience with age. For instance, each comparison task required older adults to search for small black dots that marked the target elements. Visual search, the process of actively scanning a visual scene to identify a specific target among distractors, becomes slower and more error-prone with age, especially as the complexity of the visual environment increases (Becic et al., 2007). Comparing visual stimuli also relies on several cognitive functions, including working memory (WM), the mechanism responsible for holding and manipulating information in awareness over the short term. WM declines with age, and on average, the WM of older adults is lower than that of their younger counterparts (Salthouse and Babcock, 1991). Hence, we speculate that declining WM could reduce the speed and accuracy of comparisons; however, our study neither controlled for nor measured WM, and further work is needed to understand its effects across age groups in this context. Still, the considerable performance differences between the young-old and the old-old in our study (Figure 9) suggest that advancing age can exacerbate the performance disparities between older and younger adults.
Due to our study design, the time taken to input the answer did not play a role in our measurements and can, therefore, be excluded as a possible source of differences.

We found that older adults used various strategies to perform the comparison task. Based on remarks some participants made, we speculate that stimulus exposure time was one factor that shaped their strategies. For instance, P9 mentioned that “I tried to look at one [target bar]…and then the second dot [target bar], and see how tall it was. And that worked really well when things were slow. And then, when things were going faster…I tried to just kind of scan the bottom for the 2 dots and quickly look up and not focus on one versus the other, but try to see the proportion [distribution shape]” and P18 said “First I tried to see the whole thing. And when it got faster, and I couldn’t do that because it was too fast to look, and I would miss it, so I would just look at one side and just guess to see if the other one was lower or higher than that one.” Other factors, such as chart type and data size, could also have influenced the strategies used to perform the task. However, further research is required to investigate the combined and isolated effects of these factors on strategy.

An important question that emerges from our findings is: “Does the slower performance of older adults in interacting with glanceable visualizations on smartwatches bear practical significance in real-world situations?” Answering this question definitively requires further empirical examination. However, we posit that the relevance of slower performance may depend on a multifaceted interplay of factors, including the context in which glancing occurs. Glancing at a smartwatch for quick insights often co-occurs with other activities, such as walking, biking, or daily chores (Pizza et al., 2016). Older adults in our study reported much greater interest in health data than younger adults in the survey by Islam et al. (Islam et al., 2020), and health markers are often tracked on a watch during exercise. In these dynamic contexts, the longer glance durations observed in older adults might be consequential, particularly in scenarios that demand continuous attention and precise coordination. For instance, when navigating complex terrain, visual attention and precise timing are critical; even a brief distraction could increase the risk of falls, especially among older adults who are walking (Marigold and Patla, 2007; Yogev-Seligmann et al., 2008). The even slower performance of the old-old, particularly when compounded by more pronounced age-related physiological decline, might intensify such risks. Conversely, the additional time required to read a glanceable visualization might be negligible when an older adult is in a less demanding context, such as sitting in a chair. Further studies are needed to move beyond conjecture and develop a deeper understanding of the factors influencing older adults’ performance and experience with glanceable visualizations, as well as their tangible real-world implications.
Such work can extend the conversation from a specific empirical finding to a broader discourse on the relationship between visualization design and utility, aging, and the human experience.
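As a back-of-the-envelope illustration of why extra glance time might matter while walking, the threshold gaps reported earlier in this section can be converted into distance covered while a person's eyes are on the watch. The walking speed of 1.0 m/s is an assumption chosen for illustration, not a value measured in this study, and time thresholds are only a proxy for real-world glance durations.

```python
def distance_during_glance(extra_glance_ms: float,
                           walking_speed_m_per_s: float = 1.0) -> float:
    """Distance (m) walked during an extra `extra_glance_ms` of glancing.

    walking_speed_m_per_s is an illustrative assumption, not study data.
    """
    if extra_glance_ms < 0 or walking_speed_m_per_s < 0:
        raise ValueError("durations and speeds must be non-negative")
    return walking_speed_m_per_s * extra_glance_ms / 1000.0


# Smallest and largest younger-vs-older threshold gaps reported in Section 6.1.
for label, gap_ms in [("Donut 7", 51), ("Radial 24", 1555)]:
    print(f"{label}: {distance_during_glance(gap_ms):.2f} m")
```

Under these assumptions, the largest gap corresponds to roughly one and a half meters of walking without looking ahead, whereas the smallest gap is a few centimeters.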

6.2. Conundrum of Chronology: The Challenge of Defining ‘Old’ in Studies

In line with the World Health Organization’s definition of older adults, as well as prevalent research practices, we set the minimum chronological age for recruiting participants in our study at 65. However, our breakdown analysis of the young-old (age 65-74) and old-old (age ≥ 75) participants showed notable performance differences between the two groups (Figure 9). This observation raises a critical question for visualization research involving older adults: “Who should we consider as older adults in the context of visualization research?” The heterogeneity of the older adult population is well known. While aging is a shared experience, the onset and intensity of age-related physiological changes differ among individuals (Hofer et al., 2003; Ylikoski et al., 1999). Therefore, defining an age range that classifies older adults is challenging and debatable. We need mechanisms that improve the internal and external validity of experiments by recruiting samples that more accurately represent the older adult population. This might entail more systematic approaches to participant recruitment in visualization studies, ones that assess core elements such as perception, cognition, and motor control. Existing screening instruments such as the Mini-Mental State Examination (Schatz, 2018) could serve as blueprints for crafting analogous assessments for visualization. In the absence of standardized protocols, and drawing on the findings of this study, we strongly recommend that researchers studying older adults recruit across a broad age range, with particular emphasis on including an ample number of participants from the oldest age categories. This is crucial for observing and accurately measuring the impact of age-related changes in their studies.
A failure to secure a representative sample could increase the risk of type-I and type-II errors, leading to erroneous insights and conclusions. For instance, if we had recruited only participants aged 65-74, we might have falsely inferred that no noteworthy differences exist between older and younger adults, an inference contradicted by our age ≥ 75 results.
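The age bands in this breakdown are easy to misapply at their boundaries. The sketch below states them explicitly (under 65 younger, 65-74 young-old, 75 and older old-old) and pools per-participant thresholds by band. The record format and all example values are hypothetical, not study data.

```python
from statistics import fmean


def age_group(age: int) -> str:
    """Age bands used in this paper: <65 younger, 65-74 young-old, >=75 old-old."""
    if age < 65:
        return "younger"
    if age < 75:
        return "young-old"
    return "old-old"


def mean_threshold_by_group(records):
    """records: iterable of (age, threshold_ms) pairs -> {group: mean threshold}."""
    groups = {}
    for age, threshold_ms in records:
        groups.setdefault(age_group(age), []).append(threshold_ms)
    return {group: fmean(values) for group, values in groups.items()}


# Hypothetical participants: (age, threshold in ms for one condition).
example = [(68, 500), (72, 700), (80, 1200), (77, 800)]
print(mean_threshold_by_group(example))
```

Pooling all four example participants into a single "65+" group would average away the difference between the two bands, which is the masking effect described above.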

6.3. Methodological Flexibility: Dynamically Adapting to the Needs of Older Participants

In the course of running this study, we made two adjustments to the study design and procedure. First, we allowed participants to exit the Radial 24 condition if performing trials became too cumbersome for them, and second, we replaced the keyboard with two separate keys (see Section 4 for details). Changing the study protocol is typically discouraged because it can compromise the consistency of experimental conditions across participants. However, our experience in this study, as well as evidence from prior work (Moore and Miller, 1999; Roller and Lavrakas, 2015), suggests that methodological flexibility might be required when working with vulnerable populations. Participation in studies can cause anxiety in older adults for many reasons, such as misunderstanding the researcher’s goals, fatigue, and cognitive difficulty, which can in turn make participants harder to recruit (McHenry et al., 2015). This recognition highlights a critical consideration for researchers: the ethical imperative of balancing scientific rigor with compassion and understanding. Hodge et al. (Hodge et al., 2020) touch on a related topic, spotlighting the ethical issues that can arise when working with people in dementia research. Their work reminds us that study designs must be not only methodologically sound but also adaptable to the unique needs and challenges that different populations face. It also raises further research questions concerning the recruitment and retention of older adults. How can researchers establish trust and alleviate potential anxieties? What additional support mechanisms might be necessary to ensure the comfort and understanding of participants from varying age groups? These considerations emphasize the interconnected nature of research design, ethics, and the potential to contribute meaningfully to the lives of the individuals involved.

6.4. Design Implications

The primary objective of our study was to investigate and assess the graphical perception of older adults in the context of glanceable visualization. Drawing on our findings, we also offer preliminary suggestions to assist visualization designers working on glanceable visualizations for older adults.

Donut and Bar are Preferable for Quick Value Comparisons. Donut and Bar charts are comparably effective for supporting the comparison task with small data sets. In our study, older adults were similarly efficient with both chart types for 7 data points, with an average threshold difference of 32 ms. For larger data sets, however, older adults performed notably better with Donut than with Bar, with average thresholds of 401 ms versus 771 ms for 24 data points. Blascheck et al. also found performance degradation in younger adults with larger data sets. However, the average threshold difference between Donut and Bar for older adults was approximately twice that of younger adults at each data size. While younger adults’ average thresholds were below 500 ms for both Bar and Donut with 24 data points, old-old participants’ average threshold for Bar with 24 data points was 1005 ms. With this in mind, we lean toward recommending Donut for older adults when there are many data points or when comparison speed is critical. Notably, older adults were consistently slower with Radial across all data sizes, with thresholds ranging from 943 ms (7 data points) to 5460 ms (24 data points). This suggests that Radial charts are less effective than Donut and Bar charts for quick data comparisons on smartwatches.
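The size of the gaps cited in this recommendation can be checked directly from the older adults' average thresholds reported above. The helper name below is our own; the numbers are the ones stated in the text.

```python
# Older adults' average thresholds (ms) as reported in this section.
older_thresholds_ms = {
    ("Donut", 24): 401,
    ("Bar", 24): 771,
    ("Radial", 7): 943,
    ("Radial", 24): 5460,
}


def gap_and_ratio(slow_ms: float, fast_ms: float) -> tuple[float, float]:
    """Absolute difference (ms) and ratio between two thresholds."""
    return slow_ms - fast_ms, slow_ms / fast_ms


bar_vs_donut = gap_and_ratio(older_thresholds_ms[("Bar", 24)],
                             older_thresholds_ms[("Donut", 24)])
radial_growth = gap_and_ratio(older_thresholds_ms[("Radial", 24)],
                              older_thresholds_ms[("Radial", 7)])
print(f"Bar vs Donut, 24 points: +{bar_vs_donut[0]:.0f} ms ({bar_vs_donut[1]:.2f}x)")
print(f"Radial, 7 -> 24 points: +{radial_growth[0]:.0f} ms ({radial_growth[1]:.2f}x)")
```

At 24 data points, Bar thresholds are 370 ms (about 1.9 times) longer than Donut thresholds, while Radial thresholds grow roughly 5.8-fold from 7 to 24 data points.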

Radial is Preferable for Displaying Task Progress and Completion. Our combined analysis of performance times and participant feedback indicates that Radial charts may be more suitable for visualizing progress and task completion for a single data point (e.g., achieving a step goal) than for comparing numerical values. While younger adults had average thresholds below 1000 ms with Radial for 7 and 12 data points, old-old participants had average thresholds above 1000 ms for all three data sizes. This conclusion aligns with the work of Blascheck et al. (Blascheck et al., 2023) suggesting that Radial charts are better suited to representing single data points in part-to-whole comparisons, particularly for goal completion.

Balance the Tension between Visualization Size and Data Size. Our participants frequently noted difficulties in distinguishing graph elements (e.g., the bars in Bar charts) as they became narrower with increasing data size. Because a Donut chart's region widths depend on data values rather than on the number of data points, its regions shrink at a different, slower rate as data size increases than those of the other two chart types. We speculate that older adults’ stronger reported preference for this chart across data sizes, compared with younger adults, may partly be due to this phenomenon. While the tension between display size and information density is a known challenge in visualization (Wu et al., 2012), the small dimensions of smartwatch screens can exacerbate the issue. Hence, visualization designers should ensure that information remains adequately perceptible for older adults, either by using charts less affected by data size (e.g., Donut charts) or by following recommendations from existing work on displaying large amounts of data on small screens (Chen, 2017; Neshati et al., 2019b).

Empirically derived design guidelines for glanceable visualizations aimed at older adults are currently lacking. Addressing this gap requires further work examining a broader range of glanceable visualizations and analysis tasks.

7. Limitations and Future Work

In replicating Blascheck et al.’s (Blascheck et al., 2018) study, we inherit the limitations the authors originally noted. They discuss how some aspects of the study design, such as the color variance and the placement of the dot markers, could have influenced participants’ performance and contributed to the differences observed between Donut, Bar, and Radial charts. These factors may have similarly affected the performance of the older participants in our study. They also mention the simplicity of the studied comparison task, which required participants to choose which of two marked elements displayed a larger value; more complex and perceptually demanding tasks could increase observed thresholds. Similarly, the thresholds of our older participants are likely to change with more complex tasks. Moreover, reading the watch on a stand at an ideal height while sitting is not representative of real-world use, as age-related decline in hand-eye coordination could make reading a worn smartwatch more difficult (Guan and Wade, 2000). Further work can ascertain the impact of declining hand-eye coordination on reading glanceable visualizations on a worn watch.

Our study’s participant pool was notably skewed toward higher educational levels and predominantly composed of individuals with master’s degrees. This almost certainly arises from our recruitment strategy, which focused on local participants in an area rich in educational institutions; 10 of our 24 participants reported education as their primary or most recent occupation. This skew, however, aligns with the original study, in which a master’s degree was also the most common educational level (see 1(b) for more details). While higher educational attainment could correlate with higher visualization literacy, this demographic may more accurately reflect the likely user base of smartwatches, as age and educational level are influential factors in technology adoption (Rupp et al., 2018).

The context in which a conceptual replication study occurs (e.g., a different time, place, or participant group) can offer both benefits and challenges. On the positive side, these variations can enhance the generalizability of the findings and provide robust tests of underlying theories across different conditions, thereby enriching scientific understanding. However, such changes may also introduce confounding variables. In Section 6.1, we interpret the observed difference between younger and older adults’ performance in terms of age-related perceptual and cognitive disparities between the two groups. However, other differences between younger and older adults could also have affected the results. For instance, life experience may vary notably between these age groups, influencing their reactions to experimental stimuli or tasks. Social and cultural norms also differ between generations, potentially affecting behavior and attitudes in ways that complicate the study’s interpretation.

The primary focus of this work was to establish and compare older adults’ time thresholds. There remains an unmet need for more extensive investigation of other aspects of glanceable visualization design for this population. For instance, more work is needed to understand the design considerations for presenting multiple visualizations on a watch face (recently studied with younger adults by Blascheck et al. (Blascheck et al., 2023)), as well as how best to present text and fine-grained visual components that may be difficult to see with low visual acuity on such a small screen (Mitzner et al., 2015). Methods of annotating data and drawing attention to noteworthy data points likely require different considerations on a smaller screen and for shorter viewing times (glances); previous work has recommended these types of visual aids for older adults (Le et al., 2015). Some participants in our study mentioned preferring to view data on their watch as numbers rather than as visualizations. This may indicate that, for precise low-level tasks such as retrieve value and determine range, numbers may serve better than a complete visualization. Facilitating rapid interactions for glanceable visualizations is a noted challenge in this area (Blascheck et al., 2021). Investigating which low-level tasks are suitable for glanceable visualization at all is also an open problem that could help narrow the research community’s focus.

8. Conclusion

In this paper, we replicated the study by Blascheck et al. (Blascheck et al., 2018) that investigated how quickly people can compare data using glanceable visualizations on smartwatches. Our primary objective was to rerun this study with older adults (age ≥ 65) to establish time thresholds and to compare the results of the original and current studies, examining possible differences between younger and older adults’ performance, the strategies they employed to perform the task, and whether any differences depended on chart type. Participants (n = 24) completed 9 staircases, one for each combination of chart type (Bar, Donut, Radial) and data size (7, 12, 24), while performing a data comparison task using a two-alternative forced choice approach. Our results showed weak or strong evidence of differences between younger adults (original study) and old-old adults (age ≥ 75) across almost all conditions. In contrast, all but one condition (Radial 24) showed insufficient evidence of differences between younger adults and young-old adults (age 65-74).
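For readers unfamiliar with staircase procedures, the sketch below shows how a fixed-step, two-alternative forced choice staircase can adapt stimulus exposure time and estimate a threshold from its reversal points. The 3-down/1-up rule (which converges near 79% correct), the starting time, the step size, the 10 ms floor, and averaging the last six reversals are all assumptions chosen for illustration; the actual parameters and stopping rule of the procedure replicated from Blascheck et al. (2018) may differ.

```python
class Staircase:
    """Fixed-step up/down staircase over stimulus exposure time (ms).

    Illustrative parameters only; not the study's actual configuration.
    """

    def __init__(self, start_ms=1000.0, step_ms=50.0, min_ms=10.0, n_down=3):
        self.level_ms = start_ms   # current exposure time
        self.step_ms = step_ms
        self.min_ms = min_ms
        self.n_down = n_down       # consecutive correct answers before shortening
        self.correct_streak = 0
        self.last_direction = 0    # -1 = shorter (harder), +1 = longer (easier)
        self.reversals = []        # exposure times at which direction flipped

    def update(self, correct: bool) -> None:
        """Record one 2AFC response and adjust the exposure time."""
        if correct:
            self.correct_streak += 1
            if self.correct_streak < self.n_down:
                return  # not enough consecutive correct answers yet
            self.correct_streak = 0
            direction = -1
        else:
            self.correct_streak = 0
            direction = +1  # any error makes the next trial easier
        if self.last_direction and direction != self.last_direction:
            self.reversals.append(self.level_ms)
        self.last_direction = direction
        self.level_ms = max(self.min_ms, self.level_ms + direction * self.step_ms)

    def threshold(self, last_k: int = 6) -> float:
        """Mean exposure time over the last `last_k` reversals."""
        if not self.reversals:
            raise ValueError("no reversals recorded yet")
        tail = self.reversals[-last_k:]
        return sum(tail) / len(tail)


# Hypothetical deterministic observer: correct iff exposure >= 400 ms.
stairs = Staircase()
for _ in range(100):
    stairs.update(correct=stairs.level_ms >= 400)
print(stairs.threshold())  # prints 375.0
```

With this simulated observer, the staircase settles into alternating between 400 ms and 350 ms, so the estimate falls midway between the last exposure that was always answered correctly and the first that was not. Real participants respond probabilistically, which is why averaging several reversals is used.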

These results prompt interesting questions about how we study glanceable visualization for older adults, especially how to define who an “older adult” is, and whether differences in performance speed have noticeable real-world consequences. We also discussed our experience working with older adults in an in-person study, including takeaways regarding flexible study design and a brief reflection on working with vulnerable populations. Future work can expand on these questions and contribute to a better understanding of how to design glanceable visualizations for older adults. We hope this study sparks additional interest in the visualization community in considering equitable design in areas that, while more challenging, can level the playing field of data-driven insights.

Acknowledgements.
Tanja Blascheck is funded by the European Social Fund and the Ministry of Science, Research and Arts Baden-Württemberg.

References

  • Abdel-Ghany and Sharpe (1997) Mohamed Abdel-Ghany and Deanna L Sharpe. 1997. Consumption patterns among the young-old and old-old. Journal of Consumer Affairs 31, 1 (1997), 90–112.
  • Amini et al. (2017) Fereshteh Amini, Khalad Hasan, Andrea Bunt, and Pourang Irani. 2017. Data representations for in-situ exploration of health and fitness data. In Proceedings of the 11th EAI international conference on pervasive computing technologies for healthcare. Association for Computing Machinery, New York, NY, USA, 163–172.
  • Angerbauer and Sedlmair (2022) Katrin Angerbauer and Michael Sedlmair. 2022. Toward Inclusion and Accessibility in Visualization Research: Speculations on Challenges, Solution Strategies, and Calls for Action (Position Paper). In IEEE Evaluation and Beyond - Methodological Approaches for Visualization (BELIV). 20–27.
  • Backonja et al. (2016) Uba Backonja, Nai-Ching Chi, Yong Choi, Amanda K Hall, Thai Le, Youjeong Kang, and George Demiris. 2016. Visualization approaches to support healthy aging: a systematic review. Journal of innovation in health informatics 23, 3 (2016), 860.
  • Baltes and Smith (2003) Paul B Baltes and Jacqui Smith. 2003. New frontiers in the future of aging: From successful aging of the young old to the dilemmas of the fourth age. Gerontology 49, 2 (2003), 123–135.
  • Becic et al. (2007) Ensar Becic, Arthur F Kramer, and Walter R Boot. 2007. Age-related differences in visual search in dynamic displays. Psychology and Aging 22, 1 (2007), 67.
  • Besançon and Dragicevic (2017) Lonni Besançon and Pierre Dragicevic. 2017. La Différence Significative entre Valeurs p et Intervalles de Confiance. In 29ème conférence francophone sur l’Interaction Homme-Machine, AFIHM (Ed.). AFIHM, Poitiers, France, 10. https://inria.hal.science/hal-01562281 Alt.IHM.
  • Blascheck et al. (2021) Tanja Blascheck, Frank Bentley, Eun Kyoung Choe, Tom Horak, and Petra Isenberg. 2021. Characterizing Glanceable Visualizations: From Perception to Behavior Change. In Mobile Data Visualization. Chapman and Hall/CRC, New York, NY, USA, 151–176.
  • Blascheck et al. (2018) Tanja Blascheck, Lonni Besançon, Anastasia Bezerianos, Bongshin Lee, and Petra Isenberg. 2018. Glanceable visualization: Studies of data comparison performance on smartwatches. IEEE Transactions on Visualization and Computer Graphics 25, 1 (2018), 630–640.
  • Blascheck et al. (2023) Tanja Blascheck, Lonni Besançon, Anastasia Bezerianos, Bongshin Lee, Alaul Islam, Tingying He, and Petra Isenberg. 2023. Studies of Part-to-Whole Glanceable Visualizations on Smartwatch Faces. In IEEE 16th Pacific Visualization Symposium. IEEE Computer Society Press, Washington, DC, USA, 187–196.
  • Blascheck and Isenberg (2021) Tanja Blascheck and Petra Isenberg. 2021. A replication study on glanceable visualizations: Comparing different stimulus sizes on a laptop computer. In IVAPP 2021-12th International Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. SCITEPRESS-Science and Technology Publications, Setúbal, Portugal, 133–143.
  • Bradley (1958) James V Bradley. 1958. Complete counterbalancing of immediate sequential effects in a Latin square design. J. Amer. Statist. Assoc. 53, 282 (1958), 525–528.
  • Brandt et al. (2014) Mark J Brandt, Hans IJzerman, Ap Dijksterhuis, Frank J Farach, Jason Geller, Roger Giner-Sorolla, James A Grange, Marco Perugini, Jeffrey R Spies, and Anna Van’t Veer. 2014. The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology 50 (2014), 217–224.
  • Cajamarca et al. (2023) Gabriela Cajamarca, Valeria Herskovic, Stephannie Dondighual, Carolina Fuentes, and Nervo Verdezoto. 2023. Understanding How to Design Health Data Visualizations for Chilean Older Adults on Mobile Devices. In Proceedings of the ACM Designing Interactive Systems Conference. Association for Computing Machinery, New York, NY, USA, 1309–1324. https://doi.org/10.1145/3563657.3596109
  • Cajamarca et al. (2020) Gabriela Cajamarca, Valeria Herskovic, and Pedro O. Rossel. 2020. Enabling Older Adults’ Health Self-Management through Self-Report and Visualization—A Systematic Literature Review. Sensors 20, 15 (2020), 1–16. https://doi.org/10.3390/s20154348
  • Chen (2017) Yang Chen. 2017. Visualizing large time-series data on very small screens. In Proceedings of the Eurographics/IEEE VGTC Conference on Visualization: Short Papers. The Eurographics Association, Eindhoven, Netherlands, 37–41.
  • Chung et al. (2023) Jane Chung, Heidi Rishel Brakey, Blaine Reeder, Orrin Myers, and George Demiris. 2023. Community-dwelling older adults’ acceptance of smartwatches for health and location tracking. International journal of older people nursing 18, 1 (2023), e12490.
  • Consolvo et al. (2008) Sunny Consolvo, Predrag Klasnja, David W McDonald, Daniel Avrahami, Jon Froehlich, Louis LeGrand, Ryan Libby, Keith Mosher, and James A Landay. 2008. Flowers or a robot army? Encouraging awareness & activity with personal, mobile displays. In Proceedings of the 10th international conference on Ubiquitous computing. Association for Computing Machinery, New York, NY, USA, 54–63.
  • Crandall and Sherman (2016) Christian S Crandall and Jeffrey W Sherman. 2016. On the scientific superiority of conceptual replications for scientific progress. Journal of Experimental Social Psychology 66 (2016), 93–99.
  • Cristescu et al. (2022) Irina Cristescu, Dragoş-Daniel Iordache, and Cristian Ţîrlea. 2022. Behavioral intention to use smartwatches: a case study. In International Conference on Electronics, Computers and Artificial Intelligence. IEEE Computer Society Press, Washington, DC, USA, 1–4.
  • Cumming (2014) Geoff Cumming. 2014. The new statistics: Why and how. Psychological science 25, 1 (2014), 7–29.
  • Efron (1987) Bradley Efron. 1987. Better bootstrap confidence intervals. Journal of the American statistical Association 82, 397 (1987), 171–185.
  • Fan et al. (2023) Mingming Fan, Yiwen Wang, Yuni Xie, Franklin Mingzhe Li, and Chunyang Chen. 2023. Understanding How Older Adults Comprehend COVID-19 Interactive Visualizations via Think-Aloud Protocol. International Journal of Human–Computer Interaction 39, 8 (2023), 1626–1642.
  • Galesic et al. (2009) Mirta Galesic, Rocio Garcia-Retamero, and Gerd Gigerenzer. 2009. Using icon arrays to communicate medical risks: Overcoming low numeracy. Health Psychol. 28, 2 (March 2009), 210–216.
  • García-Pérez (1998) Miguel A García-Pérez. 1998. Forced-choice staircases with fixed step sizes: asymptotic and small-sample properties. Vision research 38, 12 (1998), 1861–1881.
  • Gouveia et al. (2016) Rúben Gouveia, Fábio Pereira, Evangelos Karapanos, Sean A. Munson, and Marc Hassenzahl. 2016. Exploring the Design Space of Glanceable Feedback for Physical Activity Trackers. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Heidelberg, Germany) (UbiComp ’16). Association for Computing Machinery, New York, NY, USA, 144–155. https://doi.org/10.1145/2971648.2971754
  • Guan and Wade (2000) Jinhua Guan and Michael G Wade. 2000. The effect of aging on adaptive eye-hand coordination. The Journals of Gerontology Series B: Psychological Sciences and Social Sciences 55, 3 (2000), P151–P162.
  • Hodge et al. (2020) James Hodge, Sarah Foley, Rens Brankaert, Gail Kenning, Amanda Lazar, Jennifer Boger, and Kellie Morrissey. 2020. Relational, flexible, everyday: learning from ethics in dementia research. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 1–16.
  • Hofer et al. (2003) Scott M Hofer, Stig Berg, and Pertti Era. 2003. Evaluating the interdependence of aging-related changes in visual and auditory acuity, balance, and cognitive functioning. Psychology and Aging 18, 2 (2003), 285.
  • Horak et al. (2018) Tom Horak, Sriram Karthik Badam, Niklas Elmqvist, and Raimund Dachselt. 2018. When David meets Goliath: Combining smartwatches with a large vertical display for visual data exploration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 1–13.
  • Isenberg (2021) Petra Isenberg. 2021. Micro Visualizations: Design and Analysis of Visualizations for Small Display Spaces. Habilitation thesis. Université Paris-Saclay, Gif-sur-Yvette, France. https://hal.inria.fr/tel-03584024
  • Islam et al. (2022) Alaul Islam, Ranjini Aravind, Tanja Blascheck, Anastasia Bezerianos, and Petra Isenberg. 2022. Preferences and effectiveness of sleep data visualizations for smartwatches and fitness bands. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 1–17.
  • Islam et al. (2020) Alaul Islam, Anastasia Bezerianos, Bongshin Lee, Tanja Blascheck, and Petra Isenberg. 2020. Visualizing Information on Watch Faces: A Survey with Smartwatch Users. In IEEE Visualization Conference (VIS). IEEE Computer Society Press, Washington, DC, USA, 156–160.
  • Johnson and Finn (2017) Jeff Johnson and Kate Finn. 2017. Designing user interfaces for an aging population: Towards universal design. Morgan Kaufmann, Cambridge, MA, USA.
  • Khakurel et al. (2018) Jayden Khakurel, Antti Knutas, Helinä Melkas, Birgit Penzenstadler, Bo Fu, and Jari Porras. 2018. Categorization Framework for Usability Issues of Smartwatches and Pedometers for the Older Adults. In Universal Access in Human-Computer Interaction. Methods, Technologies, and Users, Margherita Antona and Constantine Stephanidis (Eds.). Springer International Publishing, Cham, Switzerland, 91–106.
  • Le et al. (2014) Thai Le, Cecilia Aragon, Hilaire Thompson, and George Demiris. 2014. Elementary Graphical Perception for Older Adults: A Comparison with the General Population. Perception 43 (November 2014), 1249–1260. https://doi.org/10.1068/p7801
  • Le et al. (2015) Thai Le, Blaine Reeder, Daisy Yoo, Rafae Aziz, Hilaire J Thompson, and George Demiris. 2015. An evaluation of wellness assessment visualizations for older adults. Telemedicine and e-Health 21, 1 (2015), 9–15.
  • Le et al. (2016) Thai Le, Hilaire J. Thompson, and George Demiris. 2016. A Comparison of Health Visualization Evaluation Techniques with Older Adults. IEEE Computer Graphics and Applications 36, 4 (2016), 67–77. https://doi.org/10.1109/MCG.2015.93
  • Le et al. (2012) Thai Le, Katarzyna Wilamowska, George Demiris, and Hilaire Thompson. 2012. Integrated data visualisation: an approach to capture older adults’ wellness. Int. J. Electron. Healthc. 7, 2 (2012), 89–104.
  • Lee et al. (2020) Bongshin Lee, Eun Kyoung Choe, Petra Isenberg, Kim Marriott, and John Stasko. 2020. Reaching broader audiences with data visualization. IEEE Computer Graphics and Applications 40, 2 (2020), 82–90.
  • Li et al. (2020) Lin Li, Wei Peng, Anastasia Kononova, Marie Bowen, and Shelia R Cotten. 2020. Factors associated with older adults’ long-term use of wearable activity trackers. Telemedicine and e-Health 26, 6 (2020), 769–775.
  • Likert (1932) Rensis Likert. 1932. A technique for the measurement of attitudes. Archives of Psychology 22, 140 (1932), 55.
  • Macmillan and Creelman (2004) Neil A Macmillan and C Douglas Creelman. 2004. Detection theory: A user’s guide. Psychology Press, Mahwah, NJ, USA.
  • Marigold and Patla (2007) Daniel S Marigold and Aftab E Patla. 2007. Gaze fixation patterns for negotiating complex ground terrain. Neuroscience 144, 1 (2007), 302–313.
  • Marriott et al. (2021) Kim Marriott, Bongshin Lee, Matthew Butler, Ed Cutrell, Kirsten Ellis, Cagatay Goncu, Marti Hearst, Kathleen McCoy, and Danielle Albers Szafir. 2021. Inclusive data visualization for people with disabilities: a call to action. Interactions 28, 3 (2021), 47–51.
  • McHenry et al. (2015) Judith C McHenry, Kathleen C Insel, Gilles O Einstein, Amy N Vidrine, Kari M Koerner, and Daniel G Morrow. 2015. Recruitment of older adults: success may be in the details. The Gerontologist 55, 5 (2015), 845–853.
  • Mitzner et al. (2015) Tracy L Mitzner, Cory-Ann Smarr, Wendy A Rogers, and Arthur D Fisk. 2015. Considering older adults’ perceptual capabilities in the design process. The Cambridge Handbook of Applied Perception Research 2 (2015), 1051–1079.
  • Moore and Miller (1999) Linda Weaver Moore and Margaret Miller. 1999. Initiating research with doubly vulnerable populations. Journal of Advanced Nursing 30, 5 (1999), 1034–1040.
  • Murman (2015) Daniel L Murman. 2015. The impact of age on cognition. Seminars in Hearing 36, 3 (2015), 111–121.
  • Neshati et al. (2021) Ali Neshati, Fouad Alallah, Bradley Rey, Yumiko Sakamoto, Marcos Serrano, and Pourang Irani. 2021. SF-LG: Space-Filling Line Graphs for Visualizing Interrelated Time-Series Data on Smartwatches. In Proceedings of the 23rd International Conference on Mobile Human-Computer Interaction (Toulouse & Virtual, France) (MobileHCI ’21). Association for Computing Machinery, New York, NY, USA, Article 5, 13 pages.
  • Neshati et al. (2019a) Ali Neshati, Yumiko Sakamoto, and Pourang Irani. 2019a. Challenges in Displaying Health Data on Small Smartwatch Screens. In ITCH. IOS Press, Amsterdam, Netherlands, 325–332.
  • Neshati et al. (2019b) Ali Neshati, Yumiko Sakamoto, Launa C Leboe-McGowan, Jason Leboe-McGowan, Marcos Serrano, and Pourang Irani. 2019b. G-Sparks: Glanceable Sparklines on Smartwatches. In Proceedings of Graphics Interface 2019 (Kingston, Ontario) (GI 2019). Canadian Information Processing Society, Kingston, Ontario, Canada, 9 pages. https://doi.org/10.20380/GI2019.23
  • Pham et al. (2012) Tuan Pham, Shannon Mejía, Ronald Metoyer, and Karen Hooker. 2012. The Effects of Visualization Feedback on Promoting Health Goal Progress in Older Adults. In EuroVis - Short Papers, Miriah Meyer and Tino Weinkauf (Eds.). The Eurographics Association, Eindhoven, Netherlands, 91–95.
  • Pizza et al. (2016) Stefania Pizza, Barry Brown, Donald McMillan, and Airi Lampinen. 2016. Smartwatch in vivo. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 5456–5469.
  • Price et al. (2016) Margaux M Price, Jessica J Crumley-Branyon, William R Leidheiser, and Richard Pak. 2016. Effects of information visualization on older adults’ decision-making performance in a Medicare plan selection task: A comparative usability study. JMIR Hum. Factors 3, 1 (June 2016), e16.
  • Roller and Lavrakas (2015) Margaret R Roller and Paul J Lavrakas. 2015. Applied qualitative research design: A total quality framework approach. Guilford Publications, New York, NY, USA.
  • Rosales et al. (2017) Andrea Rosales, Mireia Fernández-Ardèvol, Francesca Comunello, Simone Mulargia, and Nuria Ferran-Ferrer. 2017. Older people and smartwatches, initial experiences. El Profesional de la Información 26 (June 2017), 457. https://doi.org/10.3145/epi.2017.may.12
  • Rupp et al. (2018) Michael A Rupp, Jessica R Michaelis, Daniel S McConnell, and Janan A Smither. 2018. The role of individual differences on perceptions of wearable fitness device trust, usability, and motivational impact. Applied Ergonomics 70 (2018), 77–87.
  • Salthouse and Babcock (1991) Timothy Salthouse and Renee Babcock. 1991. Decomposing adult age differences in working memory. Developmental Psychology 27, 5 (1991), 763.
  • Schatz (2018) Philip Schatz. 2018. Mini-Mental State Exam. Springer International Publishing, Cham, 2226–2228. https://doi.org/10.1007/978-3-319-57111-9_199
  • Schiewe et al. (2020) Alexander Schiewe, Andrey Krekhov, Frederic Kerber, Florian Daiber, and Jens Krüger. 2020. A Study on Real-Time Visualizations During Sports Activities on Smartwatches. In Proceedings of the 19th International Conference on Mobile and Ubiquitous Multimedia. Association for Computing Machinery, New York, NY, USA, 18–31. https://doi.org/10.1145/3428361.3428409
  • Sinoff and Ore (1997) Gary Sinoff and Liora Ore. 1997. The Barthel Activities of Daily Living Index: self-reporting versus actual performance in the old-old (≥75 years). Journal of the American Geriatrics Society 45, 7 (1997), 832–836.
  • Sony (n.d.) Sony. n.d. SmartWatch 3: Developer World. https://developer.sony.com/smartwatch-3. [Accessed 07-15-2023].
  • Stroebe and Strack (2014) Wolfgang Stroebe and Fritz Strack. 2014. The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science 9, 1 (2014), 59–71.
  • Unsworth and Engle (2007) Nash Unsworth and Randall W Engle. 2007. The nature of individual differences in working memory capacity: active maintenance in primary memory and controlled search from secondary memory. Psychological Review 114, 1 (2007), 104.
  • Vargemidis et al. (2023) Dimitri Vargemidis, Kathrin Gerling, Vero Vanden Abeele, and Luc Geurts. 2023. Performance and Pleasure: Exploring the Perceived Usefulness and Appeal of Physical Activity Data Visualizations with Older Adults. ACM Transactions on Accessible Computing 16, 3 (2023), 1–35.
  • Vespa et al. (2018) Jonathan Vespa, Lauren Medina, and David Armstrong. 2018. Demographic Turning Points for the United States: Population Projections for 2020 to 2060. https://www.census.gov/content/dam/Census/library/publications/2020/demo/p25-1144.pdf. (Accessed on 09/05/2023).
  • Wu et al. (2012) Yingcai Wu, Xiaotong Liu, Shixia Liu, and Kwan-Liu Ma. 2012. ViSizer: a visualization resizing framework. IEEE Transactions on Visualization and Computer Graphics 19, 2 (2012), 278–290.
  • Ylikoski et al. (1999) Raija Ylikoski, Ari Ylikoski, Pertti Keskivaara, Reijo Tilvis, Raimo Sulkava, and Timo Erkinjuntti. 1999. Heterogeneity of cognitive profiles in aging: successful aging, normal aging, and individuals at risks for cognitive decline. European Journal of Neurology 6, 6 (1999), 645–652.
  • Yogev-Seligmann et al. (2008) Galit Yogev-Seligmann, Jeffrey M Hausdorff, and Nir Giladi. 2008. The role of executive function and attention in gait. Movement Disorders: Official Journal of the Movement Disorder Society 23, 3 (2008), 329–342.