-
A Multi-Level, Multi-Scale Visual Analytics Approach to Assessment of Multifidelity HPC Systems
Authors:
Shilpika,
Bethany Lusch,
Murali Emani,
Filippo Simini,
Venkatram Vishwanath,
Michael E. Papka,
Kwan-Liu Ma
Abstract:
The ability to monitor and interpret hardware system events and behaviors is crucial to improving the robustness and reliability of these systems, especially in a supercomputing facility. The growing complexity and scale of these systems demand an increase in monitoring data collected at multiple fidelity levels and varying temporal resolutions. In this work, we aim to build a holistic analytical system that helps make sense of such massive data, mainly the hardware logs, job logs, and environment logs collected from disparate subsystems and components of a supercomputer system. This end-to-end log analysis system, coupled with visual analytics support, allows users to glean and promptly extract supercomputer usage and error patterns at varying temporal and spatial resolutions. We use multiresolution dynamic mode decomposition (mrDMD), a technique that depicts high-dimensional data as correlated spatial-temporal variation patterns or modes, to extract variation patterns isolated at specified frequencies. Our improvements to the mrDMD algorithm help promptly reveal useful information in the massive environment log dataset, which is then associated with the processed hardware and job log datasets using our visual analytics system. Furthermore, our system can identify the usage and error patterns filtered at user, project, and subcomponent levels. We exemplify the effectiveness of our approach with two use scenarios with the Cray XC40 supercomputer.
Submitted 15 June, 2023;
originally announced June 2023.
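For readers unfamiliar with the decomposition named in this abstract, the following is a minimal sketch of plain (single-level) exact DMD in NumPy. It is not the authors' improved mrDMD; multiresolution DMD applies a step like this recursively, keeping the slow modes at each level and then splitting the time window in half. The function names `dmd` and `mode_frequencies`, the `rank` parameter, and the synthetic two-sensor signal are assumptions made for illustration.

```python
import numpy as np


def dmd(X, rank):
    """Exact DMD of a snapshot matrix X (rows: sensors, columns: time steps).

    Hypothetical helper for illustration; returns (eigenvalues, modes).
    """
    X1, X2 = X[:, :-1], X[:, 1:]              # time-shifted snapshot pairs
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    V = Vh.conj().T
    # Reduced linear operator: A_tilde = U^H X2 V S^-1
    A_tilde = (U.conj().T @ X2 @ V) / s
    eigvals, W = np.linalg.eig(A_tilde)
    # Exact DMD modes: X2 V S^-1 W
    modes = ((X2 @ V) / s) @ W
    return eigvals, modes


def mode_frequencies(eigvals, dt):
    """Oscillation frequency (Hz) of each mode for sampling interval dt."""
    return np.abs(np.angle(eigvals)) / (2 * np.pi * dt)


# Synthetic two-sensor signal oscillating at 2 Hz (assumed test data).
dt = 0.01
t = np.arange(0, 2, dt)
X = np.vstack([np.cos(2 * np.pi * 2.0 * t), np.sin(2 * np.pi * 2.0 * t)])
eigvals, modes = dmd(X, rank=2)
freqs = mode_frequencies(eigvals, dt)
```

Filtering modes by `freqs` is one simple way to isolate variation patterns at a chosen frequency band, which is the role the abstract assigns to mrDMD.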
-
Snapshot Metrics Are Not Enough: Analyzing Software Repositories with Longitudinal Metrics
Authors:
Nicholas Synovic,
Matt Hyatt,
Rohan Sethi,
Sohini Thota,
Shilpika,
Allan J. Miller,
Wenxin Jiang,
Emmanuel S. Amobi,
Austin Pinderski,
Konstantin Läufer,
Nicholas J. Hayward,
Neil Klingensmith,
James C. Davis,
George K. Thiruvathukal
Abstract:
Software metrics capture information about software development processes and products. These metrics support decision-making, e.g., in team management or dependency selection. However, existing metrics tools measure only a snapshot of a software project. Little attention has been given to enabling engineers to reason about metric trends over time -- longitudinal metrics that give insight about process, not just product. In this work, we present PRiME (PRocess MEtrics), a tool for computing and visualizing process metrics. The currently-supported metrics include productivity, issue density, issue spoilage, and bus factor. We illustrate the value of longitudinal data and conclude with a research agenda. The tool's demo video can be watched at https://youtu.be/YigEHy3_JCo. The source code can be found at https://github.com/SoftwareSystemsLaboratory/prime.
Submitted 24 July, 2022;
originally announced July 2022.
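As an illustration of the snapshot versus longitudinal distinction drawn in this abstract, here is a minimal sketch that computes a bus factor per calendar month instead of once for the whole history. It is not PRiME's implementation: the definition used (the smallest number of authors who together account for more than half of the commits) is one common convention assumed here, and the function names and sample commits are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of authors covering more than `threshold` of commits."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for n, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered / total > threshold:
            return n
    return len(counts)


def bus_factor_by_month(commits):
    """Longitudinal view: bus factor per (year, month) bucket.

    `commits` is an iterable of (date, author) pairs.
    """
    by_month = defaultdict(list)
    for day, author in commits:
        by_month[(day.year, day.month)].append(author)
    return {month: bus_factor(authors)
            for month, authors in sorted(by_month.items())}


# Hypothetical commit history.
commits = [
    (date(2022, 1, 3), "a"), (date(2022, 1, 9), "a"), (date(2022, 1, 20), "b"),
    (date(2022, 2, 1), "a"), (date(2022, 2, 2), "b"),
]
trend = bus_factor_by_month(commits)
```

A single snapshot over all five commits would report one number; the per-month series shows whether the project's reliance on a single contributor is changing.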
-
A Visual Analytics Approach for Hardware System Monitoring with Streaming Functional Data Analysis
Authors:
Fnu Shilpika,
Takanori Fujiwara,
Naohisa Sakamoto,
Jorji Nonaka,
Kwan-Liu Ma
Abstract:
Many real-world applications involve analyzing time-dependent phenomena, which are intrinsically functional, consisting of curves varying over a continuum (e.g., time). When analyzing continuous data, functional data analysis (FDA) provides substantial benefits, such as the ability to study the derivatives and to restrict the ordering of data. However, continuous data inherently has infinite dimensions, and for a long time series, FDA methods often suffer from high computational costs. The analysis problem becomes even more challenging when updating the FDA results for continuously arriving data. In this paper, we present a visual analytics approach for monitoring and reviewing time series data streamed from a hardware system with a focus on identifying outliers by using FDA. To perform FDA while addressing the computational problem, we introduce new incremental and progressive algorithms that promptly generate the magnitude-shape (MS) plot, which conveys both the functional magnitude and shape outlyingness of time series data. In addition, by using an MS plot in conjunction with an FDA version of principal component analysis, we enhance the analyst's ability to investigate the visually-identified outliers. We illustrate the effectiveness of our approach with two use scenarios using real-world datasets. The resulting tool is evaluated by industry experts using real-world streaming datasets.
Submitted 21 February, 2022; v1 submitted 25 November, 2020;
originally announced November 2020.
-
Staged Animation Strategies for Online Dynamic Networks
Authors:
Tarik Crnovrsanin,
Shilpika,
Senthil Chandrasegaran,
Kwan-Liu Ma
Abstract:
Dynamic networks -- networks that change over time -- can be categorized into two types: offline dynamic networks, where all states of the network are known, and online dynamic networks, where only the past states of the network are known. Research on staging animated transitions in dynamic networks has focused more on offline data, where rendering strategies can take into account past and future states of the network. Rendering online dynamic networks is a more challenging problem since it requires a balance between timeliness for monitoring tasks -- so that the animations do not lag too far behind the events -- and clarity for comprehension tasks -- to minimize simultaneous changes that may be difficult to follow. To illustrate the challenges posed by these requirements, we explore three strategies to stage animations for online dynamic networks: time-based, event-based, and a new hybrid approach that we introduce by combining the advantages of the first two. We illustrate the advantages and disadvantages of each strategy in representing low- and high-throughput data and conduct a user study involving monitoring and comprehension of dynamic networks. We also conduct a follow-up think-aloud study combining monitoring and comprehension with experts in dynamic network visualization. Our findings show that animation staging strategies that emphasize comprehension do better for participant response times and accuracy. However, the notion of "comprehension" is not always clear when it comes to complex changes in highly dynamic networks, requiring some iteration in staging that the hybrid approach affords. Based on our results, we make recommendations for balancing event-based and time-based parameters for our hybrid approach.
Submitted 4 September, 2020;
originally announced September 2020.
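The trade-off between time-based and event-based staging described in this abstract can be sketched as a simple batching rule. This is only one plausible reading of a hybrid strategy, not the authors' algorithm: a stage is closed either when it has been open for `max_wait` seconds (the time-based limit, for timeliness) or when it holds `max_batch` changes (the event-based limit, for clarity). The function name, both parameters, and the sample events are assumptions.

```python
def stage_hybrid(events, max_wait, max_batch):
    """Group timestamped network changes into animation stages.

    `events` is a list of (timestamp, change) pairs sorted by timestamp.
    Returns a list of stages, each a list of changes animated together.
    """
    stages, current, start = [], [], None
    for t, change in events:
        # Time-based limit: the open stage has waited long enough.
        if current and t - start >= max_wait:
            stages.append(current)
            current = []
        if not current:
            start = t                      # a new stage opens with this event
        current.append(change)
        # Event-based limit: avoid too many simultaneous changes.
        if len(current) >= max_batch:
            stages.append(current)
            current = []
    if current:
        stages.append(current)
    return stages


# Hypothetical stream: a burst of three changes, a pause, then two more.
events = [(0.0, "a"), (0.1, "b"), (0.2, "c"), (5.0, "d"), (5.1, "e")]
stages = stage_hybrid(events, max_wait=1.0, max_batch=2)
```

This sketch replays a recorded list for simplicity. A live system would also need a timer so that an open stage is flushed even when no further event arrives.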
-
A Visual Analytics Framework for Reviewing Multivariate Time-Series Data with Dimensionality Reduction
Authors:
Takanori Fujiwara,
Shilpika,
Naohisa Sakamoto,
Jorji Nonaka,
Keiji Yamamoto,
Kwan-Liu Ma
Abstract:
Data-driven problem solving in many real-world applications involves analysis of time-dependent multivariate data, for which dimensionality reduction (DR) methods are often used to uncover the intrinsic structure and features of the data. However, DR is usually applied to a subset of data that is either single-time-point multivariate or univariate time-series, resulting in the need to manually examine and correlate the DR results from different data subsets. When the number of dimensions is large, either in terms of the number of time points or attributes, this manual task becomes tedious to the point of infeasibility. In this paper, we present MulTiDR, a new DR framework that enables processing of time-dependent multivariate data as a whole to provide a comprehensive overview of the data. With the framework, we employ DR in two steps. When treating the instances, time points, and attributes of the data as a 3D array, the first DR step reduces the three axes of the array to two, and the second DR step visualizes the data in a lower-dimensional space. In addition, by coupling with a contrastive learning method and interactive visualizations, our framework enhances analysts' ability to interpret DR results. We demonstrate the effectiveness of our framework with four case studies using real-world datasets.
Submitted 27 October, 2021; v1 submitted 2 August, 2020;
originally announced August 2020.
-
An Incremental Dimensionality Reduction Method for Visualizing Streaming Multidimensional Data
Authors:
Takanori Fujiwara,
Jia-Kai Chou,
Shilpika,
Panpan Xu,
Liu Ren,
Kwan-Liu Ma
Abstract:
Dimensionality reduction (DR) methods are commonly used for analyzing and visualizing multidimensional data. However, when data is a live streaming feed, conventional DR methods cannot be directly used because of their computational complexity and inability to preserve the projected data positions at previous time points. In addition, the problem becomes even more challenging when the dynamic data records have a varying number of dimensions as often found in real-world applications. This paper presents an incremental DR solution. We enhance an existing incremental PCA method in several ways to ensure its usability for visualizing streaming multidimensional data. First, we use geometric transformation and animation methods to help preserve a viewer's mental map when visualizing the incremental results. Second, to handle data dimension variants, we use an optimization method to estimate the projected data positions, and also convey the resulting uncertainty in the visualization. We demonstrate the effectiveness of our design with two case studies using real-world datasets.
Submitted 15 October, 2019; v1 submitted 10 May, 2019;
originally announced May 2019.
-
Metrics Dashboard: A Hosted Platform for Software Quality Metrics
Authors:
George K. Thiruvathukal,
Shilpika,
Nicholas J. Hayward,
Konstantin Läufer
Abstract:
There is an emerging consensus in the scientific software community that progress in scientific research is dependent on the "quality and accessibility of software at all levels" (wssspe.researchcomputing.org.uk/). This progress depends on embracing the best traditional (and emergent) practices in software engineering, especially agile practices that intersect with the more formal tradition of software engineering. As a first step in our larger exploratory project to study in-process quality metrics for software development projects in Computational Science and Engineering (CSE), we have developed the Metrics Dashboard, a platform for producing and observing metrics by mining open-source software repositories on GitHub. Unlike GitHub and similar systems that provide individual performance metrics (e.g., commits), the Metrics Dashboard focuses on metrics indicative of team progress and project health. The Metrics Dashboard allows the user to submit the URL of a hosted repository for batch analysis, whose results are then cached. Upon completion, the user can interactively study various metrics over time (at varying granularity), numerically and visually. The initial version of the system is up and running as a public cloud service (SaaS) and supports project size (KLOC), defect density, defect spoilage, and productivity. While our system is by no means the first to support software metrics, we believe it may be one of the first community-focused extensible resources that can be used by any hosted project.
Submitted 8 April, 2018; v1 submitted 5 April, 2018;
originally announced April 2018.