Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time
Authors:
Deepthi Raghunandan,
Aayushi Roy,
Shenzhi Shi,
Niklas Elmqvist,
Leilani Battle
Abstract:
Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." Although recent work observes snapshots of the sensemaking loop within computational notebooks, none measure shifts in sensemaking behaviors over time -- between exploration and explanation. This gap limits our ability to understand the…
▽ More
Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." Although recent work observes snapshots of the sensemaking loop within computational notebooks, none measure shifts in sensemaking behaviors over time -- between exploration and explanation. This gap limits our ability to understand the full scope of the sensemaking process and thus our ability to design tools to fully support sensemaking. We contribute the first quantitative method to characterize how sensemaking evolves within data science computational notebooks. To this end, we conducted a quantitative study of 2,574 Jupyter notebooks mined from GitHub. First, we identify data science-focused notebooks that have undergone significant iterations. Second, we present regression models that automatically characterize sensemaking activity within individual notebooks by assigning them a score representing their position within the sensemaking spectrum. Finally, we use our regression models to calculate and analyze shifts in notebook scores across GitHub versions. Our results show that notebook authors participate in a diverse range of sensemaking tasks over time, such as annotation, branching analysis, and documentation. Finally, we propose design recommendations for extending notebook environments to support the sensemaking behaviors we observed.
△ Less
Submitted 8 September, 2022; v1 submitted 6 September, 2022;
originally announced September 2022.
Lodestar: Supporting Independent Learning and Rapid Experimentation Through Data-Driven Analysis Recommendations
Authors:
Deepthi Raghunandan,
Zhe Cui,
Kartik Krishnan,
Segen Tirfe,
Shenzhi Shi,
Tejaswi Darshan Shrestha,
Leilani Battle,
Niklas Elmqvist
Abstract:
Kee** abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose Lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We…
▽ More
Kee** abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose Lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We derive our recommendations from directed graphs of known analysis states, with two input sources: one manually curated from online data science tutorials, and another extracted through semi-automatic analysis of a corpus of over 6,000 Jupyter notebooks. We evaluate Lodestar in a formative study guiding our next set of improvements to the tool. Our results suggest that users find Lodestar useful for rapidly creating data science workflows.
△ Less
Submitted 16 April, 2022;
originally announced April 2022.