-
Sketch Then Generate: Providing Incremental User Feedback and Guiding LLM Code Generation through Language-Oriented Code Sketches
Authors:
Chen Zhu-Tian,
Zeyu Xiong,
Xiaoshuo Yao,
Elena Glassman
Abstract:
Crafting effective prompts for code generation or editing with Large Language Models (LLMs) is not an easy task. Particularly, the absence of immediate, stable feedback during prompt crafting hinders effective interaction, as users are left to mentally imagine possible outcomes until the code is generated. In response, we introduce Language-Oriented Code Sketching, an interactive approach that provides instant, incremental feedback in the form of code sketches (i.e., incomplete code outlines) during prompt crafting. This approach converts a prompt into a code sketch by leveraging the inherent linguistic structures within the prompt and applying classic natural language processing techniques. The sketch then serves as an intermediate placeholder that not only previews the intended code structure but also guides the LLM towards the desired code, thereby enhancing human-LLM interaction. We conclude by discussing the approach's applicability and future plans.
Submitted 10 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Unraveling the Temporal Dynamics of the Unet in Diffusion Models
Authors:
Vidya Prasad,
Chen Zhu-Tian,
Anna Vilanova,
Hanspeter Pfister,
Nicola Pezzotti,
Hendrik Strobelt
Abstract:
Diffusion models have garnered significant attention since they can effectively learn complex multivariate Gaussian distributions, resulting in diverse, high-quality outcomes. They introduce Gaussian noise into training data and reconstruct the original data iteratively. Central to this iterative process is a single Unet, adapting across time steps to facilitate generation. Recent work revealed the presence of composition and denoising phases in this generation process, raising questions about the Unets' varying roles. Our study dives into the dynamic behavior of Unets within denoising diffusion probabilistic models (DDPM), focusing on (de)convolutional blocks and skip connections across time steps. We propose an analytical method to systematically assess the impact of time steps and core Unet components on the final output. This method eliminates components to study causal relations and investigate their influence on output changes. The main purpose is to understand the temporal dynamics and identify potential shortcuts during inference. Our findings provide valuable insights into the various generation phases during inference and shed light on the Unets' usage patterns across these phases. Leveraging these insights, we identify redundancies in GLIDE (an improved DDPM) and improve inference time by ~27% with minimal degradation in output quality. Our ultimate goal is to guide more informed optimization strategies for inference and influence new model designs.
Submitted 16 December, 2023;
originally announced December 2023.
-
CrossData: Leveraging Text-Data Connections for Authoring Data Documents
Authors:
Chen Zhu-Tian,
Haijun Xia
Abstract:
Data documents play a central role in recording, presenting, and disseminating data. Despite the proliferation of applications and systems designed to support the analysis, visualization, and communication of data, writing data documents remains a laborious process, requiring a constant back-and-forth between data processing and writing tools. Interviews with eight professionals revealed that their workflows contained numerous tedious, repetitive, and error-prone operations. The key issue that we identified is the lack of persistent connection between text and data. Thus, we developed CrossData, a prototype that treats text-data connections as persistent, interactive, first-class objects. By automatically identifying, establishing, and leveraging text-data connections, CrossData enables rich interactions to assist in the authoring of data documents. An expert evaluation with eight users demonstrated the usefulness of CrossData, showing that it not only reduced the manual effort in writing data documents but also opened new possibilities to bridge the gap between data exploration and writing.
Submitted 10 May, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
MARVisT: Authoring Glyph-based Visualization in Mobile Augmented Reality
Authors:
Chen Zhu-Tian,
Yijia Su,
Yifang Wang,
Qianwen Wang,
Huamin Qu,
Yingcai Wu
Abstract:
Recent advances in mobile augmented reality (AR) techniques have shed new light on personal visualization for their advantages of fitting visualization within personal routines, situating visualization in a real-world context, and arousing users' interests. However, enabling non-experts to create data visualization in mobile AR environments is challenging given the lack of tools that allow in-situ design while supporting the binding of data to AR content. Most existing AR authoring tools require working on personal computers or manually creating each virtual object and modifying its visual attributes. We systematically study this issue by identifying the specificity of AR glyph-based visualization authoring tools and distilling four design considerations. Following these design considerations, we design and implement MARVisT, a mobile authoring tool that leverages information from reality to assist non-experts in addressing relationships between data and virtual glyphs, real objects and virtual glyphs, and real objects and data. With MARVisT, users without visualization expertise can bind data to real-world objects to create expressive AR glyph-based visualizations rapidly and effortlessly, reshaping the representation of the real world with data. We use several examples to demonstrate the expressiveness of MARVisT. A user study with non-experts is also conducted to evaluate the authoring experience of MARVisT.
Submitted 10 May, 2024; v1 submitted 7 October, 2023;
originally announced October 2023.
-
Augmenting Static Visualizations with PapARVis Designer
Authors:
Chen Zhu-Tian,
Wai Tong,
Qianwen Wang,
Benjamin Bach,
Huamin Qu
Abstract:
This paper presents an authoring environment for augmenting static visualizations with virtual content in augmented reality. Augmenting static visualizations can leverage the best of both physical and digital worlds, but its creation currently involves different tools and devices, without any means to explicitly design and debug both static and virtual content simultaneously. To address these issues, we design an environment that seamlessly integrates all steps of a design and deployment workflow through its main features: i) an extension to Vega, ii) a preview, and iii) debug hints that facilitate valid combinations of static and augmented content. We inform our design through a design space with four ways to augment static visualizations. We demonstrate the expressiveness of our tool through examples, including books, posters, projections, and wall-sized visualizations. A user study shows high user satisfaction with our environment and confirms that participants can create augmented visualizations in an average of 4.63 minutes.
Submitted 10 May, 2024; v1 submitted 7 October, 2023;
originally announced October 2023.
-
RL-LABEL: A Deep Reinforcement Learning Approach Intended for AR Label Placement in Dynamic Scenarios
Authors:
Chen Zhu-Tian,
Daniele Chiappalupi,
Tica Lin,
Yalong Yang,
Johanna Beyer,
Hanspeter Pfister
Abstract:
Labels are widely used in augmented reality (AR) to display digital information. Ensuring the readability of AR labels requires placing them occlusion-free while keeping visual linkings legible, especially when multiple labels exist in the scene. Although existing optimization-based methods, such as force-based methods, are effective in managing AR labels in static scenarios, they often struggle in dynamic scenarios with constantly moving objects. This is due to their focus on generating layouts optimal for the current moment, neglecting future moments and leading to sub-optimal or unstable layouts over time. In this work, we present RL-LABEL, a deep reinforcement learning-based method for managing the placement of AR labels in scenarios involving moving objects. RL-LABEL considers the current and predicted future states of objects and labels, such as positions and velocities, as well as the user's viewpoint, to make informed decisions about label placement. It balances the trade-offs between immediate and long-term objectives. Our experiments on two real-world datasets show that RL-LABEL effectively learns the decision-making process for long-term optimization, outperforming two baselines (i.e., no view management and a force-based method) by minimizing label occlusions, line intersections, and label movement distance. Additionally, a user study involving 18 participants indicates that RL-LABEL excels over the baselines in aiding users to identify, compare, and summarize data on AR labels within dynamic scenes.
Submitted 10 May, 2024; v1 submitted 20 August, 2023;
originally announced August 2023.
-
Augmenting Sports Videos with VisCommentator
Authors:
Chen Zhu-Tian,
Shuainan Ye,
Xiangtong Chu,
Haijun Xia,
Hui Zhang,
Huamin Qu,
Yingcai Wu
Abstract:
Visualizing data in sports videos is gaining traction in sports analytics, given its ability to communicate insights and explicate player strategies engagingly. However, augmenting sports videos with such data visualizations is challenging, especially for sports analysts, as it requires considerable expertise in video editing. To ease the creation process, we present a design space that characterizes augmented sports videos at an element-level (what the constituents are) and clip-level (how those constituents are organized). We do so by systematically reviewing 233 examples of augmented sports videos collected from TV channels, teams, and leagues. The design space guides selection of data insights and visualizations for various purposes. Informed by the design space and close collaboration with domain experts, we design VisCommentator, a fast prototyping tool, to ease the creation of augmented table tennis videos by leveraging machine learning-based data extractors and design space-based visualization recommendations. With VisCommentator, sports analysts can create an augmented video by selecting the data to visualize instead of manually drawing the graphical marks. Our system can be generalized to other racket sports (e.g., tennis, badminton) once the underlying datasets and models are available. A user study with seven domain experts shows high satisfaction with our system, confirms that the participants can reproduce augmented sports videos in a short period, and provides insightful implications for future improvements and opportunities.
Submitted 10 May, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
Beyond Generating Code: Evaluating GPT on a Data Visualization Course
Authors:
Chen Zhu-Tian,
Chenyang Zhang,
Qianwen Wang,
Jakob Troidl,
Simon Warchol,
Johanna Beyer,
Nils Gehlenborg,
Hanspeter Pfister
Abstract:
This paper presents an empirical evaluation of the performance of the Generative Pre-trained Transformer (GPT) model in Harvard's CS171 data visualization course. While previous studies have focused on GPT's ability to generate code for visualizations, this study goes beyond code generation to evaluate GPT's abilities in various visualization tasks, such as data interpretation, visualization design, visual data exploration, and insight communication. The evaluation utilized GPT-3.5 and GPT-4 to complete assignments of CS171, and included a quantitative assessment based on the established course rubrics, a qualitative analysis informed by the feedback of three experienced graders, and an exploratory study of GPT's capabilities in completing broader visualization tasks. Findings show that GPT-4 scored 80% on quizzes and homework, and TFs could distinguish between GPT- and human-generated homework with 70% accuracy. The study also demonstrates GPT's potential in completing various visualization tasks, such as data cleanup, interaction with visualizations, and insight communication. The paper concludes by discussing the strengths and limitations of GPT in data visualization, potential avenues for incorporating GPT in broader visualization tasks, and the need to redesign visualization education.
Submitted 11 May, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
iBall: Augmenting Basketball Videos with Gaze-moderated Embedded Visualizations
Authors:
Chen Zhu-Tian,
Qisen Yang,
Jiarui Shan,
Tica Lin,
Johanna Beyer,
Haijun Xia,
Hanspeter Pfister
Abstract:
We present iBall, a basketball video-watching system that leverages gaze-moderated embedded visualizations to facilitate game understanding and engagement of casual fans. Video broadcasting and online video platforms make watching basketball games increasingly accessible. Yet, for new or casual fans, watching basketball videos is often confusing due to their limited basketball knowledge and the lack of accessible, on-demand information to resolve their confusion. To assist casual fans in watching basketball videos, we compared the game-watching behaviors of casual and die-hard fans in a formative study and developed iBall based on the findings. iBall embeds visualizations into basketball videos using a computer vision pipeline, and automatically adapts the visualizations based on the game context and users' gaze, helping casual fans appreciate basketball games without being overwhelmed. We confirmed the usefulness, usability, and engagement of iBall in a study with 16 casual fans, and further collected feedback from 8 die-hard fans.
Submitted 10 May, 2024; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Sporthesia: Augmenting Sports Videos Using Natural Language
Authors:
Chen Zhu-Tian,
Qisen Yang,
Xiao Xie,
Johanna Beyer,
Haijun Xia,
Yingcai Wu,
Hanspeter Pfister
Abstract:
Augmented sports videos, which combine visualizations and video effects to present data in actual scenes, can communicate insights engagingly and thus have been increasingly popular for sports enthusiasts around the world. Yet, creating augmented sports videos remains a challenging task, requiring considerable time and video editing skills. On the other hand, sports insights are often communicated using natural language, such as in commentaries, oral presentations, and articles, but usually lack visual cues. Thus, this work aims to facilitate the creation of augmented sports videos by enabling analysts to directly create visualizations embedded in videos using insights expressed in natural language. To achieve this goal, we propose a three-step approach - 1) detecting visualizable entities in the text, 2) mapping these entities into visualizations, and 3) scheduling these visualizations to play with the video - and analyzed 155 sports video clips and the accompanying commentaries for accomplishing these steps. Informed by our analysis, we have designed and implemented Sporthesia, a proof-of-concept system that takes racket-based sports videos and textual commentaries as the input and outputs augmented videos. We demonstrate Sporthesia's applicability in two exemplar scenarios, i.e., authoring augmented sports videos using text and augmenting historical sports videos based on auditory comments. A technical evaluation shows that Sporthesia achieves high accuracy (F1-score of 0.9) in detecting visualizable entities in the text. An expert evaluation with eight sports analysts suggests high utility, effectiveness, and satisfaction with our language-driven authoring method and provides insights for future improvement and opportunities.
Submitted 10 May, 2024; v1 submitted 7 September, 2022;
originally announced September 2022.
-
Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline
Authors:
Chen Zhu-Tian,
Yun Wang,
Qianwen Wang,
Yong Wang,
Huamin Qu
Abstract:
Designers need to consider not only perceptual effectiveness but also visual styles when creating an infographic. This process can be difficult and time consuming for professional designers, not to mention non-expert users, leading to the demand for automated infographics design. As a first step, we focus on timeline infographics, which have been widely used for centuries. We contribute an end-to-end approach that automatically extracts an extensible timeline template from a bitmap image. Our approach adopts a deconstruction and reconstruction paradigm. At the deconstruction stage, we propose a multi-task deep neural network that simultaneously parses two kinds of information from a bitmap timeline: 1) the global information, i.e., the representation, scale, layout, and orientation of the timeline, and 2) the local information, i.e., the location, category, and pixels of each visual element on the timeline. At the reconstruction stage, we propose a pipeline with three techniques, i.e., Non-Maximum Merging, Redundancy Recover, and DL GrabCut, to extract an extensible template from the infographic, by utilizing the deconstruction results. To evaluate the effectiveness of our approach, we synthesize a timeline dataset (4296 images) and collect a real-world timeline dataset (393 images) from the Internet. We first report quantitative evaluation results of our approach over the two datasets. Then, we present examples of automatically extracted templates and timelines automatically generated based on these templates to qualitatively demonstrate the performance. The results confirm that our approach can effectively extract extensible templates from real-world timeline infographics.
Submitted 6 October, 2023; v1 submitted 31 July, 2019;
originally announced July 2019.
-
LassoNet: Deep Lasso-Selection of 3D Point Clouds
Authors:
Chen Zhu-Tian,
Wei Zeng,
Zhiguang Yang,
Lingyun Yu,
Chi-Wing Fu,
Huamin Qu
Abstract:
Selection is a fundamental task in exploratory analysis and visualization of 3D point clouds. Prior research on selection methods relied mainly on heuristics such as local point density, which limits the applicability of these methods to general data. Specific challenges stem from the great variability in point clouds (e.g., dense vs. sparse), viewpoints (e.g., occluded vs. non-occluded), and lassos (e.g., small vs. large). In this work, we introduce LassoNet, a new deep neural network for lasso selection of 3D point clouds, attempting to learn a latent mapping from viewpoint and lasso to point cloud regions. To achieve this, we couple user-target points with viewpoint and lasso information through 3D coordinate transform and naive selection, and improve the method's scalability via intention filtering and farthest point sampling. A hierarchical network is trained using a dataset with over 30K lasso-selection records on two different point cloud datasets. We conduct a formal user study to compare LassoNet with two state-of-the-art lasso-selection methods. The evaluations confirm that our approach improves the selection effectiveness and efficiency across different combinations of 3D point clouds, viewpoints, and lasso selections. Project Website: https://lassonet.github.io
Submitted 11 May, 2024; v1 submitted 31 July, 2019;
originally announced July 2019.
-
Exploring the Design Space of Immersive Urban Analytics
Authors:
Chen Zhu-Tian,
Yifang Wang,
Tianchen Sun,
Xiang Gao,
Wei Chen,
Zhigeng Pan,
Huamin Qu,
Yingcai Wu
Abstract:
Recent years have witnessed the rapid development and wide adoption of immersive head-mounted devices, such as HTC VIVE, Oculus Rift, and Microsoft HoloLens. These immersive devices have the potential to significantly extend the methodology of urban visual analytics by providing critical 3D context information and creating a sense of presence. In this paper, we propose a theoretical model to characterize the visualizations in immersive urban analytics. Furthermore, based on our comprehensive and concise model, we contribute a typology of combination methods of 2D and 3D visualizations that distinguishes between linked views, embedded views, and mixed views. We also propose a supporting guideline to assist users in selecting a proper view under certain circumstances by considering visual geometry and spatial distribution of the 2D and 3D visualizations. Finally, based on existing works, possible future research opportunities are explored and discussed.
Submitted 6 October, 2023; v1 submitted 25 September, 2017;
originally announced September 2017.