Empowering Interdisciplinary Insights with Dynamic Graph Embedding Trajectories

Yiqiao **1, Andrew Zhao1, Yeon-Chang Lee2,
Meng Ye3, Ajay Divakaran3, Srijan Kumar1
1Georgia Institute of Technology,
2Ulsan National Institute of Science and Technology (UNIST),
3SRI International
{y**328,srijan}@gatech.edu
Abstract

We developed DyGETViz, a novel framework for effectively visualizing dynamic graphs (DGs) that are ubiquitous across diverse real-world systems. This framework leverages recent advancements in discrete-time dynamic graph (DTDG) models to adeptly handle the temporal dynamics inherent in dynamic graphs. DyGETViz effectively captures both micro- and macro-level structural shifts within these graphs, offering a robust method for representing complex and massive dynamic graphs. The application of DyGETViz extends to a diverse array of domains, including ethology, epidemiology, finance, genetics, linguistics, communication studies, social studies, and international relations. Through its implementation, DyGETViz has revealed or confirmed various critical insights. These include the diversity of content sharing patterns and the degree of specialization within online communities, the chronological evolution of lexicons across decades, and the distinct trajectories exhibited by aging-related and non-related genes. Importantly, DyGETViz enhances the accessibility of scientific findings to non-domain experts by simplifying the complexities of dynamic graphs. Our framework is released as an open-source Python package for use across diverse disciplines. Our work not only addresses the ongoing challenges in visualizing and analyzing DTDG models but also establishes a foundational framework for future investigations into dynamic graph representation and analysis across various disciplines.

1 Introduction

Background

Dynamic graphs (DGs) are ubiquitous data structures present in various real-world evolving systems, such as social networks [1], linguistics [2], international relations [3], and computational finance [4]. Representing these dynamic graphs efficiently has become a crucial challenge due to their massive sizes and ever-changing nature. One compelling approach to tackle this challenge is discrete-time dynamic graph (DTDG) models [5, 6, 7], which represent a dynamic graph as a series of snapshots, each containing the nodes and edges that co-occur at particular timestamps. Despite the effectiveness of DTDG models in a wide range of graph-oriented tasks such as link prediction, node classification, and edge regression, these models usually remain opaque to researchers in terms of interpretability. The high-dimensional representations generated by these models make it difficult for users to extract and understand the intrinsic value from dynamic graphs. Currently, researchers often manually analyze the dynamic graph data, as there are no specialized tools to support this process [8, 9]. However, manual analysis of enormous dynamic graphs covering multiple timestamps can be overwhelming, and the continuously evolving nature of these graphs makes it challenging to intuitively capture both micro-level and macro-level structural shifts. For instance, in the study of international relations, aside from predicting graph attributes like future bilateral trade volumes, it is vital to understand micro-level changes such as a country’s alliance network, trade relations, and conflict dynamics, as well as macro-level trends such as the stability of the global economy amidst wars and financial crises, as inherently reflected by the high-dimensional node embeddings obtained from DTDG models.
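To make the snapshot representation concrete, the following minimal sketch groups timestamped edges into discrete snapshots. The function name `build_snapshots`, the fixed-width `window` parameter, and the toy edge list are illustrative assumptions, not part of DyGETViz.

```python
from collections import defaultdict


def build_snapshots(timed_edges, window):
    """Group (u, v, t) edges into discrete snapshots of width `window`.

    Hypothetical helper: each snapshot holds the nodes and edges whose
    timestamps fall into the same time window.
    """
    snapshots = defaultdict(lambda: {"nodes": set(), "edges": set()})
    for u, v, t in timed_edges:
        idx = int(t // window)  # index of the window that contains t
        snapshots[idx]["nodes"].update((u, v))
        snapshots[idx]["edges"].add((u, v))
    # Return the non-empty snapshots in chronological order.
    return [snapshots[i] for i in sorted(snapshots)]


# Toy data (assumed): four edges spread over two windows of width 10.
edges = [("a", "b", 0), ("b", "c", 3), ("a", "c", 12), ("c", "d", 15)]
snaps = build_snapshots(edges, window=10)
```

Note that windows without any edge are skipped in this sketch; an application that needs a regular time axis would insert empty snapshots instead.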

In this case, visualization becomes a powerful tool with an intuitive and user-friendly interface for analyzing the dynamic graph embeddings of DTDG models, as it enables researchers to gain both micro-level understandings, such as predicting node states and future trajectories, and macro-level analysis, such as forecasting emerging turning points in geopolitical events. With an effective visualization framework, researchers can gain insights, identify patterns, detect anomalies, and effectively communicate their findings to both domain experts and the general public, which would be challenging to achieve solely through manual analysis.

Challenges

Developing a visualization framework for dynamic graph embedding trajectories requires addressing the unique characteristics and challenges of DGs. The first challenge is the constant addition and deletion of nodes in DTDGs. As nodes are continuously added or removed, accurately inferring dynamic embedding trajectories for new nodes and effectively incorporating them into the visualization becomes crucial yet challenging. More specifically, the continuous addition and removal of nodes create a dynamic landscape in which the proximity between nodes is in constant flux, further complicating the visualization process. The second challenge arises from the persistent evolution of node embeddings over time. Conventional visualization techniques [10, 11, 12] often rely on non-parametric methods, which can present limitations when projecting new data points onto an existing visualization space [13]. When such visualization techniques are applied to each snapshot of the DTDG independently, the visualization layout undergoes a complete transformation, disrupting the continuity and hindering a coherent representation of embedding trajectories over time [13]. As a result, researchers may miss valuable patterns in the DTDG. Addressing this challenge is crucial for providing researchers with a clear and consistent understanding of the dynamic graph’s behavior and evolution over time.
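The continuity problem can be illustrated with a small sketch. A parametric map that is fixed once and reused for every snapshot keeps coordinates comparable over time and accepts newly added nodes without refitting. The random linear projection below is only a stand-in chosen for brevity; it is not the projection used by DyGETViz (see Supp.B for that), and all names and toy embeddings are assumptions.

```python
import random


def make_projection(dim, out_dim=2, seed=0):
    """Build one fixed random linear map (stand-in for a learned projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / dim ** 0.5 for _ in range(dim)]
            for _ in range(out_dim)]


def project(matrix, vec):
    """Apply the linear map to a single embedding vector."""
    return tuple(sum(w * x for w, x in zip(row, vec)) for row in matrix)


def project_snapshots(snapshots, matrix):
    """Map a list of {node: embedding} dicts to {node: [2D point per snapshot]}."""
    trajectories = {}
    for embeddings in snapshots:
        for node, vec in embeddings.items():
            trajectories.setdefault(node, []).append(project(matrix, vec))
    return trajectories


# Toy embeddings (assumed): "u" is unchanged, "v" moves, "w" appears later.
snapshots = [
    {"u": [1.0, 0.0, 0.0, 0.0], "v": [0.0, 1.0, 0.0, 0.0]},
    {"u": [1.0, 0.0, 0.0, 0.0], "v": [0.0, 0.0, 1.0, 0.0],
     "w": [0.0, 0.0, 0.0, 1.0]},
]
P = make_projection(dim=4)
traj = project_snapshots(snapshots, P)
```

Because the same map is applied at every timestamp, a node whose embedding does not change stays at the same 2D position, whereas refitting a non-parametric layout per snapshot gives no such guarantee.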

This Work

In this work, we formally define the novel problem of dynamic graph embedding trajectories visualization to enable the analysis of discrete-time dynamic graph models. We propose DyGETViz, a novel framework for Dynamic Graph Embedding Trajectory Visualization, to address the above challenges. DyGETViz leverages recent developments in dynamic graph neural networks (GNNs) [7, 14, 15] and offers two key functionalities: visualization and analytics. The visualization module employs principles from dynamic GNNs to map high-dimensional node embeddings into lower-dimensional representations, and uses a flexible and computationally efficient approach to project the node state at each timestamp onto the visualization, which is potentially scalable to datasets spanning multiple timestamps. The analytics module quantifies structural shifts in DTDGs at both the micro and macro levels. For micro-level analysis, it uses two similarity measures, namely the Jaccard index [16] and Rank-biased Overlap (RBO) [17, 18], to quantify the changes in the local topology of each node between adjacent timestamps. For macro-level analysis, it uses a novel metric, normalized average ranking change (NARC), as well as the absolute volumes of embedding movements, to assess the changes in global topology. These comprehensive analytics enable researchers to gain insights into both fine-grained and large-scale changes in dynamic graphs, empowering investigations across various domains. The versatility and applicability of DyGETViz are demonstrated by our analysis of nine datasets, introduced in Supp.A, spanning different graph sizes and domains, including ethology, epidemiology, finance, genetics, linguistics, communication studies, social studies, and international relations.
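The two micro-level measures named above can be sketched as follows for a single node, given its ranked nearest-neighbor lists at two adjacent timestamps. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the RBO shown is the plain truncated sum (no extrapolation), which may differ from the exact variant defined in Supplementary Sec. B.3.

```python
def jaccard_at_k(prev, curr, k):
    """Jaccard index between the top-k neighbor sets of two ranked lists."""
    a, b = set(prev[:k]), set(curr[:k])
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rbo_truncated(prev, curr, p=0.9, depth=None):
    """Truncated Rank-biased Overlap: (1 - p) * sum_d p^(d-1) * A_d.

    A_d is the fraction of shared items among the top-d prefixes.
    Truncation at `depth` makes this a lower bound on the full RBO.
    """
    limit = min(len(prev), len(curr))
    depth = limit if depth is None else min(depth, limit)
    seen_prev, seen_curr = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_prev.add(prev[d - 1])
        seen_curr.add(curr[d - 1])
        agreement = len(seen_prev & seen_curr) / d
        score += p ** (d - 1) * agreement
    return (1 - p) * score


# Toy ranked neighbor lists (assumed) for one node at timestamps t and t+1.
neighbors_t = ["b", "c", "d", "e"]
neighbors_t1 = ["b", "d", "c", "f"]
jac = jaccard_at_k(neighbors_t, neighbors_t1, k=4)
rbo = rbo_truncated(neighbors_t, neighbors_t1, p=0.5)
```

The Jaccard index ignores rank order within the top-k set, while RBO weights agreement near the top of the list more heavily through the persistence parameter `p`. Because of the truncation, two identical lists of length k score 1 - p^k rather than exactly 1 in this sketch.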

We provide complete technical details for DyGETViz in Supp.B. Our Python package is available on GitHub, and the visualizations for all datasets are available on our website. All code and datasets have been made publicly available.

2 Results

2.1 Reddit Community Graphs Reveal Content Specialization, Content Diversity, and Echo Chambers

Figure 1: Visualization of Reddit online communities. Each gray node in the background represents an online community (“subreddit”). The trajectories of five groups of subreddits are displayed, including a. gaming, b. sports, c. video-sharing, d. politics, and e. music. Text in the background indicates the topics that characterize each subreddit cluster. Different video-sharing communities (c.) manifest diverse levels of specialization, where communities with a narrow focus of video promotion demonstrate less mobility than general-purpose communities. DyGETViz captures a major event in r/The_Donald – its shutdown.

Online users often form communities around shared interests, beliefs, ethnicity, and geographical locations [1]. A deeper understanding of these community dynamics on platforms like Reddit, which is structured into thousands of interest-specific “subreddits”, is crucial for analyzing how user groups interact, share content, and influence one another over time. This study presents an analysis of subreddit trajectories across various topics, including gaming, sports, videos, politics, and music, with a focus on content specialization and the phenomenon of echo chambers, as shown in Fig. 1. Each subreddit’s trajectory is highlighted in a distinct color. To derive the graph embeddings, we train the model on the bipartite graph consisting of videos and subreddits, where an edge with timestamp $t$ exists between a video and a subreddit if the video is shared in the subreddit at $t$. In the resulting graph, two nodes are close in the embedding space if they share similar videos. Each subreddit’s trajectory within the visualized graph embeddings indicates the level of content homogeneity or diversity.

Specialization in Content Sharing Across Video-Related Subreddits

The trajectories of video-sharing subreddits (Fig. 1c) demonstrate diverse levels of specialization. Subreddits with a narrow focus on promoting YouTube videos and small channels, such as r/GetMoreViewsYT, r/YouTube_startups, r/AdvertiseYourVideos, r/SmallYoutubers, and r/YouTubeSubscribeBoost, move within a confined region, illustrating a high degree of content homogeneity within these subreddits as users simultaneously spread the same videos within multiple subreddits for better visibility. In contrast, general subreddits like r/videos display a greater diversity of content, as shown by their more expansive trajectories. These findings are supported by the numeric values of $\mathrm{Jaccard}_{100}$ (Fig. S3), where the overlap between the nearest neighbors of each video-related subreddit in the embedding space in adjacent timestamps is high for video-promotion subreddits. Details for the metrics are in Supplementary Sec. B.3.

Diversity and Overlap in Sports-Related Subreddits

On the other hand, for sports-related communities (Fig. 1b), subreddits with specialized topics, such as r/nba (subreddit for the National Basketball Association), r/nfl (subreddit for the National Football League), r/MMA (subreddit for mixed martial arts), and r/SquaredCircle (subreddit for professional wrestling), demonstrate similar levels of movement to more general subreddits like r/sports. Notably, r/nba has a large overlap with r/nfl, indicating that these two subreddits share similar audiences, posts, and content-sharing patterns. Both the NBA and the NFL feature team-based sports, high-profile athletes, and strategies, and have regular seasons followed by playoff rounds that culminate in a championship event. In terms of content sharing, many videos feature athletes or moments that have transcended their respective sports and gained widespread popularity, and these videos are appreciated by fans of both basketball (NBA) and American football (NFL). Compared to subreddits focused on videos, sports-related subreddits display more variability among their neighboring subreddits in the embedding space, as evidenced by the $\mathrm{Jaccard}_{100}$ index values averaged on an annual basis (Table S4).

Trajectories of Political Subreddits Reveal Echo Chamber and Major Events

The phenomenon of echo chambers within online social networks, wherein users experience reinforcement of their ideologies through repeated interactions with like-minded peers and a narrow spectrum of information, presents a significant challenge to discourse diversity [19, 20, 21]. This pattern is notably pervasive on platforms like Reddit, where close-knit communities form around specific ideologies or interests. A pertinent example is observed in the subreddit r/WayOfTheBern, an unofficial subreddit established by Bernie Sanders’ supporters following his loss in the 2016 primary election [22]. Initially intended as a space for political discourse divergent from the mainstream Democratic Party narrative, this community has been scrutinized for its alignment and user overlap with right-leaning communities, suggesting a complex web of ideological positioning that transcends conventional political boundaries [23, 22]. The embedding trajectories in Fig. 1d reveal substantial connections between r/WayOfTheBern and r/The_Donald, a banned subreddit known for sharing misinformation and controversial content [22, 24]. These communities demonstrate converging paths that deviate from more generalized political forums like r/politics. Significantly, the trajectory of r/The_Donald diverges notably around March 2020 and terminates at a position markedly distinct from its typical region. This coincides with the outbreak of the COVID-19 pandemic in the United States, the implementation of quarantine measures in major US cities, and Reddit’s decision to relegate r/The_Donald to “Restricted mode” and restrict most users from creating new posts [25]. Such a confluence of events indicates a notable divergence and deterioration characterized by the proliferation of toxic discourse within the community.

The existence and perpetuation of echo chambers underscore the complex challenges of online social networks in fostering balanced and open discourse. They not only facilitate the entrenchment of partisan beliefs by insulating users from contrary viewpoints but also serve as fertile grounds for the spread of misinformation. The observed patterns and trajectories within these communities highlight the urgent need for strategies aimed at early detection and mitigation of echo chambers, ensuring a more diverse and accurate exchange of information within these digital ecosystems.

2.2 Linguistic Reflections and Shifts in Societal Perceptions Through Lexicon Graphs

Semantic shifts in word meanings usually reveal socio-cultural changes over time, whereas the rates of semantic change vary significantly across words [2]. By leveraging word embeddings, DyGETViz effectively tracks the dynamics of lexical connotations over time. Our study uses the skip-gram with negative sampling (SGNS) embeddings [2] trained on the Google N-Gram [26] dataset. The conventional SGNS approach typically considers a fixed window of context words around the target word, and thus may not fully capture the contextual meaning of a word, especially in intricate linguistic contexts characterized by long-range dependencies or when dealing with semantically similar words with limited co-occurrence within the local context. To overcome this issue, we construct a new temporal graph. For each timestamp $t$, we compute the cosine similarity between each pair of word embeddings $\mathbf{v}_i^t$ and $\mathbf{v}_j^t$, and then connect each word to its $k$ nearest neighbors with the highest cosine similarity. A new set of temporal embeddings is then trained on this graph. This method facilitates the extraction of high-order semantic associations between words that may not typically co-occur within the same local context, thus overcoming the limitation of the original SGNS embeddings. Empirically, we experimented with $k \in \{5, 10, 20, 50, 100\}$, and found that $k = 20$ yields the most meaningful semantic associations between words.
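The graph construction described above can be sketched in a few lines. This is a minimal illustration, assuming the per-timestamp embeddings are given as plain dictionaries; the function names and the toy two-dimensional vectors are hypothetical and chosen only to keep the example readable.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(embeddings, k):
    """Connect each word to its k most cosine-similar words.

    Returns directed edges as (word, neighbor, similarity) triples.
    """
    edges = []
    for word, vec in embeddings.items():
        sims = [(cosine(vec, other_vec), other)
                for other, other_vec in embeddings.items() if other != word]
        # Highest similarity first; break ties alphabetically for determinism.
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        edges.extend((word, other, sim) for sim, other in sims[:k])
    return edges


def temporal_knn_graph(embeddings_by_time, k=20):
    """Build one k-nearest-neighbor graph per timestamp."""
    return {t: knn_graph(emb, k) for t, emb in embeddings_by_time.items()}


# Toy embeddings (assumed values, not taken from the datasets).
toy = {
    1950: {"environment": [1.0, 0.1], "team": [0.9, 0.2], "forest": [0.1, 1.0]},
    1990: {"environment": [0.2, 1.0], "team": [0.9, 0.2], "forest": [0.1, 1.0]},
}
graph = temporal_knn_graph(toy, k=1)
```

The pairwise comparison costs quadratic time in the vocabulary size per timestamp, so a large vocabulary would likely call for vectorized or approximate nearest-neighbor search instead of this loop.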

Tracing the Evolution of Socio-Economic Language in Environmental Discourse from 1950s to 1990s

To gain insights into the evolving socio-economic discourse concerning environmental concerns, we delve into the semantic trajectories of words associated with environmental protection using the HistWords-CN dataset (Fig. 2a). Starting from the 1950s, we traced diverse interpretations of words such as “environment,” which initially carried connotations related to the working environment, as indicated by their proximity to words like “team,” “mobilization,” and “state-of-the-art.” However, as we move into the 1980s and 1990s, we observe a convergence of these terms toward the region occupied by ecological-environment-related words, such as “forest,” “grassland,” “carbon dioxide,” and “nature.” This reflects the ever-growing discourse on ecology and the escalating importance attached to environmental protection. Notably, despite this convergence, the term “save” deviated from this trajectory due to its diverse meanings related to cost-saving, rent, thrift, and value. Our model thus provides an intricate understanding of the evolution of environmental discourse over time.

Figure 2: a. Chronological evolution of Chinese lexicon pertaining to environmental protection. These words exhibit diverse meanings from the 1950s onward, culminating in a cohesive cluster by the 1990s. This trend underscores the growing prominence and consolidation of environmental protection concepts within the analyzed corpus. English translations are provided for reference. b. Semantic shift in LGBTQ+ terminology. The word “gay” was initially synonymous with “joy” and “happiness,” but its usage progressively aligns with homosexuality. This shift underscores the changing societal discourse and recognition of LGBTQ+ identities. c. Comparative analysis of semantic stability using RBO and $\mathrm{Jaccard}_{100}$ reveals that words related to homosexuality exhibit substantial shifts in meaning since their inception, reflecting societal changes in perception and language. In contrast, terms solely associated with happiness show remarkable semantic stability, highlighting the enduring nature of certain lexicons despite evolving societal contexts.

2.3 Evolving Language and LGBTQ+ Acceptance: A Lexical Analysis of Societal Shifts

The power of language lies in its ability to both reflect and shape societal attitudes. In this context, we explore the linguistic landscape surrounding homosexuality, recognizing its historical significance as a mirror for societal changes.

The term “Gay” Experienced Significant Lexical Evolution in the 1970s.

During the 1970s, the LGBTQ+ rights movement in the United States experienced a transformative period characterized by increased visibility and activism [27]. However, prevailing societal attitudes during this era remained heavily influenced by traditional values and social norms, often stigmatizing homosexuality [28]. As depicted in Fig. 2, we observe a remarkable lexical shift associated with the term “gay” from the 1970s to the 1990s. The word gradually transitions away from its original connotations of happiness and fortune towards homosexuality, aligning with its etymological evolution [29].

Additionally, Table S3 provides a comprehensive view of the top five words associated with each term in the embedding space over time. The words “happy” and “delighted” retain their consistent meanings across the years, serving as constants in the lexical landscape. However, the term “gay,” once widely employed to convey happiness before the 1960s, underwent a profound transformation when it came to refer to homosexuality in the 1970s and acquired proximity to negative words such as “forlorn” and “ugly.” This lexical shift reflects the societal struggle to grapple with evolving perceptions of homosexuality. Furthermore, LGBT-related words, including “gay,” “homosexual,” and “lesbian,” exhibit strong associations with “clubs” and “dance” during the 1970s and 1980s. This phenomenon corresponds to the development of a distinctive LGBTQ+ culture and language during this era. Bars and dance clubs emerged as vital meeting places for the LGBTQ+ community, providing safe spaces for socialization, self-expression, and the formation of supportive networks [30]. It is crucial to acknowledge that the portrayal of LGBTQ+ characters and issues in popular culture largely perpetuated negative stereotypes and discriminatory portrayals during the examined period. This further entrenched negative attitudes within the general public, making societal acceptance and understanding a complex and arduous journey [31, 32].

By meticulously tracing these linguistic transformations and contextualizing them within historical and societal frameworks, our study contributes to a deeper understanding of the intricate relationship between language, societal attitudes, and the ongoing struggle for LGBTQ+ acceptance.

Figure 3: a. Overview of the embedding trajectories on UN Comtrade [33]. Each country is labeled with its nominal GDP ranking in 2017 [34] (e.g., “USA (1)”). Large- and middle-scale economies (e.g., USA, UK, Russia, Netherlands) with higher GDP rankings and intensive trade relations form a distinct cluster, while lower-ranked economies (e.g., Tajikistan, Uzbekistan, Jordan) exhibit individual clusters; b. Detailed view of a. The three country groups according to the IMF [35], Major Advanced Economies (MAE), Other Advanced Economies (OAE), and Emerging and Developing Economies (EDE), form distinct visual partitions. The trajectories of advanced economies with economic stability, such as the USA, UK, and Germany, remain in a constrained region, while countries that have experienced rapid growth or drastic economic instability, such as Japan, China, and Russia, manifest more diverse trajectories; c. Fluctuations in $\mathrm{Jaccard}_n$ and RBO align with major economic events in history. d. Average RBO of each country over the period 1988-2022. The x-axis describes the total GDP on a logarithmic scale. Node colors indicate country types, and node sizes represent the population.

2.4 Unveiling Global Trade Dynamics: Insights from UN Comtrade Export Data

In the field of economics, understanding, modeling, and predicting international trade plays a crucial role in helping economists and policymakers navigate the challenges and opportunities arising from globalization, such as financial crises. In this study, we analyze international trade dynamics using export data from the United Nations Commodity Trade Statistics Database [33]. To capture the economic status and trading partnerships of countries, we perform linear regression on the logarithmic values of a country’s gross exports and the bilateral trade volumes, and employ the joint training objective with $\lambda_1 = \lambda_2 = 0.1$ in Equation 1.

The resulting visualization in Fig. 3a offers a comprehensive representation of the international trade landscape. Advanced Economies, as classified by the International Monetary Fund (IMF; see https://www.imf.org/en/Publications/WEO/weo-database/2023/April/groups-and-aggregates), form distinct clusters located primarily in the upper right region, while countries with lower trade volumes and those positioned on the periphery of international trade form separate clusters in the left and lower regions. In addition, Fig. 3b provides a clearer illustration of the distinct visual partitions among the three country groups defined by the IMF [35]: Major Advanced Economies (MAE), which the IMF defines as the G7 countries (Canada, France, Germany, Italy, Japan, the UK, and the USA); Other Advanced Economies (OAE); and Emerging and Developing Economies (EDE). This spatial arrangement reflects the different degrees of trade engagement of each country within the global trade network.

Dynamic Graph Embedding Trajectories of Individual Countries Reveal Development and Stability Patterns of Key Economies

In Fig. 3b, the trajectories of individual countries reveal distinct patterns of economic development and stability. The United States, the United Kingdom, and Germany (listed in UN Comtrade as a single sovereign state since 1991, following German reunification in October 1990) demonstrate relatively stable and consistent trading status throughout the examined period (1988-2022). On the other hand, China’s trajectory moves between MAE and OAE, reflecting its prolonged period of economic development characterized by comprehensive domestic reforms, the lifting of price controls, and the liberalization of trade policies [36, 37]. Russia exhibits significant movements between OAE and EDE. Its trajectory predominantly shifted towards the EDE region during the period 1992-1998, coinciding with a substantial 40% contraction in GDP [3]. Starting from the early 2000s, Russia moves towards the region occupied by OAEs, including the four middle-sized developed countries Switzerland, Belgium, Sweden, and the Netherlands, which indicates a period of economic recovery characterized by greater trade volumes. Despite its status as an MAE, Japan has experienced economic development with significant fluctuations. The country encountered unique obstacles such as the Japanese asset price bubble (1990-1992), whose impact has lasted for more than a decade [38, 39]. We further use the Jaccard index [16] and Rank-biased Overlap (RBO) [17, 18] to measure the macro-level changes over time. Detailed calculations of these metrics are in Supplementary Sec. B.3. As reflected in Fig. 3c, the RBO and $\mathrm{Jaccard}_5$ for Japan plummeted during this period compared to other countries, indicating a period of instability in its economic status.

Trade Resilience and Volatility during Global Economic Crises

From Fig. 3c, we observe three periods of significant fluctuations in RBO and $\mathrm{Jaccard}_5$ for most countries, indicating significant changes in their trading status. The first period is 1997-2003, which corresponds to the 1997 Asian Financial Crisis and the dot-com bubble, when investor confidence declined worldwide. For most countries, the recovery from the financial crisis in 1998–1999 was rapid [40]. For example, China demonstrates quick movements towards and away from the EDE region (Fig. 3b) around 1998. These two events had global ripple effects. The dot-com bubble, during which many large-scale Internet and communication companies failed and shut down, had a more far-reaching effect. As the epicenter of the bubble, the US experienced the most drastic fluctuation in its trading status, as shown by its decline in RBO and $\mathrm{Jaccard}_5$ [16] in Fig. 3c (Refer to appendix for ). Similarly, the Great Recession of 2008 and the COVID-19 pandemic also caused fluctuations in RBO and $\mathrm{Jaccard}_5$.

2.5 Dynamic Graph Analysis of Gene Expression Trajectories Reveals Key Patterns in Aging

Dynamic graphs are vital for identifying anomalous genes and genetic variations that significantly impact disease development [41] and the human aging process [4]. DyGETViz enables researchers to effectively pinpoint genes with unusual patterns or interactions, facilitating a deeper understanding of aging-related diseases, the genetic mechanisms underlying the aging process, and potential treatments.

Figure 4: a/d. t-SNE visualization of the Aging dataset [42] and the DGraph dataset [4]. Red dots represent aging-related genes in the genetic network (a) and fraudsters in the financial network (d). Gray dots represent normal nodes (non-aging-related genes and normal users, respectively). b/e. Embedding trajectories of 10 anomalous nodes (in warm colors) and 10 normal nodes (in cold colors). c/f. The kernel density estimate (KDE) plot for the trajectories. Darker colors indicate higher node densities.

Characterizing Structural and Temporal Differences in Gene Expression During Aging

We examine structural differences, neighbor distributions, and temporal dynamics between aging-related and non-aging-related genes using human gene expression data at 37 different ages, ranging from 20 to 99. The t-SNE projection in Fig. 4a shows that genes directly related to aging (red dots) have distinct distributions from normal genes (gray dots). We further analyze the trajectories of 10 aging-related and 10 non-aging-related genes, and plot their embedding trajectories in Fig. 4b. From a dynamic graph perspective, the aging-related genes are characterized by distinct embedding trajectories, which are mainly located on the left side of the plot. Such distinctions are reinforced by the kernel density estimation (KDE) plot in Fig. 4c.
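A kernel density estimate over projected trajectory points can be sketched as follows. This is a minimal illustration assuming an isotropic Gaussian kernel with a fixed bandwidth; the paper does not state which KDE implementation or bandwidth produced Fig. 4c, and the function name and toy coordinates below are hypothetical.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function that estimates the density at (x, y).

    Uses an isotropic 2D Gaussian kernel of width `bandwidth` centered on
    every point; the estimate integrates to 1 over the plane.
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Toy 2D trajectory points (assumed): one group concentrated on the left,
# the other spread over the right side of the plot.
aging_points = [(-2.0, 0.1), (-1.8, -0.2), (-2.2, 0.3), (-1.9, 0.0)]
other_points = [(1.5, 0.2), (2.1, -0.4), (0.4, 1.0), (1.0, -1.2)]
kde_aging = gaussian_kde_2d(aging_points, bandwidth=0.5)
kde_other = gaussian_kde_2d(other_points, bandwidth=0.5)
```

Evaluating both estimates on a shared grid and comparing them cell by cell gives a simple numeric counterpart to the visual separation seen in a KDE plot. The bandwidth controls how strongly individual trajectory points are smoothed and would need tuning for real data.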

Application of DyGETViz for Predicting Aging-Related Gene Behavior

We randomly select 6 genes commonly altered during the human aging process as identified in previous research [43]. These genes experience frequent changes due to their roles in cellular processes, although there is insufficient evidence linking them directly to aging. These genes are categorized as overexpressed (Gene 306, 1520, and 2212) and underexpressed (Gene 1281, 1277, and 1287); see https://genomics.senescence.info/genes/microarray.php. As shown in Fig. 4c, the orange trajectories representing overexpressed genes typically transition between regions associated with aging and non-aging, suggesting that these genes can potentially induce or accelerate the aging process, despite the absence of concrete evidence. Meanwhile, the purple trajectories representing underexpressed genes mostly remain within non-aging regions, suggesting that these genes are less likely to be involved in the aging process.

2.6 Challenges in Distinguishing Fraudulent and Legitimate Behaviors in Financial Networks

Dynamic graphs can be used in financial networks to detect and flag users engaged in fraudulent behaviors [4]. Accurate identification of fraudulent users can facilitate timely intervention and prevent financial loss. As shown in Fig. 4e, the distinction between fraudsters and normal users appears less pronounced, as both groups exhibit trajectories widely dispersed across the plot. These observations highlight the challenge of distinguishing between fraudulent and normal users. In real-world scenarios, fraudulent users possess a remarkable ability to camouflage their activities, often mirroring the behaviors of genuine users. This challenge is further exemplified in Fig. 4f, where the KDE plot depicts the convergence of their trajectories, underscoring the complexity of accurately identifying and differentiating fraudulent activities from legitimate ones.

2.7 Modeling Social Dynamics in Ant Colonies on Animal Activity Graphs

Animals exhibit intricate and efficient social organizations. For example, ant colonies demonstrate a well-defined organizational hierarchy and role differentiation among worker ants [9]. Roles within these societies include nurses, responsible for the care of the brood and the queen; cleaners, who ensure colony cleanliness and waste disposal; and foragers, tasked with acquiring food resources from outside the colony. Dynamic graph modeling is utilized to describe these behaviors and the evolution of social roles within animal groups. Our model provides a clear interpretation of the trajectories of role-based behaviors, as inferred from the embedding model.

Trajectories of Different Ant Roles Reveal Distinct Spatial Organizations

Fig. S5 illustrates these findings, showing that the movement patterns of nurses are generally restricted to areas near the queen, reflecting their frequent interactions. Conversely, foragers are typically found in remote areas, aligning with their external foraging activities and minimal contact with the queen. The spatial distribution and movements of these roles over time reveal distinct patterns: nurses and foragers maintain localized activity areas, whereas cleaners exhibit movement patterns intersecting with those of nurses due to their intermediary tasks.

Capturing Role Transitions in Ant Behaviors

DyGETViz captures the transition of individuals between roles, a phenomenon supported by existing literature [9]. For example, the trajectories of certain ants (e.g., Ant29 and Ant242) shift from nursing towards cleaning roles over time, indicating a natural progression as they age. This dynamic is effectively represented in our models, providing insight into the adaptive behaviors within ant colonies.

3 Conclusion and Future Works

In this work, we formally define the problem of dynamic graph embedding trajectories visualization, and introduce DyGETViz, a novel framework to effectively address the problem. Empirical evaluation on 9 real-world datasets demonstrates the broad application of DyGETViz and provides significant insights.

Looking forward, there are multiple promising directions for further research. An immediate area of interest is the development of more refined methodologies for assessing the quality and efficacy of the visualizations generated. This could involve the creation of metrics and evaluation protocols that better capture the utility and interpretability of visual outputs in practical scenarios. Additionally, it is imperative to investigate the potential of DyGETViz to be adapted or enhanced to support a wider array of visualization paradigms and representations. Such explorations could extend its relevance to other data types and structures beyond graphs, thereby accommodating the dynamic and diverse needs of modern data visualization.

Acknowledgment

This research/material is based upon work supported in part by NSF grants CNS-2154118, IIS-2027689, ITE-2137724, ITE-2230692, and CNS-2239879, the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00112290102 (subcontract No. PO70745), and funding from Microsoft, Google, and Adobe Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the position or policy of DARPA, DoD, SRI International, or NSF, and no official endorsement should be inferred. We thank the reviewers for their comments.

References

  • [1] Yiqiao **, Yeon-Chang Lee, Kartik Sharma, Meng Ye, Karan Sikka, Ajay Divakaran, and Srijan Kumar. Predicting information pathways across online communities. In KDD, 2023.
  • [2] William L Hamilton, Jure Leskovec, and Dan Jurafsky. Diachronic word embeddings reveal statistical laws of semantic change. In ACL, pages 1489–1501, 2016.
  • [3] Anderson Monken, Flora Haberkorn, Munisamy Gopinath, Laura Freeman, and Feras A Batarseh. Graph neural networks for modeling causality in international trade. In FLAIRS, volume 34, 2021.
  • [4] Xuanwen Huang, Yang Yang, Yang Wang, Chun** Wang, Zhisheng Zhang, Jiarong Xu, Lei Chen, and Michalis Vazirgiannis. Dgraph: A large-scale financial dataset for graph anomaly detection. NIPS, 35:22765–22777, 2022.
  • [5] Jiaxuan You, Tianyu Du, and Jure Leskovec. Roland: graph learning framework for dynamic graphs. In KDD, pages 2358–2366, 2022.
  • [6] Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, Tao Schardl, and Charles Leiserson. Evolvegcn: Evolving graph convolutional networks for dynamic graphs. In AAAI, volume 34, pages 5363–5370, 2020.
  • [7] Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, and Xavier Bresson. Structured sequence modeling with graph convolutional recurrent networks. In ICONIP, pages 362–373. Springer, 2018.
  • [8] Siwei Li, Zhiyan Zhou, Anish Upadhayay, Omar Shaikh, Scott Freitas, Haekyu Park, Zijie J Wang, Susanta Routray, Matthew Hull, and Duen Horng Chau. Argo lite: Open-source interactive graph exploration and visualization in browsers. In CIKM, pages 3071–3076, 2020.
  • [9] Danielle P Mersch, Alessandro Crespi, and Laurent Keller. Tracking individuals shows spatial fidelity is a key regulator of ant social organization. Science, 340(6136):1090–1093, 2013.
  • [10] Harold Hotelling. Analysis of a complex of statistical variables into principal components. Journal of educational psychology, 24(6):417, 1933.
  • [11] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. JMLR, 9(11), 2008.
  • [12] Nicola Pezzotti, Thomas Höllt, B Lelieveldt, Elmar Eisemann, and Anna Vilanova. Hierarchical stochastic neighbor embedding. In Computer Graphics Forum, volume 35, pages 21–30. Wiley Online Library, 2016.
  • [13] Sungtae An, Shenda Hong, and Jimeng Sun. Viva: semi-supervised visualization via variational autoencoders. In ICDM, pages 22–31. IEEE, 2020.
  • [14] **yin Chen, Xueke Wang, and Xuanheng Xu. Gc-lstm: Graph convolution embedded lstm for dynamic network link prediction. Applied Intelligence, pages 1–16, 2022.
  • [15] Aravind Sankar, Yanhong Wu, Liang Gou, Wei Zhang, and Hao Yang. Dysat: Deep neural representation learning on dynamic graphs via self-attention networks. In WSDM, pages 519–527, 2020.
  • [16] Paul Jaccard. The distribution of the flora in the alpine zone. 1. New phytologist, 11(2):37–50, 1912.
  • [17] William Webber, Alistair Moffat, and Justin Zobel. A similarity measure for indefinite rankings. TOIS, 28(4):1–38, 2010.
  • [18] Sejoon Oh, Berk Ustun, Julian McAuley, and Srijan Kumar. Rank list sensitivity of recommender systems to interaction perturbations. In CIKM, pages 1584–1594, 2022.
  • [19] Corrado Monti, Giuseppe Manco, Cigdem Aslay, and Francesco Bonchi. Learning ideological embeddings from information cascades. In CIKM, pages 1325–1334, 2021.
  • [20] Michela Del Vicario, Gianna Vivaldo, Alessandro Bessi, Fabiana Zollo, Antonio Scala, Guido Caldarelli, and Walter Quattrociocchi. Echo chambers: Emotional contagion and group polarization on facebook. Scientific reports, 6(1):37825, 2016.
  • [21] Matteo Cinelli, Gianmarco De Francisci Morales, Alessandro Galeazzi, Walter Quattrociocchi, and Michele Starnini. The echo chamber effect on social media. PNAS, 118(9):e2023301118, 2021.
  • [22] James Varney. Prominent pro-sanders subreddit wayofthebern aims to divide democrats, says social media analyst. The Washington Times, 2 2019.
  • [23] Redditpedia Wiki. Subreddit statistics of user overlap, 2023.
  • [24] Marcus Mann, Diana Zulli, Jeremy Foote, Emily Ku, and Emily Primm. Unsorted significance: Examining potential pathways to extreme political beliefs and communities on reddit. Socius, 9:23780231231174823, 2023.
  • [25] Craig Timberg and Elizabeth Dwoskin. Reddit closes long-running forum supporting president trump after years of policy violations. The Washington Post, 2020.
  • [26] Yuri Lin, Jean-Baptiste Michel, Erez Aiden Lieberman, Jon Orwant, Will Brockman, and Slav Petrov. Syntactic annotations for the google books ngram corpus. In ACL, pages 169–174, 2012.
  • [27] Patrick Moore. Beyond shame: Reclaiming the abandoned history of radical gay sexuality. Beacon Press, 2004.
  • [28] Stephan Cohen. The Gay Liberation Youth Movement in New York: ‘an army of lovers cannot fail’. Routledge, 2007.
  • [29] Adam Jatowt and Kevin Duh. A framework for analyzing semantic change of words across time. In JCDL, pages 229–238. IEEE, 2014.
  • [30] Michael Anthony Lusby. Ghent gayland: A case study of the gay and lesbian community and media of norfolk, virginia. Master’s thesis, College of William & Mary, 2011.
  • [31] Lauren B McInroy and Shelley L Craig. Perspectives of lgbtq emerging adults on the depiction and impact of lgbtq media representation. Journal of youth studies, 20(1):32–46, 2017.
  • [32] Kevin L Nadal, Chassitty N Whitman, Lindsey S Davis, Tanya Erazo, and Kristin C Davidoff. Microaggressions toward lesbian, gay, bisexual, transgender, queer, and genderqueer people: A review of the literature. The journal of sex research, 53(4-5):488–508, 2016.
  • [33] UN Comtrade. The united nations commodity trade statistics database. https://comtrade.un.org/, 2010.
  • [34] Worldometer. Gdp by country (2017), 2023.
  • [35] IMF. Country composition of weo groups, 2023.
  • [36] John William Longworth, Colin G Brown, and Scott A Waldron. Beef in china: agribusiness opportunities and challenges. The China Journal, 2001.
  • [37] Justin Yifu Lin, Fang Cai, and Zhou Li. The China miracle: Development strategy and economic reform (Revised Edition). The Chinese University of Hong Kong Press, 2004.
  • [38] Thayer Watkins. Japan’s bubble economy, 1999.
  • [39] Kunio Okina, Masaaki Shirakawa, and Shigenori Shiratsuka. The asset price bubble and monetary policy: Japan’s experience in the late 1980s and the lessons. Monetary and Economic Studies (special edition), 19(2):395–450, 2001.
  • [40] Steven Radelet, Jeffrey D Sachs, Richard N Cooper, and Barry P Bosworth. The east asian financial crisis: diagnosis, remedies, prospects. Brookings papers on Economic activity, 1998(1):1–90, 1998.
  • [41] Leman Akoglu, Hanghang Tong, and Danai Koutra. Graph based anomaly detection and description: a survey. TKDE, 29:626–688, 2015.
  • [42] Qi Li, Khalique Newaz, and Tijana Milenković. Improved supervised prediction of aging-related genes via weighted dynamic network analysis. BMC bioinformatics, 22(1):1–26, 2021.
  • [43] Robi Tacutu, Thomas Craig, Arie Budovsky, Daniel Wuttke, Gilad Lehmann, Dmitri Taranukha, Joana Costa, Vadim E Fraifeld, and Joao Pedro De Magalhaes. Human ageing genomic resources: integrated databases and tools for the biology and genetics of ageing. Nucleic acids research, 41(D1):D1027–D1033, 2012.
  • [44] B Klimt. Introducing the enron corpus. In CEAS, 2004.
  • [45] Benedek Rozemberczki, Paul Scherer, Oliver Kiss, Rik Sarkar, and Tamas Ferenci. Chickenpox cases in hungary: a benchmark dataset for spatiotemporal signal processing with graph neural networks. arXiv preprint arXiv:2102.08100, 2021.
  • [46] Khalique Newaz and Tijana Milenković. Inference of a dynamic aging-related biological subnetwork via network propagation. TCBB, 19(2):974–988, 2020.
  • [47] Terrence Szymanski. Temporal word analogies: Identifying lexical replacement with diachronic word embeddings. In ACL, pages 448–453, 2017.
  • [48] Anant Dadu, Vipul K Satone, Rachneet Kaur, Mathew J Koretsky, Hirotaka Iwaki, Yue A Qi, Daniel M Ramos, Brian Avants, Jacob Hesterman, Roger Gunn, et al. Application of aligned-umap to longitudinal biomedical studies. Patterns, 4(6), 2023.
  • [49] Leland McInnes, John Healy, Nathaniel Saul, and Lukas Großberger. Umap: Uniform manifold approximation and projection. JOSS, 3(29), 2018.
  • [50] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. science, 290(5500):2323–2326, 2000.
  • [51] Joshua B Tenenbaum, Vin de Silva, and John C Langford. A global geometric framework for nonlinear dimensionality reduction. science, 290(5500):2319–2323, 2000.
  • [52] C Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101, 1904.
  • [53] Maurice G Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
  • [54] Yiqiao **, Qinlin Zhao, Yiyang Wang, Hao Chen, Kaijie Zhu, Yijia Xiao, and **dong Wang. Agentreview: Exploring peer review dynamics with llm agents. arXiv:2406.12708, 2024.
  • [55] Utkarsh Mahadeo Khaire and R. Dhanalakshmi. Stability of feature selection algorithm: A review. Journal of King Saud University - Computer and Information Sciences, 34(4):1060–1073, 2022.
  • [56] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.
  • [57] Yiqiao **, Yunsheng Bai, Yanqiao Zhu, Yizhou Sun, and Wei Wang. Code recommendation for open source software developers. In Web Conference, 2023.
  • [58] Srijan Kumar, Xikun Zhang, and Jure Leskovec. Predicting dynamic embedding trajectory in temporal interaction networks. In KDD, pages 1269–1278, 2019.
  • [59] Yiqiao **, Xiting Wang, Ruichao Yang, Yizhou Sun, Wei Wang, Hao Liao, and Xing Xie. Towards fine-grained reasoning for fake news detection. In AAAI, volume 36, pages 5746–5754, 2022.
  • [60] Ruichao Yang, Xiting Wang, Yiqiao **, Chaozhuo Li, Jianxun Lian, and Xing Xie. Reinforcement subgraph reasoning for fake news detection. In KDD, pages 2253–2262, 2022.
  • [61] Benedek Rozemberczki, Paul Scherer, Yixuan He, George Panagopoulos, Alexander Riedel, Maria Astefanoaei, Oliver Kiss, Ferenc Beres, Guzmán López, Nicolas Collignon, et al. Pytorch geometric temporal: Spatiotemporal signal processing with neural machine learning models. In CIKM, pages 4564–4573, 2021.
  • [62] Warren S Torgerson. Multidimensional scaling: I. theory and method. Psychometrika, 17(4):401–419, 1952.
  • [63] Seongmin Lee, Sadia Afroz, Haekyu Park, Zijie J Wang, Omar Shaikh, Vibhor Sehqal, Ankit Peshin, and Duen Horng Chau. Explaining website reliability by visualizing hyperlink connectivity. In 2022 IEEE Visualization and Visual Analytics (VIS), pages 26–30. IEEE, 2022.
  • [64] Kevin Li, Haoyang Yang, Evan Montoya, Anish Upadhayay, Zhiyan Zhou, Jon Saad-Falcon, and Duen Horng Chau. Visual exploration of literature with argo scholar. In CIKM, pages 4912–4916, 2022.
  • [65] Victor Chomel, Nathanaël Cuvelle-Magar, Maziyar Panahi, and David Chavalarias. Polarization identification on multiple timescale using representation learning on temporal graphs in eulerian description. In NeurIPS 2022 Temporal Graph Learning Workshop, 2022.
  • [66] Jitesh Shetty and Jafar Adibi. Discovering important nodes through graph entropy the case of enron email database. In Proceedings of the 3rd international workshop on Link discovery, pages 74–81, 2005.
  • [67] Matthew W Seeger and Robert R Ulmer. Explaining enron: Communication and responsible leadership. Management Communication Quarterly, 17(1):58–84, 2003.
  • [68] Cees BM Van Riel and Charles J Fombrun. Essentials of corporate communication: Implementing practices for effective reputation management. Routledge, 2007.
  • [69] JoAnne Yates. Control through communication: The rise of system in American management, volume 6. JHU Press, 1993.
  • [70] Linjuan Rita Men. Strategic internal communication: Transformational leadership, communication channels, and employee satisfaction. Management communication quarterly, 28(2):264–284, 2014.
  • [71] Zoltán Kovács, Zsolt Jenő Farkas, Tamás Egedy, Attila Csaba Kondor, Balázs Szabó, József Lennert, Dorián Baka, and Balázs Kohán. Urban sprawl and land conversion in post-socialist cities: The case of metropolitan budapest. Cities, 92:71–81, 2019.
  • [72] Wadie Skaf, Arzu Tosayeva, and Dániel T Várkonyi. Towards automatic forecasting: Evaluation of time-series forecasting models for chickenpox cases estimation in hungary. In ISDA, pages 1–10. Springer, 2022.

A Dataset Introduction

Datasets.

We used 9 publicly available datasets spanning 8 different domains to demonstrate DyGETViz’s wide applicability across all of these subject areas. Table S1 provides the statistics of the nine datasets.

  • Reddit [1] encompasses YouTube videos shared across 29,461 subreddits over a five-year period, from January 2018 to December 2022. The dataset forms a bipartite graph with each node representing a video or a subreddit. Each edge in the graph indicates a video being shared in a subreddit, and its weight is determined by the frequency of sharing.

  • Enron [44] includes the email communication history of Enron Corporation from June 1999 to December 2001. Each node represents an employee and each edge represents an email between them.

  • UN Comtrade (United Nations Comtrade database, https://comtradeplus.un.org/) [33] offers extensive global annual trade statistics. Our analysis focuses on the annual export data from 1988 to 2022. Nodes represent countries and edges represent the logarithmic values of the annual export volumes between countries.

  • HistWords-EN. The HistWords embeddings are derived from the diachronic word embeddings trained using SGNS (Skip-Gram with Negative Sampling) on the Google N-Gram dataset [26], which uses English documents from the 1800s to the 1990s as the corpus. Each node represents a word, and each edge represents word similarity. The detailed dataset construction process is described in Section 2.2.

  • HistWords-CN [2] is trained in the same manner as HistWords-EN using SGNS vectors of Chinese words from the Google N-Gram dataset, covering the 1950s to the 1990s.

  • Chickenpox [45] features the weekly chickenpox cases in Hungary between January 2005 and January 2015. Nodes represent the counties, and edges are constructed based on geographical locations — an edge exists between two counties if they are adjacent. The training objective is to predict the number of weekly cases in each county.

  • Ant [9] features ant behaviors over a 41-day period. Nodes represent ants, and edges represent interactions between two ants.

  • DGraph [4] is a finance dataset for fraudster detection. Nodes represent Finvolution users, which fall under 3 categories: normal users, fraudsters, and background users (users who are not detection targets due to insufficient borrowing behaviors). An edge from one user to another means that the first user lists the second as an emergency contact. We randomly sampled a subgraph with 100,000 nodes.

  • Aging [42] provides the human gene expression data at 37 ages spanning between 20 and 99 years. For each age, an aging-specific graph snapshot is constructed, in which nodes represent genes and edges represent interactions between genes. The edge weight represents the strength of the protein-protein interactions (PPIs) between two genes [46].

Table S1: Statistics of our datasets. “Interval” indicates the time interval for each snapshot. ‘\’ indicates that the snapshot interval is not constant.

| Datasets | Domains | #Nodes | #Edges | #Snapshots | Interval |
| --- | --- | --- | --- | --- | --- |
| Reddit [1] | Social Studies | 4,303,032 | 27,836,000 | 60 | 1 month |
| DGraph [4] | Finance | 100,000 | 119,352 | 17 | 1 week |
| HistWords-EN [2] | Linguistics | 100,000 | 14,539,140 | 20 | 10 years |
| HistWords-CN [2] | Linguistics | 29,701 | 763,100 | 5 | 10 years |
| Aging [42] | Genetics | 8,938 | 71,800 | 37 | \ |
| Enron [44] | Communication Studies | 143 | 22,784 | 16 | 2 months |
| Ant [9] | Ethology | 113 | 111,578 | 41 | 1 day |
| UN Comtrade [33] | International Relations | 107 | 162,322 | 35 | 1 year |
| Chickenpox | Epidemiology | 20 | 102 | 517 | 1 week |
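
The “Interval” column reflects how timestamped interactions are grouped into discrete snapshots. The following is a minimal sketch of that bucketing step, not the released package's code: the function name `build_snapshots`, the integer day-based timestamps, and the count-based edge weights (loosely modeled on the sharing frequency used for Reddit) are all assumptions made for illustration.

```python
from collections import defaultdict


def build_snapshots(edges, interval):
    """Group timestamped edges into fixed-width snapshots.

    edges    : iterable of (src, dst, timestamp) with integer timestamps
               (hypothetical format, e.g. days since the start of the data).
    interval : snapshot width in the same unit as the timestamps.
    Returns {snapshot_index: {(src, dst): weight}}, where the weight is the
    number of times the edge occurs inside that snapshot.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    buckets = defaultdict(lambda: defaultdict(int))
    for src, dst, ts in edges:
        buckets[ts // interval][(src, dst)] += 1
    # Convert nested defaultdicts to plain dicts, ordered by snapshot index.
    return {t: dict(weights) for t, weights in sorted(buckets.items())}


# Toy example: four sharing events bucketed into 10-day snapshots.
toy_edges = [
    ("video_a", "sub_x", 0),
    ("video_a", "sub_x", 5),
    ("video_b", "sub_x", 12),
    ("video_a", "sub_y", 29),
]
snapshots = build_snapshots(toy_edges, interval=10)
```

Datasets with a non-constant interval (marked ‘\’ in Table S1) would need explicit snapshot boundaries instead of a single `interval` value.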

B Method

Figure S1: Our proposed DyGETViz framework.

In this section, we introduce our novel and computationally efficient framework DyGETViz for visualizing and analyzing dynamic graph embedding trajectories. Our framework effectively addresses the challenges associated with DGs mentioned in Section 1, including continuously evolving node embeddings and constant node addition and deletion. Figure S1 and Algorithm S1 describe the workflow and the pseudocode of DyGETViz, respectively.

B.1 Embedding Training

Given the sequence of graph snapshots $\{G_{t}\}$, DyGETViz first learns a DTDG model using the joint training objective $\mathcal{L}$, which is the linear combination of the link prediction loss $\mathcal{L}_{\text{link}}^{t}$, node-level loss $\mathcal{L}_{\text{node}}^{t}$, and edge-level loss $\mathcal{L}_{\text{edge}}^{t}$.

$$\mathcal{L}=\sum_{t\in[1,T]}\lambda_{1}\mathcal{L}_{\text{link}}^{t}+\lambda_{2}\mathcal{L}_{\text{node}}^{t}+\lambda_{3}\mathcal{L}_{\text{edge}}^{t}. \tag{1}$$

Here, $\mathcal{L}_{\text{node}}^{t}$ (resp. $\mathcal{L}_{\text{edge}}^{t}$) can be defined as the mean squared error or cross-entropy loss between the predicted node (resp. edge) attributes and the ground-truth, depending on the problem formulation (e.g., linear regression or node/edge classification). $\lambda_{1},\lambda_{2},\lambda_{3}\in\mathbb{R}$ denote hyperparameters that control the weights of each loss term. This process generates temporal node embeddings $\{\mathbf{V}^{t}\}_{t=1}^{T}$ across $T$ timestamps (Line 3).
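
As a small worked illustration of Eq. (1), the sketch below combines per-snapshot loss values into the joint objective. The function name `joint_loss` and the numeric loss values are hypothetical; in an actual training loop the per-snapshot terms would be tensors produced by the DTDG model rather than plain floats.

```python
def joint_loss(link_losses, node_losses, edge_losses,
               lam1=1.0, lam2=1.0, lam3=1.0):
    """Weighted sum of per-snapshot losses, following Eq. (1).

    Each argument is a sequence with one loss value per snapshot t = 1..T.
    lam1, lam2, lam3 play the role of lambda_1, lambda_2, lambda_3.
    """
    if not (len(link_losses) == len(node_losses) == len(edge_losses)):
        raise ValueError("all loss sequences must cover the same T snapshots")
    return sum(
        lam1 * l_link + lam2 * l_node + lam3 * l_edge
        for l_link, l_node, l_edge in zip(link_losses, node_losses, edge_losses)
    )


# Hypothetical loss values for T = 3 snapshots.
total = joint_loss(
    link_losses=[0.5, 0.4, 0.3],
    node_losses=[1.0, 0.8, 0.6],
    edge_losses=[0.2, 0.2, 0.2],
    lam1=1.0, lam2=0.5, lam3=2.0,
)
```

Setting a weight to zero drops the corresponding task, e.g. `lam3=0.0` for a dataset without edge-level labels.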

B.2 Embedding Visualization

Algorithm S1 DTDG embedding visualization. $\{\mathbf{V}^{t}\}$ is the set of temporal embedding matrices, where $\mathbf{V}^{t}\in\mathbb{R}^{|V^{t}|\times d}$. $\mathbf{X}\in\mathbb{R}^{|V^{\prime}|\times d}$ is the static embedding matrix for the anchor nodes. $\operatorname{sim}(\cdot):\mathbb{R}^{d\times d}\rightarrow\mathbb{R}$ is a similarity measure. $\mathcal{N}(i,k,t)$ denotes the $k$ nearest neighbors of $v_{i}$ at time $t$ in the embedding space. $\operatorname{Agg}(\cdot)$ is an aggregation function. $\alpha$ is an interpolation factor. $\mathbf{Z}=\{\mathbf{z}_{i}\}_{i=1}^{|V^{\prime}|}\in\mathbb{R}^{|V^{\prime}|\times p}$ is the $p$-dimensional projection of nodes $v_{i}\in V^{\prime}$.
1: Input: $\{\mathcal{G}^{t}\}$.
2: Output: Dynamic Graph Visualization $\mathcal{P}$.
3: Train a DTDG model using objective $\mathcal{L}$ and derive $\{\mathbf{V}^{t}\}$ ▷ Discrete-Time Dynamic Graph Model Training
4: Compute $\mathbf{X}$ for $v_{i}\in V^{\prime}$
5: $\mathbf{Z}=f(\mathbf{X})$ ▷ Compute $p$-dimensional projection of $V^{\prime}$
6: Create $\mathcal{P}$ and project $\mathbf{Z}=\{\mathbf{z}_{i}\}_{i=1}^{|V^{\prime}|}$ for $v_{i}\in V^{\prime}$ onto $\mathcal{P}$
7: for $t\leftarrow 1,\ldots,T$ do ▷ Cross-Time Alignment
8:     for $v_{i}\in V^{t}$ do
9:         for $v_{j}\in V^{\prime}\setminus\{v_{i}\}$ do
10:             $s_{ij}^{t}\leftarrow\operatorname{sim}(\mathbf{v}_{i}^{t},\mathbf{v}_{j}^{t})$ ▷ Embedding Similarity
11:         end for
12:         Compute $\mathcal{N}(i,k,t)$ according to $\{s_{ij}^{t}\}$
13:         $\mathbf{\hat{z}}_{i}^{t}\leftarrow\operatorname{Agg}(\{\mathbf{z}_{j}\mid v_{j}\in\mathcal{N}(i,k,t)\})$ ▷ Aggregation
14:         $\mathbf{z}_{i}^{t}=\begin{cases}\alpha\cdot\mathbf{z}_{i}+(1-\alpha)\cdot\mathbf{\hat{z}}_{i}^{t}&\text{if }v_{i}\in V^{\prime}\\ \mathbf{\hat{z}}_{i}^{t}&\text{otherwise}\end{cases}$ ▷ Interpolation
15:         Project $\mathbf{z}_{i}^{t}$ onto $\mathcal{P}$.
16:     end for
17: end for

A major challenge in embedding trajectories visualization is cross-time alignment, as the DTDG embeddings from different snapshots reside in distinct embedding spaces and are not directly comparable with each other [47]. To address this challenge, we construct a uniform reference frame for the embedding projection of all snapshots using carefully selected anchor nodes $V^{\prime}$. The anchor nodes are selected from the set of nodes present in $V^{0}$ to ensure meaningful cosine similarity computation in each snapshot. $\mathbf{X}$, the node embeddings of $V^{\prime}$, can be derived from a subset of any temporal embedding $\mathbf{V}^{t}$ trained on time $t$ (Line 4). DyGETViz is based on the assumption that the embeddings of nodes in $V^{\prime}$ do not undergo significant changes over time [48].

We then employ the projection function $f(\cdot)$ to derive the $p$-dimensional representations $\mathbf{Z}$ (Line 5). The choice of $f(\cdot)$ is flexible: any projection algorithm that preserves node-node proximity in the embedding space can be employed, such as Principal Component Analysis (PCA) [10], t-SNE [11], H-SNE [12], UMAP [49], locally linear embedding (LLE) [50], and Isomap [51]. This initial projection serves as a steady topological foundation that ensures consistency across all timestamps. As DyGETViz progresses through each timestamp $t$, it updates the visual representation of each node $v_{i}\in V^{t}$ in $G^{t}$ to reflect its new position (Lines 8-11). To this end, we identify $v_{i}$’s $k$ nearest anchor nodes $v_{j}\in V^{\prime}$ based on the similarity between the temporal embeddings of $v_{i}$ and $v_{j}$ (Lines 10, 12). We then aggregate the visual representations of the neighboring anchor nodes $v_{j}$ to determine the new position for $v_{i}$ (Line 13). This method efficiently aligns nodes across different timestamps and allows for the inference of new nodes by aggregating information from anchor nodes.
Therefore, DyGETViz can seamlessly incorporate new nodes into the visualization space, such as newly formed COVID-related online communities on social platforms during the COVID-19 pandemic, or new words entering a vocabulary in diachronic linguistic analysis. To ensure coherence and smooth transitions between timestamps, the final node projection is obtained by interpolation, combining the aggregated projection $\hat{\mathbf{z}}_{i}^{t}$ with $v_{i}$’s static embedding $\mathbf{z}_{i}$ (Line 14).
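
The anchor-based placement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DyGETViz implementation: the function names `cosine` and `place_node`, the plain mean used to aggregate anchor positions, and the interpolation weight `alpha` are hypothetical choices made for the example, and the anchor positions (the projection of the anchor embeddings) are taken as precomputed by any projection method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def place_node(v_t, anchor_emb, anchor_pos, k=2, z_static=None, alpha=0.5):
    """Place one node in the shared low-dimensional frame defined by the anchors.

    v_t        : temporal embedding of the node at the current snapshot
    anchor_emb : anchor embeddings, one vector per anchor node
    anchor_pos : projected anchor positions, assumed precomputed
    z_static   : static projected position of the node, if it has one
    alpha      : interpolation weight (hypothetical parameter)
    """
    sims = [cosine(v_t, x) for x in anchor_emb]
    # Indices of the k anchors most similar to the node.
    top = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)[:k]
    dim = len(anchor_pos[0])
    # Aggregated position: a plain mean of the anchors' positions (assumption).
    z_hat = [sum(anchor_pos[j][c] for j in top) / len(top) for c in range(dim)]
    if z_static is None:
        # A node without a static position (e.g. a newly added node)
        # is placed at the aggregated position directly.
        return z_hat
    # Interpolate between the aggregated and the static position.
    return [alpha * a + (1 - alpha) * b for a, b in zip(z_hat, z_static)]
```

Because every snapshot is placed relative to the same anchor positions, the resulting coordinates share one frame even though the temporal embeddings themselves come from different spaces. A similarity-weighted mean would be an equally plausible aggregation choice.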

B.3 Analytics Module.

We employ micro-level and macro-level measures to quantify the structural shifts in both local and global topology.

Measuring Micro-level Changes.

To quantify the micro-level changes in the local topology of each node, we employ two similarity measures: the Jaccard index ($\operatorname{Jaccard}_{n}$) [16] and Rank-biased Overlap (RBO) [17, 18]. The Jaccard index quantifies the agreement between the closest $n$ nodes of a given node $i$ at time $(t-1)$ and those at time $t$ in the embedding space. It is calculated as the size of the intersection of the two sets divided by the size of their union:

$$\operatorname{Jaccard}_{n}(i,t)=\frac{|\mathcal{N}(i,n,t-1)\cap\mathcal{N}(i,n,t)|}{|\mathcal{N}(i,n,t-1)\cup\mathcal{N}(i,n,t)|}, \tag{2}$$

where $\mathcal{N}(i,n,t)$ denotes the $n$ nodes closest to $v_{i}$ at time $t$, sorted in ascending order of their distance from $v_{i}$. The resulting $\operatorname{Jaccard}_{n}$ ranges from 0 to 1 and is agnostic to the ordering of the top-$n$ nodes. A $\operatorname{Jaccard}_{n}$ close to 1 during the period $[t-1,t]$ indicates minimal changes in the node’s local topology in the embedding space.
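
As a small illustration of Eq. (2), the Jaccard index of two neighbor lists can be computed as follows. The function name `jaccard_n` and the convention of returning 1.0 when both lists are empty are assumptions made for this sketch.

```python
def jaccard_n(neighbors_prev, neighbors_curr, n):
    """Jaccard index between the top-n neighbors at time t-1 and at time t.

    Both arguments are lists of node identifiers sorted from nearest to
    farthest; only the first n entries of each list are compared.
    """
    prev_set = set(neighbors_prev[:n])
    curr_set = set(neighbors_curr[:n])
    union = prev_set | curr_set
    if not union:
        # Two empty neighborhoods are treated as unchanged (assumption).
        return 1.0
    return len(prev_set & curr_set) / len(union)
```

For example, the lists `["a", "b", "c"]` and `["b", "a", "e"]` share two of four distinct nodes, so the index is 0.5, while any reordering of the same three nodes gives 1.0 because the measure ignores order.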

As a complementary measure, Rank-biased Overlap (RBO) takes the ranking of the nodes into account. RBO gradually incorporates lower-ranked nodes while weighting agreement among the top-ranked ones more heavily:

$$\operatorname{RBO}(i,m,t)=(1-p)\sum_{d=1}^{m}p^{d-1}\frac{|\mathcal{N}(i,m,t-1)\cap\mathcal{N}(i,m,t)|}{d}, \tag{3}$$

where $m$ represents the maximum depth of the ranked list considered, and $p\in[0,1]$ is the damping factor that determines the weight assigned to the top of the list. Because depth $d$ is weighted by $p^{d-1}$, a smaller $p$ concentrates the weight on the top of the list, whereas a $p$ closer to 1 lets lower-ranked nodes contribute more. In our experiments, we set $p$ to 0.9. The RBO metric ranges from 0 to 1, with a higher value indicating greater similarity in the node ordering between the two lists. Intuitively, if a node’s RBO is close to 1 during the period $(t-1)$ to $t$, the node’s global topology in the input DG has undergone minimal changes.
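
A hedged sketch of a truncated RBO computation is given below. It follows the standard formulation of RBO [17], in which the overlap at depth $d$ is measured between the top-$d$ prefixes of the two lists; reading Eq. (3) this way is an assumption of the sketch, as is the function name `rbo_truncated`.

```python
def rbo_truncated(list_prev, list_curr, m, p=0.9):
    """Truncated rank-biased overlap of two ranked lists up to depth m.

    At each depth d, the agreement is the size of the overlap between the
    top-d prefixes divided by d, weighted by p**(d - 1).
    """
    seen_prev, seen_curr = set(), set()
    total = 0.0
    for d in range(1, m + 1):
        # Grow both prefixes by one element where the lists are long enough.
        if d <= len(list_prev):
            seen_prev.add(list_prev[d - 1])
        if d <= len(list_curr):
            seen_curr.add(list_curr[d - 1])
        agreement = len(seen_prev & seen_curr) / d
        total += p ** (d - 1) * agreement
    return (1 - p) * total
```

With this truncated form, two identical lists of depth $m$ score $1-p^{m}$ rather than exactly 1, so values are best compared at a fixed $m$ and $p$. Unlike the Jaccard index, swapping the two top-ranked nodes lowers the score, which reflects the sensitivity to order.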

It is worth noting that alternative ranking evaluation measures exist, such as Spearman’s rank correlation coefficient [52] and Kendall’s tau [53, 54]. However, these measures do not explicitly differentiate the importance of the ranks at different positions in the list and are sensitive to small perturbations of rankings, particularly towards the middle of the list [55]. To demonstrate this, Supp. Fig. S4 shows the distribution of average cosine similarity for all nodes in the four datasets, HistWords-CN, Reddit, Ant, and DGraph. We observe that the cosine similarity usually plateaus in the middle range, meaning that a large number of nodes have nearly identical cosine similarities and can swap ranks under small perturbations.

Consequently, Spearman’s and Kendall’s coefficients cannot accurately reflect the extent to which the local neighbors of a node have changed. Moreover, they mainly target conjoint rankings [17], where both lists consist of the same set of items, making them less suitable for scenarios where the sets of nodes in adjacent snapshots differ because new nodes are constantly being added. In contrast, RBO and the Jaccard index are more responsive to changes in the top portion of two ranked lists and can be applied to indefinite ranking scenarios. This aligns well with our objectives, as we emphasize the importance of the top-$n$ nodes for assessing changes in the local neighbors of each node in the visualization.

Measuring Macro-level Changes.

To assess the changes in global topology, we introduce a novel metric called Normalized Average Rank Change (NARC), which builds upon the Average Rank Change (ARC) metric:

$$\operatorname{ARC}(i,t)=\frac{1}{N^{t}}\sum_{j=1}^{N^{t}}|r_{ij}^{t}-r_{ij}^{t-1}|, \tag{4}$$
$$\operatorname{NARC}=\frac{1}{T}\sum_{t=1}^{T}\frac{1}{N^{t}-1}\sum_{i=1}^{N^{t}}\operatorname{ARC}(i,t), \tag{5}$$

where $N^{t}=|V^{t}\cap V^{t-1}|$ represents the number of nodes present at both time $(t-1)$ and time $t$, and $r_{ij}^{t}$ is the rank of node $j$ with respect to node $i$ at time $t$. $\operatorname{ARC}(i,t)$ measures the changes of a node $i$’s nearest neighbors in the period $[t-1,t]$, where a greater $\operatorname{ARC}(i,t)$ indicates a larger change in $i$’s topology. The NARC metric aggregates these values across all nodes and timestamps. By normalizing each $\operatorname{ARC}(i,t)$ by a factor of $N^{t}-1$, we make the NARC metric comparable across datasets of different sizes. NARC thus provides a single summary of the changes in global topology, offering insight into the dynamic nature of the evolving network.
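
The following sketch shows one way Eqs. (4) and (5) could be evaluated. It is illustrative only and rests on several assumptions: the ranks are supplied as nested dictionaries, they are computed among the nodes shared by two adjacent snapshots, and the self term ($j=i$) contributes zero. The names `arc` and `narc` are hypothetical.

```python
def arc(i, ranks_prev, ranks_curr, shared):
    """Average rank change of node i between two adjacent snapshots.

    ranks_prev[i][j] and ranks_curr[i][j] hold the rank of node j with
    respect to node i; `shared` is the set of nodes present in both snapshots.
    """
    diffs = (abs(ranks_curr[i][j] - ranks_prev[i][j]) for j in shared if j != i)
    return sum(diffs) / len(shared)


def narc(rank_snapshots):
    """Normalized average rank change over a sequence of snapshots.

    rank_snapshots[t][i][j] is the rank of node j with respect to node i
    at snapshot t. Each transition is normalized by (N^t - 1).
    """
    num_transitions = len(rank_snapshots) - 1
    if num_transitions < 1:
        return 0.0
    total = 0.0
    for t in range(1, len(rank_snapshots)):
        prev, curr = rank_snapshots[t - 1], rank_snapshots[t]
        shared = prev.keys() & curr.keys()
        if len(shared) < 2:
            continue  # no rank change is defined for fewer than two nodes
        arc_sum = sum(arc(i, prev, curr, shared) for i in shared)
        total += arc_sum / (len(shared) - 1)
    return total / num_transitions
```

In a three-node example where only one node sees its two neighbors swap ranks, that node has an ARC of 2/3 and the others 0, giving a NARC of 1/3 for the single transition.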

To measure the absolute movements of node embeddings in the embedding space over time, we use the L1 and L2 distances between the embeddings of each node $v_{i}$ in adjacent timestamps:

$$\operatorname{L}_{p}=\frac{1}{T-1}\sum_{t=1}^{T-1}\frac{1}{N^{t}}\sum_{i=1}^{N^{t}}\|\mathbf{h}_{i}^{t}-\mathbf{h}_{i}^{t-1}\|_{p},\quad p\in\{1,2\}, \tag{6}$$

where $\mathbf{h}_{i}^{t}$ is an embedding of $v_{i}$ at time $t$. Here, $\mathbf{h}_{i}^{t}$ is one of $\mathbf{v}_{i}^{t}$, $\tilde{\mathbf{v}}_{i}^{t}$, and $\mathbf{z}_{i}^{t}$, where $\mathbf{v}_{i}^{t}$ is the original embedding, $\tilde{\mathbf{v}}_{i}^{t}=\mathbf{v}_{i}^{t}/\|\mathbf{v}_{i}^{t}\|$ is the normalized embedding, and $\mathbf{z}_{i}^{t}$ is the projected embedding.
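
A minimal sketch of the movement measure in Eq. (6) follows. It assumes that each snapshot is given as a dictionary from node identifier to embedding vector, that distances are averaged over the nodes shared by two adjacent snapshots, and that the per-transition averages are then averaged over all transitions. The names `normalize` and `lp_movement` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def lp_movement(snapshots, p=2):
    """Average L1 or L2 displacement of nodes between adjacent snapshots.

    snapshots: list of dicts mapping node id -> embedding (list of floats).
    """
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    num_transitions = len(snapshots) - 1
    if num_transitions < 1:
        return 0.0
    total = 0.0
    for t in range(1, len(snapshots)):
        prev, curr = snapshots[t - 1], snapshots[t]
        shared = prev.keys() & curr.keys()
        if not shared:
            continue
        dist = 0.0
        for i in shared:
            diff = [a - b for a, b in zip(curr[i], prev[i])]
            if p == 1:
                dist += sum(abs(x) for x in diff)
            else:
                dist += math.sqrt(sum(x * x for x in diff))
        total += dist / len(shared)
    return total / num_transitions
```

Passing the raw, normalized (via `normalize`), or projected embeddings as the snapshot values yields the three variants of the measure described above.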

Finally, we extend the RBO metric to a macro-level version, which is called macro-level RBO, as follows:

$$\operatorname{RBO}_{\text{macro}}(m)=\frac{1}{T}\sum_{t=1}^{T}\frac{1}{N^{t}}\sum_{i=1}^{N^{t}}\operatorname{RBO}(i,m,t). \tag{7}$$

C Related Works

C.1 Graph Neural Network

Graph neural networks [56] have emerged as a powerful framework for modeling complex relationships in graph-structured data. In particular, dynamic graph models, which capture temporal dynamics in evolving systems, have been successfully applied in analyzing various domains such as communication networks [15], transaction networks [57], social networks [58, 1, 59, 60], disease control [61], and international trade [3].

C.2 Visualization

Visualization is a popular approach for model analytics due to its user-friendly and intuitive nature, which allows researchers and analysts to easily comprehend complex temporal relationships. Techniques such as Principal Component Analysis (PCA) [10], t-Distributed Stochastic Neighbor Embedding (t-SNE) [11], Multidimensional Scaling (MDS) [62], and Uniform Manifold Approximation and Projection (UMAP) [49] have been widely used to represent high-dimensional data in a lower-dimensional space while preserving the structural relationships of the original data. Despite these advancements, there is still a need for visualization techniques that can effectively capture and represent the dynamics of evolving graph data over an extended period of time. Although researchers have explored visualization techniques for graphs, existing works usually focus on static graphs [63, 64] or consecutive graph snapshots [65], limiting their ability to showcase the trajectory of node embeddings over time [65]. This limitation hinders a comprehensive understanding of how nodes evolve and interact within the graph structure.

Figure S2: Comparison of varying $f(\cdot)$ (Line 6) and $k$ (Line 12) in Algorithm S1.
Table S2: Notations used in this paper

| Notation | Description |
| --- | --- |
| $G$ | A static graph |
| $G^{t}$ | A graph snapshot at timestamp $t$ |
| $V, E$ | Sets of nodes and edges |
| $V^{t}, E^{t}$ | Sets of nodes and edges in each snapshot $G^{t}$ |
| $V^{\prime}$ | Set of anchor nodes |
| $v_{i}$ | A node $i$ |
| $\mathcal{N}(i,k,t)$ | $v_{i}$’s list of $k$ nearest neighbors in $\mathbf{V}^{t}$, sorted in descending order |
| $\mathbf{v}_{i}^{t}$ | Temporal node embedding of $v_{i}$ at timestamp $t$ |
| $\mathbf{V}^{t}$ | Temporal node embeddings for $V^{t}$ at timestamp $t$ |
| $\mathbf{X}$ | Anchor embeddings |
| $\mathcal{L}$ | Training objective |
| $\alpha,\lambda_{1},\lambda_{2}$ | Hyperparameters |

Figure S3: $\operatorname{Jaccard}_{100}$ among the subreddits in terms of videos shared between them.
Figure S4: Distribution of average cosine similarity of the four datasets: HistWords-CN, Reddit, Ant, and DGraph. The x- and y-axes represent the percentage of the nodes in the dataset and the cosine similarities, respectively.
Table S3: Evolution of word associations from the 1950s to the 1990s. The term “gay” initially carried connotations of happiness and fortune, but underwent a decline in positivity during the 1970s as its association with homosexuality became more prevalent.

| Word | 1950 | 1960 | 1970 | 1980 | 1990 |
| --- | --- | --- | --- | --- | --- |
| happy | happier, fortunate, glad, lucky, delighted | happier, pleasant, lucky, loved, delighted | glad, happier, fortunate, longed, delighted | glad, delighted, happier, pleasant, fortunate | glad, happier, delighted, eager, lucky |
| delighted | glad, surprised, astonished, pleased, gratified | gratified, surprised, astonished, glad, amused | glad, surprised, astonished, gratified, pleased | glad, surprised, happy, amused, astonished | surprised, glad, pleased, happy, astonished |
| gay | charming, lovely, beautiful, elegant, bright | elegant, charming, cheerful, lovely, witty | charming, cheerful, clubs, forlorn, ugly | boys, clubs, lovers, charming, men | men, victims, violence, lesbian, bisexual |
| homosexual | sex, intimacy, prostitution, males, females | sex, cruelties, immoral, notorious, scandalous | sex, unmarried, gender, adultery, immoral | women, unmarried, gay, immoral, sex | gay, males, immoral, illegal, sex |
| lesbian | \ | \ | vehemently, clubs, gang, gay, dance | feminist, gay, sexuality, identities, women | gay, women, female, victims, violence |

Table S4: Average $\operatorname{Jaccard}_{100}$ of subreddits in each year.
Subreddit 2018 2019 2020 2021 2022
videos 0.118 0.211 0.206 0.258 0.256
YouTubeSubscribeBoost 0.244 0.389 0.296 0.295 0.316
YouTube_startups 0.233 0.339 0.289 0.341 0.312
AdvertiseYourVideos 0.287 0.349 0.292 0.313 0.332
SmallYoutubers 0.267 0.356 0.304 0.306 0.354
GetMoreViewsYT 0.284 0.311 0.269 0.280 0.305
gaming 0.263 0.280 0.285 0.272 0.249
videogames 0.303 0.275 0.324 0.322 0.292
pcgaming 0.258 0.236 0.277 0.266 0.223
YouTubeGamers 0.279 0.364 0.291 0.343 0.397
PromoteGamingVideos 0.317 0.423 0.336 0.383 0.407
gamingvids 0.318 0.329 0.295 0.296 0.300
GlobalOffensive 0.071 0.111 0.162 0.093 0.080
apexlegends N/A 0.114 0.110 0.099 0.073
leagueoflegends 0.069 0.044 0.057 0.062 0.038
Minecraft 0.040 0.115 0.136 0.143 0.072
Subreddit 2018 2019 2020 2021 2022
kpop 0.066 0.082 0.129 0.164 0.151
popheads 0.170 0.160 0.181 0.221 0.195
indieheads 0.221 0.258 0.263 0.278 0.274
Music 0.187 0.224 0.291 0.339 0.331
hiphopheads 0.232 0.296 0.308 0.333 0.276
listentothis 0.234 0.325 0.359 0.374 0.338
hiphop 0.266 0.363 0.424 0.442 0.312
rap 0.282 0.413 0.421 0.417 0.296
sports 0.090 0.111 0.101 0.101 0.084
nba 0.121 0.109 0.089 0.082 0.076
nfl 0.077 0.112 0.061 0.049 0.050
MMA 0.103 0.086 0.090 0.088 0.070
SquaredCircle 0.065 0.051 0.058 0.085 0.086
politics 0.384 0.378 0.423 0.313 0.290
The_Donald 0.331 0.375 0.188 N/A N/A
WayOfTheBern 0.385 0.405 0.447 0.441 0.308

D Additional Experimental Results

Figure S5: Trajectories of ants from the Ant dataset [9].

D.1 Enron: Email Communication

Figure S6: Visualization of the Enron dataset [44]. (a) RBO and $\operatorname{Jaccard}_{3}$ during the projection time period. (b) Dynamic visualization of the node trajectories. The positions of employees are annotated. Employees occupying managerial/president positions (Mike Grigsby, Louise Kitchen, and Greg Whalley) other than the CEOs interact with employees at various levels, resulting in more diverse trajectories compared to CEOs (Kenneth Lay and Jeffrey Skilling) and regular employees (e.g., Scott Neal). Furthermore, we observe a decline in both RBO and $\operatorname{Jaccard}_{3}$ from January to March 2001, suggesting a significant shift in communication partners among all employees. This decline aligns with the CEO transition in February 2001.

Enron Corporation, founded by Kenneth Lay in 1985, was a prominent energy company until its notorious collapse due to institutionalized and systematic accounting fraud [44, 66]. In February 2001, Jeffrey Skilling became Enron’s CEO, initiating a period characterized by aggressive and intricate accounting practices [67]. Skilling resigned in August 2001, a few months before Enron’s downfall in December 2001. Studies show that Enron’s collapse can be attributed to a failure of responsible communication [67]. Lay and Skilling were only partially aware of the financial misconduct of their subordinates. In this study, we investigate the email communication network from June 1999 to December 2001 to shed light on the internal communication patterns that contributed to Enron’s failure. Our visualization provides valuable insights into these communication patterns, particularly highlighting the trajectories of the CEOs Kenneth Lay and Jeffrey Skilling.

In Fig. S6b, the trajectories of Lay and Skilling indicate relatively static communication communities, confined within a small range that mainly involves a limited number of vice presidents and managers. These patterns reflect their shortcomings in two-way communication, as demonstrated in previous studies [66, 67]: the failure to deliver honest, ethical messages to employees and the lack of awareness regarding company operations. Meanwhile, ordinary employees such as Geir Solberg, Kay Mann, and Scott Neal have specific job functions that confine their communication to their respective teams, resulting in relatively fixed communication patterns with a limited number of partners and trajectories with less variability.

In contrast, managers and presidents play crucial roles in facilitating communication between upper management and ordinary employees [68, 69, 70], resulting in more diverse interactions with individuals at different levels within the organization. Previous analyses identified the top three influential nodes in the Enron dataset as Louise Kitchen (President), Mike Grigsby (Manager), and Greg Whalley (President) according to graph entropy [66]. Their trajectories in the visualization exhibit broader and more diverse patterns, involving different individuals in different positions over different time periods. This reflects their extensive responsibilities and significant roles in managing the organization.

Moreover, as shown in Fig. S6a, the decline in RBO and $\operatorname{Jaccard}_{3}$ across all employees during the CEO transition from Lay to Skilling (January - March 2001) highlights the impact of leadership changes on communication dynamics within the organization. DyGETViz uncovers differences in communication patterns among employees, offering a novel perspective on organizational structure and dynamics. Understanding these patterns provides valuable insights for future research and organizational management.

D.2 Dynamic Temporal and Spatial Modeling of Chickenpox Spread in Hungary

Figure S7: (a) Trajectories of the 19 counties and the capital Budapest in the Chickenpox dataset [45]. More populous counties are plotted in colors toward the red end of the spectrum, and less populous counties toward the purple end. (b) Embedding movements of the 20 nodes. (c, d) Population and average weekly chickenpox cases of all counties.

In epidemiological forecasting, dynamic graph models can potentially enhance our understanding and prediction of disease spread. By incorporating temporal and spatial dynamics, these models can capture the intricate interplay between population density, geography, and mobility patterns, all of which play critical roles in disease transmission. However, the successful application of DTDG models in epidemiological forecasting relies not only on the accuracy and robustness of the models, but also on our ability to interpret and understand how they operate. In this regard, visualization techniques become crucial.

In this study, we use the Hungary Chickenpox dataset [45], which includes the weekly chickenpox cases in Hungarian counties and the capital Budapest between 2005 and 2015. From the trajectories in Fig. S7a, we found that the capital city Budapest stands out as the node with the most movement, owing to its high population. As the second most populous county in the country, Pest exhibits a trajectory that significantly overlaps with that of Budapest at each snapshot, indicating a high seasonality in the number of cases. This overlap can be attributed to factors such as geographical location and suburbanization in the metropolitan area of Budapest, which cause considerable population movement between the two regions. According to a 2011 census [71], nearly 60% of commuters living in the suburban zone of Pest work in Budapest. Such population overlap can facilitate the spread of diseases. Bács-Kiskun, a county bordering Pest, moves towards Pest in winter, especially in mid-December, but away from it in summer, indicating a periodicity between the winter surge and the summer decline [45, 72]. In contrast, counties such as Tolna, Vas, Zala, and Heves, which are among the five least populated counties, form their own clusters with movements limited to the lower left of the plot, indicating a lower susceptibility to diseases due to smaller populations and fewer demographic movements.