-
Transformers represent belief state geometry in their residual stream
Authors:
Adam S. Shai,
Sarah E. Marzen,
Lucas Teixeira,
Alexander Gietelink Oldenziel,
Paul M. Riechers
Abstract:
What computational structure are we building into large language models when we train them on next-token prediction? Here, we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. Leveraging the theory of optimal prediction, we anticipate and then find that belief states are linearly represented in the residual stream of transformers, even in cases where the predicted belief state geometry has highly nontrivial fractal structure. We investigate cases where the belief state geometry is represented in the final residual stream or distributed across the residual streams of multiple layers, providing a framework to explain these observations. Furthermore we demonstrate that the inferred belief states contain information about the entire future, beyond the local next-token prediction that the transformers are explicitly trained on. Our work provides a framework connecting the structure of training data to the computational structure and representations that transformers use to carry out their behavior.
Submitted 24 May, 2024;
originally announced May 2024.
-
One Node at a Time: Node-Level Network Classification
Authors:
Saray Shai,
Isaac Jacobs,
Peter J. Mucha
Abstract:
Network classification aims to group networks (or graphs) into distinct categories based on their structure. We study the connection between classification of a network and of its constituent nodes, and whether nodes from networks in different groups are distinguishable based on structural node characteristics such as centrality and clustering coefficient. We demonstrate, using various network datasets and random network models, that a classifier can be trained to accurately predict the network category of a given node (without seeing the whole network), implying that complex networks display distinct structural patterns even at the node level. Finally, we discuss two applications of node-level network classification: (i) whole-network classification from small samples of nodes, and (ii) network bootstrapping.
Submitted 3 August, 2022;
originally announced August 2022.
-
A guide to choosing and implementing reference models for social network analysis
Authors:
Elizabeth A. Hobson,
Matthew J. Silk,
Nina H. Fefferman,
Daniel B. Larremore,
Puck Rombach,
Saray Shai,
Noa Pinter-Wollman
Abstract:
Analyzing social networks is challenging. Key features of relational data require the use of non-standard statistical methods such as developing system-specific null, or reference, models that randomize one or more components of the observed data. Here we review a variety of randomization procedures that generate reference models for social network analysis. Reference models provide an expectation for hypothesis-testing when analyzing network data. We outline the key stages in producing an effective reference model and detail four approaches for generating reference distributions: permutation, resampling, sampling from a distribution, and generative models. We highlight when each type of approach would be appropriate and note potential pitfalls for researchers to avoid. Throughout, we illustrate our points with examples from a simulated social system. Our aim is to provide social network researchers with a deeper understanding of analytical approaches to enhance their confidence when tailoring reference models to specific research questions.
Submitted 4 March, 2021; v1 submitted 31 August, 2020;
originally announced December 2020.
-
Case studies in network community detection
Authors:
Saray Shai,
Natalie Stanley,
Clara Granell,
Dane Taylor,
Peter J. Mucha
Abstract:
Community structure describes the organization of a network into subgraphs that contain a prevalence of edges within each subgraph and relatively few edges across boundaries between subgraphs. The development of community-detection methods has occurred across disciplines, with numerous and varied algorithms proposed to find communities. As we present in this Chapter via several case studies, community detection is not just an "end game" unto itself, but rather a step in the analysis of network data which is then useful for furthering research in the disciplinary domain of interest. These case-study examples arise from diverse applications, ranging from social and political science to neuroscience and genetics, and we have chosen them to demonstrate key aspects of community detection and to highlight that community detection, in practice, should be directed by the application at hand.
Submitted 5 May, 2017;
originally announced May 2017.
-
Enhanced detectability of community structure in multilayer networks through layer aggregation
Authors:
Dane Taylor,
Saray Shai,
Natalie Stanley,
Peter J. Mucha
Abstract:
Many systems are naturally represented by a multilayer network in which edges exist in multiple layers that encode different, but potentially related, types of interactions, and it is important to understand limitations on the detectability of community structure in these networks. Using random matrix theory, we analyze detectability limitations for multilayer (specifically, multiplex) stochastic block models (SBMs) in which L layers are derived from a common SBM. We study the effect of layer aggregation on detectability for several aggregation methods, including summation of the layers' adjacency matrices for which we show the detectability limit vanishes as O(L^{-1/2}) with increasing number of layers, L. Importantly, we find a similar scaling behavior when the summation is thresholded at an optimal value, providing insight into the common - but not well understood - practice of thresholding pairwise-interaction data to obtain sparse network representations.
Submitted 4 May, 2016; v1 submitted 16 November, 2015;
originally announced November 2015.
-
Multiplex networks in metropolitan areas: generic features and local effects
Authors:
Emanuele Strano,
Saray Shai,
Simon Dobson,
Marc Barthelemy
Abstract:
Most large cities are spanned by more than one transportation system. These different modes of transport have usually been studied separately. It is, however, important to understand the impact on urban systems of the coupling between them, and we report in this paper an empirical analysis of the coupling between the street network and the subway for the two large metropolitan areas of London and New York. We observe a similar behaviour for network quantities related to quickest paths, suggesting the existence of generic mechanisms operating beyond the local peculiarities of the specific cities studied. An analysis of the betweenness centrality distribution shows that the introduction of underground networks operates as a decentralising force, creating congestion in places located at the end of underground lines. Also, we find that increasing the speed of subways is not always beneficial and may lead to unwanted uneven spatial distributions of accessibility. In fact, for London -- but not for New York -- there is an optimal subway speed in terms of global congestion. These results show that it is crucial to consider the full, multimodal, multi-layer network aspects of transportation systems in order to understand the behaviour of cities and to avoid possible negative side-effects of urban planning decisions.
Submitted 29 September, 2015; v1 submitted 28 August, 2015;
originally announced August 2015.
-
Resilience of Networks Formed of Interdependent Modular Networks
Authors:
Louis Shekhtman,
Saray Shai,
Shlomo Havlin
Abstract:
Many infrastructure networks have a modular structure and are also interdependent. While significant research has explored the resilience of interdependent networks, there has been no analysis of the effects of modularity. Here we develop a theoretical framework for attacks on interdependent modular networks and support our results by simulations. We focus on the case where each network has the same number of communities and the dependency links are restricted to be between pairs of communities of different networks. This is very realistic for infrastructure across cities. Each city has its own infrastructures and different infrastructures are dependent within the city. However, each infrastructure is connected within and between cities. For example, a power grid will connect many cities as will a communication network, yet a power station and communication tower that are interdependent will likely be in the same city. It has been shown that single networks are very susceptible to the failure of the interconnected nodes between communities (Shai et al.) and that attacks on these nodes are more crippling than attacks based on betweenness (da Cunha et al.). In our example of cities these nodes have long range links which are more likely to fail. For both treelike and looplike interdependent modular networks we find distinct regimes depending on the number of modules, $m$. (i) In the case where there are fewer modules with strong intraconnections, the system first separates into modules in an abrupt first-order transition and then each module undergoes a second percolation transition. (ii) When there are more modules with many interconnections between them, the system undergoes a single transition. Overall, we find that modular structure can influence the type of transitions observed in interdependent networks and should be considered in attempts to make interdependent networks more resilient.
Submitted 6 August, 2015;
originally announced August 2015.
-
Clustering Network Layers With the Strata Multilayer Stochastic Block Model
Authors:
Natalie Stanley,
Saray Shai,
Dane Taylor,
Peter J. Mucha
Abstract:
Multilayer networks are a useful data structure for simultaneously capturing multiple types of relationships between a set of nodes. In such networks, each relational definition gives rise to a layer. While each layer provides its own set of information, community structure across layers can be collectively utilized to discover and quantify underlying relational patterns between nodes. To concisely extract information from a multilayer network, we propose to identify and combine sets of layers with meaningful similarities in community structure. In this paper, we describe the "strata multilayer stochastic block model" (sMLSBM), a probabilistic model for multilayer community structure. The central extension of the model is that there exist groups of layers, called "strata", which are defined such that all layers in a given stratum have community structure described by a common stochastic block model (SBM). That is, layers in a stratum exhibit similar node-to-community assignments and SBM probability parameters. Fitting the sMLSBM to a multilayer network provides a joint clustering that yields node-to-community and layer-to-stratum assignments, which cooperatively aid one another during inference. We describe an algorithm for separating layers into their appropriate strata and an inference technique for estimating the SBM parameters for each stratum. We demonstrate our method using synthetic networks and a multilayer network inferred from data collected in the Human Microbiome Project.
Submitted 9 October, 2015; v1 submitted 7 July, 2015;
originally announced July 2015.
-
Resilience of modular complex networks
Authors:
Saray Shai,
Dror Y. Kenett,
Yoed N. Kenett,
Miriam Faust,
Simon Dobson,
Shlomo Havlin
Abstract:
Complex networks often have a modular structure, where a number of tightly connected groups of nodes (modules) have relatively few interconnections. Modularity has been shown to have an important effect on the evolution and stability of biological networks, on the scalability and efficiency of large-scale infrastructure, and on the development of economic and social systems. An analytical framework for understanding modularity and its effects on network vulnerability is still missing. Through recent advances in the understanding of multilayer networks, however, it is now possible to develop a theoretical framework to systematically study this critical issue. Here we study, analytically and numerically, the resilience of modular networks under attacks on interconnected nodes, which exhibit high betweenness values and are often more exposed to failure. Our model provides new insight into the feedback between structure and function in real-world systems, and consequently has implications for problems as diverse as developing efficient immunization strategies, designing robust large-scale infrastructure, and understanding brain function.
Submitted 18 April, 2014;
originally announced April 2014.