Transformers represent belief state geometry in their residual stream

Authors: Adam S. Shai, Sarah E. Marzen, Lucas Teixeira, Alexander Gietelink Oldenziel, Paul M. Riechers

Abstract: What computational structure are we building into large language models when we train them on next-token prediction? Here, we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. Leveraging the theory of optimal prediction, we anticipate and then find that belief states are linearly represented in the residual stream of transformers, even in cases where the predicted belief state geometry has highly nontrivial fractal structure. We investigate cases where the belief state geometry is represented in the final residual stream or distributed across the residual streams of multiple layers, providing a framework to explain these observations. Furthermore, we demonstrate that the inferred belief states contain information about the entire future, beyond the local next-token prediction that the transformers are explicitly trained on. Our work provides a framework connecting the structure of training data to the computational structure and representations that transformers use to carry out their behavior.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2208.02162 [pdf, other]

doi 10.1109/ICMLA52953.2021.00152

One Node at a Time: Node-Level Network Classification

Authors: Saray Shai, Isaac Jacobs, Peter J. Mucha

Abstract: Network classification aims to group networks (or graphs) into distinct categories based on their structure. We study the connection between classification of a network and of its constituent nodes, and whether nodes from networks in different groups are distinguishable based on structural node characteristics such as centrality and clustering coefficient. We demonstrate, using various network datasets and random network models, that a classifier can be trained to accurately predict the network category of a given node (without seeing the whole network), implying that complex networks display distinct structural patterns even at the node level. Finally, we discuss two applications of node-level network classification: (i) whole-network classification from small samples of nodes, and (ii) network bootstrapping.

Submitted 3 August, 2022; originally announced August 2022.

Comments: 8 pages, 5 figures

Journal ref: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)

arXiv:2012.04720 [pdf]

A guide to choosing and implementing reference models for social network analysis

Authors: Elizabeth A. Hobson, Matthew J. Silk, Nina H. Fefferman, Daniel B. Larremore, Puck Rombach, Saray Shai, Noa Pinter-Wollman

Abstract: Analyzing social networks is challenging. Key features of relational data require the use of non-standard statistical methods such as developing system-specific null, or reference, models that randomize one or more components of the observed data. Here we review a variety of randomization procedures that generate reference models for social network analysis. Reference models provide an expectation for hypothesis-testing when analyzing network data. We outline the key stages in producing an effective reference model and detail four approaches for generating reference distributions: permutation, resampling, sampling from a distribution, and generative models. We highlight when each type of approach would be appropriate and note potential pitfalls for researchers to avoid. Throughout, we illustrate our points with examples from a simulated social system. Our aim is to provide social network researchers with a deeper understanding of analytical approaches to enhance their confidence when tailoring reference models to specific research questions.

Submitted 4 March, 2021; v1 submitted 31 August, 2020; originally announced December 2020.

arXiv:1705.02305 [pdf, other]

Case studies in network community detection

Authors: Saray Shai, Natalie Stanley, Clara Granell, Dane Taylor, Peter J. Mucha

Abstract: Community structure describes the organization of a network into subgraphs that contain a prevalence of edges within each subgraph and relatively few edges across boundaries between subgraphs. The development of community-detection methods has occurred across disciplines, with numerous and varied algorithms proposed to find communities. As we present in this Chapter via several case studies, community detection is not just an "end game" unto itself, but rather a step in the analysis of network data which is then useful for furthering research in the disciplinary domain of interest. These case-study examples arise from diverse applications, ranging from social and political science to neuroscience and genetics, and we have chosen them to demonstrate key aspects of community detection and to highlight that community detection, in practice, should be directed by the application at hand.

Submitted 5 May, 2017; originally announced May 2017.

Comments: 21 pages, 5 figures

arXiv:1511.05271 [pdf, other]

doi 10.1103/PhysRevLett.116.228301

Enhanced detectability of community structure in multilayer networks through layer aggregation

Authors: Dane Taylor, Saray Shai, Natalie Stanley, Peter J. Mucha

Abstract: Many systems are naturally represented by a multilayer network in which edges exist in multiple layers that encode different, but potentially related, types of interactions, and it is important to understand limitations on the detectability of community structure in these networks. Using random matrix theory, we analyze detectability limitations for multilayer (specifically, multiplex) stochastic block models (SBMs) in which L layers are derived from a common SBM. We study the effect of layer aggregation on detectability for several aggregation methods, including summation of the layers' adjacency matrices for which we show the detectability limit vanishes as O(L^{-1/2}) with increasing number of layers, L. Importantly, we find a similar scaling behavior when the summation is thresholded at an optimal value, providing insight into the common - but not well understood - practice of thresholding pairwise-interaction data to obtain sparse network representations.

Submitted 4 May, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. Lett. 116, 228301 (2016)

arXiv:1508.07265 [pdf, other]

Multiplex networks in metropolitan areas: generic features and local effects

Authors: Emanuele Strano, Saray Shai, Simon Dobson, Marc Barthelemy

Abstract: Most large cities are spanned by more than one transportation system. These different modes of transport have usually been studied separately; it is, however, important to understand the impact on urban systems of the coupling between them. We report in this paper an empirical analysis of the coupling between the street network and the subway for the two large metropolitan areas of London and New York. We observe a similar behaviour for network quantities related to quickest paths, suggesting the existence of generic mechanisms operating beyond the local peculiarities of the specific cities studied. An analysis of the betweenness centrality distribution shows that the introduction of underground networks operates as a decentralising force, creating congestion in places located at the end of underground lines. Also, we find that increasing the speed of subways is not always beneficial and may lead to unwanted uneven spatial distributions of accessibility. In fact, for London -- but not for New York -- there is an optimal subway speed in terms of global congestion. These results show that it is crucial to consider the full, multimodal, multi-layer network aspects of transportation systems in order to understand the behaviour of cities and to avoid possible negative side-effects of urban planning decisions.

Submitted 29 September, 2015; v1 submitted 28 August, 2015; originally announced August 2015.

Comments: 12 pages, 8 figures. Final version with an additional discussion on the total congestion

Journal ref: Journal Royal Society Interface 12:20150651 (2015)

arXiv:1508.01352 [pdf, other]

doi 10.1088/1367-2630/17/12/123007

Resilience of Networks Formed of Interdependent Modular Networks

Authors: Louis Shekhtman, Saray Shai, Shlomo Havlin

Abstract: Many infrastructure networks have a modular structure and are also interdependent. While significant research has explored the resilience of interdependent networks, there has been no analysis of the effects of modularity. Here we develop a theoretical framework for attacks on interdependent modular networks and support our results by simulations. We focus on the case where each network has the same number of communities and the dependency links are restricted to be between pairs of communities of different networks. This is very realistic for infrastructure across cities. Each city has its own infrastructures and different infrastructures are dependent within the city. However, each infrastructure is connected within and between cities. For example, a power grid will connect many cities as will a communication network, yet a power station and communication tower that are interdependent will likely be in the same city. It has been shown that single networks are very susceptible to the failure of the interconnected nodes (between communities) [Shai et al.] and that attacks on these nodes are more crippling than attacks based on betweenness [da Cunha et al.]. In our example of cities these nodes have long range links which are more likely to fail. For both treelike and looplike interdependent modular networks we find distinct regimes depending on the number of modules, $m$. (i) In the case where there are fewer modules with strong intraconnections, the system first separates into modules in an abrupt first-order transition and then each module undergoes a second percolation transition. (ii) When there are more modules with many interconnections between them, the system undergoes a single transition. Overall, we find that modular structure can influence the type of transitions observed in interdependent networks and should be considered in attempts to make interdependent networks more resilient.

Submitted 6 August, 2015; originally announced August 2015.

arXiv:1507.01826 [pdf, other]

doi 10.1109/TNSE.2016.2537545

Clustering Network Layers With the Strata Multilayer Stochastic Block Model

Authors: Natalie Stanley, Saray Shai, Dane Taylor, Peter J. Mucha

Abstract: Multilayer networks are a useful data structure for simultaneously capturing multiple types of relationships between a set of nodes. In such networks, each relational definition gives rise to a layer. While each layer provides its own set of information, community structure across layers can be collectively utilized to discover and quantify underlying relational patterns between nodes. To concisely extract information from a multilayer network, we propose to identify and combine sets of layers with meaningful similarities in community structure. In this paper, we describe the "strata multilayer stochastic block model" (sMLSBM), a probabilistic model for multilayer community structure. The central extension of the model is that there exist groups of layers, called "strata", which are defined such that all layers in a given stratum have community structure described by a common stochastic block model (SBM). That is, layers in a stratum exhibit similar node-to-community assignments and SBM probability parameters. Fitting the sMLSBM to a multilayer network provides a joint clustering that yields node-to-community and layer-to-stratum assignments, which cooperatively aid one another during inference. We describe an algorithm for separating layers into their appropriate strata and an inference technique for estimating the SBM parameters for each stratum. We demonstrate our method using synthetic networks and a multilayer network inferred from data collected in the Human Microbiome Project.

Submitted 9 October, 2015; v1 submitted 7 July, 2015; originally announced July 2015.

arXiv:1404.4748 [pdf, other]

Resilience of modular complex networks

Authors: Saray Shai, Dror Y. Kenett, Yoed N. Kenett, Miriam Faust, Simon Dobson, Shlomo Havlin

Abstract: Complex networks often have a modular structure, where a number of tightly connected groups of nodes (modules) have relatively few interconnections. Modularity has been shown to have an important effect on the evolution and stability of biological networks, on the scalability and efficiency of large-scale infrastructure, and on the development of economic and social systems. An analytical framework for understanding modularity and its effects on network vulnerability is still missing. Through recent advances in the understanding of multilayer networks, however, it is now possible to develop a theoretical framework to systematically study this critical issue. Here we study, analytically and numerically, the resilience of modular networks under attacks on interconnected nodes, which exhibit high betweenness values and are often more exposed to failure. Our model provides new insight into the feedback between structure and function in real-world systems, and consequently has implications for problems as diverse as developing efficient immunization strategies, designing robust large-scale infrastructure, and understanding brain function.

Submitted 18 April, 2014; originally announced April 2014.

Comments: 15 pages, 4 figures

Showing 1–9 of 9 results for author: Shai, S